Response Structure Details

This document outlines the structure of the response returned by the Inspeq AI SDK when evaluating LLM tasks.

Overall Response Structure

The SDK returns a JSON object with the following top-level keys; a sketch of the overall shape follows the list:

  • status: HTTP status code of the response (e.g., 200 for success)

  • message: A descriptive message about the evaluation process

  • results: An array of evaluation results for each metric

  • user_id: The ID of the user who owns the project

  • remaining_credits: The number of credits remaining for the user
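
For illustration, a successful response has roughly the following shape. All values below are placeholders rather than real output:

# Illustrative shape of a successful response (placeholder values)
{
    "status": 200,
    "message": "Metric evaluation completed",   # descriptive message (exact wording may differ)
    "results": [],                               # one evaluation result per metric, described below
    "user_id": "<user-id>",
    "remaining_credits": 100
}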

Evaluation Result Structure

Each item in the results array represents the evaluation of a single metric and contains the following fields:

Most SDK clients will primarily be interested in these three values:

  • metric_name: Name of the metric being evaluated (e.g., "DIVERSITY_EVALUATION")

  • score: Numeric score for the metric evaluation

  • passed: Boolean indicating whether the evaluation passed the threshold

The complete list of output fields is as follows; an illustrative example appears after the list:

  • id: Unique identifier for this evaluation result

  • project_id: ID of the project associated with this evaluation

  • task_id: ID of the specific task being evaluated

  • task_name: Name of the task (e.g., "capital_question")

  • model_name: Name of the model being evaluated (if applicable)

  • source_platform: Platform source of the evaluation (e.g., "SDK")

  • data_input_id: ID of the input data used for evaluation

  • data_input_name: Name of the input data set

  • metric_set_input_id: ID of the metric set used for evaluation

  • metric_set_input_name: Name of the metric set

  • prompt: The prompt given to the LLM

  • response: The response generated by the LLM

  • context: Additional context provided for the evaluation (if any)

  • metric_name: Name of the metric being evaluated (e.g., "DIVERSITY_EVALUATION")

  • score: Numeric score for the metric evaluation

  • passed: Boolean indicating whether the evaluation passed the threshold

  • evaluation_details: Detailed results of the evaluation (explained below)

  • metrics_config: Configuration used for the metric evaluation

  • created_at: Timestamp of when the evaluation was created

  • updated_at: Timestamp of the last update to the evaluation

  • created_by: Entity that created the evaluation (e.g., "SYSTEM")

  • updated_by: Entity that last updated the evaluation

  • is_deleted: Boolean indicating if the evaluation has been deleted

  • metric_evaluation_status: Overall status of the metric evaluation request (e.g., "PASS", "FAIL", "EVAL_FAIL"). The status is "EVAL_FAIL" when Inspeq is unable to evaluate the metric due to internal reasons
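
Putting it together, a single entry in the results array might look roughly like this (placeholder values, with several fields omitted for brevity):

# Illustrative shape of one entry in "results" (placeholder values, abridged)
{
    "id": "<evaluation-id>",
    "project_id": "<project-id>",
    "task_name": "capital_question",
    "source_platform": "SDK",
    "prompt": "What is the capital of France?",
    "response": "Paris is the capital of France.",
    "metric_name": "DIVERSITY_EVALUATION",
    "score": 0.7,
    "passed": True,
    "evaluation_details": {},   # see "Evaluation Details" below
    "metrics_config": {},       # see "Metrics Configuration" below
    "metric_evaluation_status": "PASS",
    "created_by": "SYSTEM",
    "is_deleted": False
    # ...remaining fields (task_id, model_name, data_input_id, timestamps, etc.) omitted
}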

Evaluation Details

The evaluation_details object contains the following fields; an example follows the list:

  • actual_value: The raw value calculated for the metric

  • actual_value_type: Data type of the actual value (e.g., "FLOAT")

  • metric_labels: Array of labels assigned based on the evaluation result

  • metric_name: Name of the metric (same as in the parent object)

  • others: Additional metadata (if any)

  • threshold: Array indicating whether the evaluation passed the threshold

  • threshold_score: The threshold score used for evaluation
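
For example, an evaluation_details object for a diversity check could look like this (placeholder values, including the label text):

# Illustrative shape of "evaluation_details" (placeholder values)
{
    "actual_value": 0.7,
    "actual_value_type": "FLOAT",
    "metric_labels": ["High Diversity"],   # label text depends on the metric configuration
    "metric_name": "DIVERSITY_EVALUATION",
    "others": {},
    "threshold": [True],                   # whether the threshold was met
    "threshold_score": 0.5
}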

Metrics Configuration

The metrics_config object contains the following fields; an example follows the list:

  • custom_labels: Array of custom labels used for categorizing results

  • label_thresholds: Array of threshold values for assigning labels

  • threshold: The main threshold value for pass/fail determination
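
As an illustration, a metrics_config object might look like this (the labels and threshold values here are placeholders, not the defaults):

# Illustrative shape of "metrics_config" (placeholder values)
{
    "custom_labels": ["Low Diversity", "High Diversity"],   # labels applied based on the score
    "label_thresholds": [0, 0.5],                           # score boundaries for each label
    "threshold": 0.5                                        # pass/fail cut-off
}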

Example Usage

To access the evaluation results for a specific metric, you can iterate through the results array:

# inspeq_eval is an initialized Inspeq client and input_data the prepared
# task inputs, both set up as shown in the Quickstart.
response = inspeq_eval.evaluate_llm_task(
    metrics_list=["DIVERSITY_EVALUATION"],
    input_data=input_data,
    task_name="example_task"
)

for result in response['results']:
    print(f"Metric: {result['metric_name']}")
    print(f"Score: {result['score']}")
    print(f"Passed: {result['passed']}")
    print(f"Evaluation Details: {result['evaluation_details']}")

This structure allows for comprehensive analysis of each metric evaluation, providing both high-level results and detailed information for in-depth assessment of LLM performance.
