Response Structure Details
This document outlines the structure of the response returned by the Inspeq AI SDK when evaluating LLM tasks.
Overall Response Structure
The SDK returns a JSON object with the following top-level keys:
- status: HTTP status code of the response (e.g., 200 for success)
- message: A descriptive message about the evaluation process
- results: An array of evaluation results, one per metric
- user_id: The ID of the user who owns the project
- remaining_credits: The number of credits remaining for the user
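For illustration, a response parsed into a Python dictionary might look like the sketch below; all values shown are hypothetical placeholders, not actual SDK output.

response = {
    "status": 200,                       # HTTP status code
    "message": "Evaluation completed",   # hypothetical message text
    "results": [
        # one entry per evaluated metric; see "Evaluation Result Structure" below
    ],
    "user_id": "user-123",               # hypothetical user ID
    "remaining_credits": 42              # hypothetical credit balance
}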
Evaluation Result Structure
Each item in the results array represents the evaluation of a single metric and contains the following fields:
Most SDK clients will primarily be interested in the following three values; a short example using them appears after the list:
- metric_name: Name of the metric being evaluated (e.g., "DIVERSITY_EVALUATION")
- score: Numeric score for the metric evaluation
- passed: Boolean indicating whether the evaluation passed the threshold
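For example, a quick pass over just these three fields might look like the following sketch, assuming response holds the parsed SDK response from evaluate_llm_task:

# Collect the metrics that did not meet their thresholds
failed_metrics = [
    (r["metric_name"], r["score"])
    for r in response["results"]
    if not r["passed"]
]
print(failed_metrics)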
The complete list of output fields is as follows:
- id: Unique identifier for this evaluation result
- project_id: ID of the project associated with this evaluation
- task_id: ID of the specific task being evaluated
- task_name: Name of the task (e.g., "capital_question")
- model_name: Name of the model being evaluated (if applicable)
- source_platform: Platform source of the evaluation (e.g., "SDK")
- data_input_id: ID of the input data used for evaluation
- data_input_name: Name of the input data set
- metric_set_input_id: ID of the metric set used for evaluation
- metric_set_input_name: Name of the metric set
- prompt: The prompt given to the LLM
- response: The response generated by the LLM
- context: Additional context provided for the evaluation (if any)
- metric_name: Name of the metric being evaluated (e.g., "DIVERSITY_EVALUATION")
- score: Numeric score for the metric evaluation
- passed: Boolean indicating whether the evaluation passed the threshold
- evaluation_details: Detailed results of the evaluation (explained below)
- metrics_config: Configuration used for the metric evaluation
- created_at: Timestamp of when the evaluation was created
- updated_at: Timestamp of the last update to the evaluation
- created_by: Entity that created the evaluation (e.g., "SYSTEM")
- updated_by: Entity that last updated the evaluation
- is_deleted: Boolean indicating whether the evaluation has been deleted
- metric_evaluation_status: Overall status of the metric evaluation request (e.g., "PASS", "FAIL", "EVAL_FAIL"). This will be "EVAL_FAIL" if Inspeq is unable to evaluate the metric due to internal reasons.
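Putting these together, a single entry in results might resemble the sketch below. All values are illustrative placeholders, and most fields are omitted for brevity.

result = {
    "id": "result-id-123",              # hypothetical identifier
    "project_id": "project-id-456",     # hypothetical project ID
    "task_name": "capital_question",
    "source_platform": "SDK",
    "metric_name": "DIVERSITY_EVALUATION",
    "score": 0.85,                      # hypothetical score
    "passed": True,
    "metric_evaluation_status": "PASS",
    "evaluation_details": {},           # see "Evaluation Details" below
    "metrics_config": {},               # see "Metrics Configuration" below
    # ... remaining fields (task_id, prompt, response, timestamps,
    # audit fields, etc.) omitted for brevity ...
}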
Evaluation Details
The evaluation_details object contains the following fields:
- actual_value: The raw value calculated for the metric
- actual_value_type: Data type of the actual value (e.g., "FLOAT")
- metric_labels: Array of labels assigned based on the evaluation result
- metric_name: Name of the metric (same as in the parent object)
- others: Additional metadata (if any)
- threshold: Array indicating whether the evaluation passed the threshold
- threshold_score: The threshold score used for evaluation
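Continuing the illustrative example, a hypothetical evaluation_details object might look like this; the labels, values, and threshold shown are placeholders, not real SDK output.

evaluation_details = {
    "actual_value": 0.85,                 # hypothetical raw metric value
    "actual_value_type": "FLOAT",
    "metric_labels": ["High Diversity"],  # hypothetical label
    "metric_name": "DIVERSITY_EVALUATION",
    "others": {},
    "threshold": ["Passed"],              # hypothetical pass/fail indication
    "threshold_score": 0.5                # hypothetical threshold
}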
Metrics Configuration
The metrics_config object contains:
- custom_labels: Array of custom labels used for categorizing results
- label_thresholds: Array of threshold values for assigning labels
- threshold: The main threshold value for pass/fail determination
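An illustrative metrics_config for the same metric could look like the following; the specific labels and threshold values are hypothetical.

metrics_config = {
    "custom_labels": ["Low Diversity", "High Diversity"],  # hypothetical labels
    "label_thresholds": [0.0, 0.5],                        # hypothetical label boundaries
    "threshold": 0.5                                       # hypothetical pass/fail threshold
}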
Example Usage
To access the evaluation results for a specific metric, you can iterate through the results array:
# inspeq_eval is assumed to be an initialized Inspeq SDK client, and
# input_data is assumed to be prepared as described in the SDK documentation.
response = inspeq_eval.evaluate_llm_task(
    metrics_list=["DIVERSITY_EVALUATION"],
    input_data=input_data,
    task_name="example_task"
)

for result in response['results']:
    print(f"Metric: {result['metric_name']}")
    print(f"Score: {result['score']}")
    print(f"Passed: {result['passed']}")
    print(f"Evaluation Details: {result['evaluation_details']}")

This structure allows for comprehensive analysis of each metric evaluation, providing both high-level results and detailed information for in-depth assessment of LLM performance.
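Because metric_evaluation_status can be "EVAL_FAIL" when Inspeq is unable to evaluate a metric, it can be useful to check it before relying on score and passed. A minimal sketch, assuming the same response object as above:

for result in response['results']:
    if result['metric_evaluation_status'] == "EVAL_FAIL":
        # Inspeq could not evaluate this metric; score and passed are not meaningful here
        print(f"Could not evaluate {result['metric_name']}")
        continue
    print(f"{result['metric_name']}: score={result['score']}, passed={result['passed']}")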