Response Structure Details
This document outlines the structure of the response returned by the Inspeq AI SDK when evaluating LLM tasks.
Overall Response Structure
The SDK returns a JSON object with the following top-level keys:
- status: HTTP status code of the response (e.g., 200 for success)
- message: A descriptive message about the evaluation process
- results: An array of evaluation results for each metric
- user_id: The ID of the user who owns the project
- remaining_credits: The number of credits remaining for the user
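For orientation, a minimal sketch of the overall shape is shown below; all values are illustrative placeholders rather than real SDK output:

```python
# Illustrative sketch of the top-level response shape (all values are made up).
response = {
    "status": 200,
    "message": "Evaluation completed",  # descriptive message (example text only)
    "results": [
        # one evaluation result per metric -- see "Evaluation Result Structure"
    ],
    "user_id": "user-123",              # hypothetical ID format
    "remaining_credits": 87,
}
```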
Evaluation Result Structure
Each item in the results array represents the evaluation of a single metric and contains the following fields:
Most SDK clients will primarily be interested in these three values:
- metric_name: Name of the metric being evaluated (e.g., "DIVERSITY_EVALUATION")
- score: Numeric score for the metric evaluation
- passed: Boolean indicating whether the evaluation passed the threshold
The complete list of output fields is given below; an illustrative example follows the list:
- id: Unique identifier for this evaluation result
- project_id: ID of the project associated with this evaluation
- task_id: ID of the specific task being evaluated
- task_name: Name of the task (e.g., "capital_question")
- model_name: Name of the model being evaluated (if applicable)
- source_platform: Platform source of the evaluation (e.g., "SDK")
- data_input_id: ID of the input data used for evaluation
- data_input_name: Name of the input data set
- metric_set_input_id: ID of the metric set used for evaluation
- metric_set_input_name: Name of the metric set
- prompt: The prompt given to the LLM
- response: The response generated by the LLM
- context: Additional context provided for the evaluation (if any)
- metric_name: Name of the metric being evaluated (e.g., "DIVERSITY_EVALUATION")
- score: Numeric score for the metric evaluation
- passed: Boolean indicating whether the evaluation passed the threshold
- evaluation_details: Detailed results of the evaluation (explained below)
- metrics_config: Configuration used for the metric evaluation
- created_at: Timestamp of when the evaluation was created
- updated_at: Timestamp of the last update to the evaluation
- created_by: Entity that created the evaluation (e.g., "SYSTEM")
- updated_by: Entity that last updated the evaluation
- is_deleted: Boolean indicating if the evaluation has been deleted
- metric_evaluation_status: Overall status of the metric evaluation request (e.g., "PASS", "FAIL", "EVAL_FAIL"). It will be "EVAL_FAIL" if Inspeq is unable to evaluate the metric for internal reasons.
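To make the field list concrete, here is a hypothetical single result item. All values (and, where the list above is silent, the exact value types) are illustrative assumptions, not guaranteed SDK output:

```python
# Illustrative sketch of one item in "results" (all values are made up).
result = {
    "id": "eval-001",
    "project_id": "proj-123",
    "task_id": "task-456",
    "task_name": "capital_question",
    "model_name": "example-model",
    "source_platform": "SDK",
    "data_input_id": "input-789",
    "data_input_name": "capital_question_input",
    "metric_set_input_id": "metric-set-001",
    "metric_set_input_name": "default_metric_set",
    "prompt": "What is the capital of France?",
    "response": "The capital of France is Paris.",
    "context": None,
    "metric_name": "DIVERSITY_EVALUATION",
    "score": 0.75,
    "passed": True,
    "evaluation_details": {},  # see "Evaluation Details" below
    "metrics_config": {},      # see "Metrics Configuration" below
    "created_at": "2024-01-01T12:00:00Z",
    "updated_at": "2024-01-01T12:00:00Z",
    "created_by": "SYSTEM",
    "updated_by": "SYSTEM",
    "is_deleted": False,
    "metric_evaluation_status": "PASS",
}
```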
Evaluation Details
The evaluation_details object contains the following fields:
- actual_value: The raw value calculated for the metric
- actual_value_type: Data type of the actual value (e.g., "FLOAT")
- metric_labels: Array of labels assigned based on the evaluation result
- metric_name: Name of the metric (same as in the parent object)
- others: Additional metadata (if any)
- threshold: Array indicating whether the evaluation passed the threshold
- threshold_score: The threshold score used for evaluation
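A hypothetical evaluation_details object might look like the sketch below; the label and threshold values are invented for illustration:

```python
# Illustrative sketch of "evaluation_details" (all values are made up).
evaluation_details = {
    "actual_value": 0.75,
    "actual_value_type": "FLOAT",
    "metric_labels": ["High Diversity"],  # hypothetical label
    "metric_name": "DIVERSITY_EVALUATION",
    "others": {},
    "threshold": ["Pass"],                # hypothetical pass/fail indicator
    "threshold_score": 0.5,
}
```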
Metrics Configuration
The metrics_config object contains:
- custom_labels: Array of custom labels used for categorizing results
- label_thresholds: Array of threshold values for assigning labels
- threshold: The main threshold value for pass/fail determination
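Correspondingly, a hypothetical metrics_config might look like this; the labels and threshold values are invented for illustration:

```python
# Illustrative sketch of "metrics_config" (all values are made up).
metrics_config = {
    "custom_labels": ["Low Diversity", "High Diversity"],  # hypothetical labels
    "label_thresholds": [0.0, 0.5],                        # hypothetical cutoffs
    "threshold": 0.5,
}
```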
Example Usage
To access the evaluation results for a specific metric, you can iterate through the results array.
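Below is a minimal sketch, assuming response is the parsed response object described above and that evaluation_details arrives as a parsed object; the helper name summarize_results is just for illustration:

```python
# A minimal sketch: walk the "results" array and report each metric's outcome.
def summarize_results(response: dict) -> None:
    for result in response["results"]:
        name = result["metric_name"]

        # "EVAL_FAIL" means Inspeq could not evaluate the metric internally.
        if result["metric_evaluation_status"] == "EVAL_FAIL":
            print(f"{name}: could not be evaluated")
            continue

        print(f"{name}: score={result['score']}, passed={result['passed']}")

        # Drill into the detailed results when a deeper look is needed.
        details = result["evaluation_details"]
        print(f"  actual_value={details['actual_value']} "
              f"(threshold_score={details['threshold_score']})")
```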
This structure allows for comprehensive analysis of each metric evaluation, providing both high-level results and detailed information for in-depth assessment of LLM performance.