Response Structure Details

This document outlines the structure of the response returned by the Inspeq AI SDK when evaluating LLM tasks.

Overall Response Structure

The SDK returns a JSON object with the following top-level keys; a sketch of the overall shape follows the list:

  • status: HTTP status code of the response (e.g., 200 for success)

  • message: A descriptive message about the evaluation process

  • results: An array of evaluation results for each metric

  • user_id: The ID of the user who owns the project

  • remaining_credits: The number of credits remaining for the user
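
For illustration, a successful response has roughly the following shape. All values below are placeholders rather than real output:

# Illustrative shape of a successful response (placeholder values)
{
    "status": 200,
    "message": "Metric evaluation completed",   # descriptive message (exact wording may differ)
    "results": [],                               # one evaluation result per metric, described below
    "user_id": "<user-id>",
    "remaining_credits": 100
}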

Evaluation Result Structure

Each item in the results array represents the evaluation of a single metric and contains the following fields:

Most SDK clients will primarily be interested in these three values:

  • metric_name: Name of the metric being evaluated (e.g., "DIVERSITY_EVALUATION")

  • score: Numeric score for the metric evaluation

  • passed: Boolean indicating whether the evaluation passed the threshold

The complete list of output fields is as follows; an illustrative example appears after the list:

  • id: Unique identifier for this evaluation result

  • project_id: ID of the project associated with this evaluation

  • task_id: ID of the specific task being evaluated

  • task_name: Name of the task (e.g., "capital_question")

  • model_name: Name of the model being evaluated (if applicable)

  • source_platform: Platform source of the evaluation (e.g., "SDK")

  • data_input_id: ID of the input data used for evaluation

  • data_input_name: Name of the input data set

  • metric_set_input_id: ID of the metric set used for evaluation

  • metric_set_input_name: Name of the metric set

  • prompt: The prompt given to the LLM

  • response: The response generated by the LLM

  • context: Additional context provided for the evaluation (if any)

  • metric_name: Name of the metric being evaluated (e.g., "DIVERSITY_EVALUATION")

  • score: Numeric score for the metric evaluation

  • passed: Boolean indicating whether the evaluation passed the threshold

  • evaluation_details: Detailed results of the evaluation (explained below)

  • metrics_config: Configuration used for the metric evaluation

  • created_at: Timestamp of when the evaluation was created

  • updated_at: Timestamp of the last update to the evaluation

  • created_by: Entity that created the evaluation (e.g., "SYSTEM")

  • updated_by: Entity that last updated the evaluation

  • is_deleted: Boolean indicating if the evaluation has been deleted

  • metric_evaluation_status: Overall status of the metric evaluation request (e.g., "PASS", "FAIL", "EVAL_FAIL"). The status is "EVAL_FAIL" when Inspeq is unable to evaluate the metric due to internal reasons
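
Putting it together, a single entry in the results array might look roughly like this (placeholder values, with several fields omitted for brevity):

# Illustrative shape of one entry in "results" (placeholder values, abridged)
{
    "id": "<evaluation-id>",
    "project_id": "<project-id>",
    "task_name": "capital_question",
    "source_platform": "SDK",
    "prompt": "What is the capital of France?",
    "response": "Paris is the capital of France.",
    "metric_name": "DIVERSITY_EVALUATION",
    "score": 0.7,
    "passed": True,
    "evaluation_details": {},   # see "Evaluation Details" below
    "metrics_config": {},       # see "Metrics Configuration" below
    "metric_evaluation_status": "PASS",
    "created_by": "SYSTEM",
    "is_deleted": False
    # ...remaining fields (task_id, model_name, data_input_id, timestamps, etc.) omitted
}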

Evaluation Details

The evaluation_details object contains the following fields; an example follows the list:

  • actual_value: The raw value calculated for the metric

  • actual_value_type: Data type of the actual value (e.g., "FLOAT")

  • metric_labels: Array of labels assigned based on the evaluation result

  • metric_name: Name of the metric (same as in the parent object)

  • others: Additional metadata (if any)

  • threshold: Array indicating whether the evaluation passed the threshold

  • threshold_score: The threshold score used for evaluation
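
For example, an evaluation_details object for a diversity check could look like this (placeholder values, including the label text):

# Illustrative shape of "evaluation_details" (placeholder values)
{
    "actual_value": 0.7,
    "actual_value_type": "FLOAT",
    "metric_labels": ["High Diversity"],   # label text depends on the metric configuration
    "metric_name": "DIVERSITY_EVALUATION",
    "others": {},
    "threshold": [True],                   # whether the threshold was met
    "threshold_score": 0.5
}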

Metrics Configuration

The metrics_config object contains the following fields; an example follows the list:

  • custom_labels: Array of custom labels used for categorizing results

  • label_thresholds: Array of threshold values for assigning labels

  • threshold: The main threshold value for pass/fail determination
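
As an illustration, a metrics_config object might look like this (the labels and threshold values here are placeholders, not the defaults):

# Illustrative shape of "metrics_config" (placeholder values)
{
    "custom_labels": ["Low Diversity", "High Diversity"],   # labels applied based on the score
    "label_thresholds": [0, 0.5],                           # score boundaries for each label
    "threshold": 0.5                                        # pass/fail cut-off
}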

Example Usage

To access the evaluation results for a specific metric, you can iterate through the results array:

# inspeq_eval is an initialized Inspeq client and input_data the prepared
# task inputs, both set up as shown in the Quickstart.
response = inspeq_eval.evaluate_llm_task(
    metrics_list=["DIVERSITY_EVALUATION"],
    input_data=input_data,
    task_name="example_task"
)

for result in response['results']:
    print(f"Metric: {result['metric_name']}")
    print(f"Score: {result['score']}")
    print(f"Passed: {result['passed']}")
    print(f"Evaluation Details: {result['evaluation_details']}")

This structure allows for comprehensive analysis of each metric evaluation, providing both high-level results and detailed information for in-depth assessment of LLM performance.
