Evalset Format

Complete guide to the dataset format used by Judge LLM for evaluations.

Overview

Judge LLM supports both JSON and YAML formats for defining evaluation datasets. Each file contains one evaluation set: its metadata, its test cases, and, for each case, the conversation history and session configuration.

Supported Formats

JSON (.json) - Traditional structured format
YAML (.yaml, .yml) - Human-readable alternative

Both formats use the same data structure, so choose whichever suits your workflow.

Basic Structure

An evaluation dataset consists of:

EvalSet: Container with metadata and test cases
EvalCases: Individual test scenarios
Invocations: Conversation turns with user/model exchanges
SessionInput: Configuration for each test case
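To make the nesting concrete, here is a minimal sketch of the hierarchy as Python dataclasses. The field names come from the JSON and YAML examples in this guide. The class definitions, the default values, and the choice to keep invocations as plain dicts are illustrative assumptions, not Judge LLM's actual internal types.

```python
from dataclasses import dataclass, field, asdict
from typing import Any


@dataclass
class SessionInput:
    """Per-case configuration (field names taken from the example files)."""
    app_name: str
    user_id: str


@dataclass
class EvalCase:
    """One test scenario; `conversation` holds the invocations (turns)."""
    eval_id: str
    session_input: SessionInput
    # Invocation fields are not modeled here, so turns stay as plain dicts.
    conversation: list[dict[str, Any]] = field(default_factory=list)
    creation_timestamp: float = 0.0  # default is an assumption of this sketch


@dataclass
class EvalSet:
    """Top-level container: metadata plus the list of cases."""
    eval_set_id: str
    name: str
    description: str = ""
    creation_timestamp: float = 0.0
    eval_cases: list[EvalCase] = field(default_factory=list)


case = EvalCase(
    eval_id="test_001",
    session_input=SessionInput(app_name="math_tutor", user_id="test_user"),
)
eval_set = EvalSet(eval_set_id="my_test_set_v1", name="My Test Set", eval_cases=[case])

# asdict() recurses through nested dataclasses and returns a plain dict
eval_set_dict = asdict(eval_set)
```

The resulting `eval_set_dict` has the same key layout as the example files, so it could be passed to a JSON or YAML serializer.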

JSON Format

{
  "eval_set_id": "my_test_set_v1",
  "name": "My Test Set",
  "description": "Description of this evaluation set",
  "creation_timestamp": 1704067000.0,
  "eval_cases": [
    {
      "eval_id": "test_001",
      "conversation": [],
      "session_input": {
        "app_name": "math_tutor",
        "user_id": "test_user"
      },
      "creation_timestamp": 1704067200.0
    }
  ]
}

YAML Format

eval_set_id: my_test_set_v1
name: My Test Set
description: Description of this evaluation set
creation_timestamp: 1704067000.0
eval_cases:
  - eval_id: test_001
    conversation: []
    session_input:
      app_name: math_tutor
      user_id: test_user
    creation_timestamp: 1704067200.0

Loading Datasets

Single File

JSON:

dataset:
  loader: local_file
  paths:
    - ./tests/math.json

YAML:

dataset:
  loader: local_file
  paths:
    - ./tests/math.yaml

Multiple Files (Mixed Formats)

dataset:
  loader: local_file
  paths:
    - ./tests/math.json
    - ./tests/science.yaml
    - ./tests/history.yml
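
When formats are mixed, each path's extension indicates which parser applies. The sketch below maps the extensions listed under Supported Formats to a format name. The helper `detect_format` is hypothetical, and whether the `local_file` loader dispatches exactly this way (including the case-insensitive comparison) is an assumption.

```python
from pathlib import Path

# Extensions listed under "Supported Formats"
FORMAT_BY_SUFFIX = {".json": "json", ".yaml": "yaml", ".yml": "yaml"}


def detect_format(path: str) -> str:
    """Return "json" or "yaml" for a dataset path, based on its extension."""
    suffix = Path(path).suffix.lower()  # lower-casing is an assumption
    try:
        return FORMAT_BY_SUFFIX[suffix]
    except KeyError:
        raise ValueError(f"unsupported dataset extension: {path!r}") from None


paths = ["./tests/math.json", "./tests/science.yaml", "./tests/history.yml"]
formats = {p: detect_format(p) for p in paths}
```

A check like this can be run over a path list before an evaluation to catch files with an unexpected extension early.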

Directory Loading

Load all JSON files:

dataset:
  loader: directory
  paths: [./tests]
  pattern: "*.json"

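Load all YAML files with the .yaml extension (assuming standard glob matching, this pattern does not pick up .yml files):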

dataset:
  loader: directory
  paths: [./tests]
  pattern: "*.yaml"
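
To preview which files a pattern would select, the sketch below applies shell-style matching to a list of file names. It assumes the `directory` loader treats `pattern` as a standard glob, which this guide does not state explicitly. The file names are made up for illustration.

```python
from fnmatch import fnmatch

# Hypothetical directory contents
filenames = ["math.json", "science.yaml", "history.yml", "notes.txt"]


def select(pattern: str) -> list[str]:
    """Return the file names that a shell-style pattern would match."""
    return [name for name in filenames if fnmatch(name, pattern)]


json_files = select("*.json")  # ["math.json"]
yaml_files = select("*.yaml")  # ["science.yaml"]; "history.yml" is not matched
yml_files = select("*.yml")    # ["history.yml"]
```

Under that assumption, a directory that mixes `.yaml` and `.yml` files is not fully covered by a single `"*.yaml"` pattern.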

Best Practices

Choose Your Format

Use YAML when:

Writing datasets by hand
Prioritizing readability and inline comments
Working with multi-line text

Use JSON when:

Generating datasets programmatically
Enforcing strict schema validation
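
As an illustration of programmatic generation, here is a minimal sketch that builds an evaluation set as a Python dict and serializes it with the standard `json` module. The keys mirror the format examples in this guide. The helper `build_eval_set`, the generated description text, and the reuse of one timestamp for every case are assumptions made for the example.

```python
import json


def build_eval_set(set_id: str, name: str, case_ids: list[str], timestamp: float) -> dict:
    """Assemble an eval set dict using the field names from the format examples."""
    return {
        "eval_set_id": set_id,
        "name": name,
        "description": f"Generated set with {len(case_ids)} cases",
        "creation_timestamp": timestamp,
        "eval_cases": [
            {
                "eval_id": case_id,
                "conversation": [],  # turns omitted in this sketch
                "session_input": {"app_name": "math_tutor", "user_id": "test_user"},
                "creation_timestamp": timestamp,
            }
            for case_id in case_ids
        ],
    }


eval_set = build_eval_set("generated_v1", "Generated Set", ["test_001", "test_002"], 1704067200.0)
text = json.dumps(eval_set, indent=2)

# Round-trip check: the serialized text parses back to the same structure
assert json.loads(text) == eval_set
```

Writing `text` to a `.json` file gives a dataset that the `local_file` loader configuration shown earlier could point at.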

Examples

See examples/01-gemini-agent/ for complete examples in both JSON and YAML formats.

Validation

Validate your configuration and dataset:

judge-llm validate --config config.yaml
