Evalset Format

Complete guide to the dataset format used by Judge LLM for evaluations.

Overview

Judge LLM supports both JSON and YAML formats for defining evaluation datasets. Each file contains an evaluation set with test cases, conversation history, and configuration.

Supported Formats

  • JSON (.json) - Traditional structured format
  • YAML (.yaml, .yml) - Human-readable alternative

Both formats use the same data structure, so choose whichever fits your workflow.

Basic Structure

An evaluation dataset consists of:

  • EvalSet: Container with metadata and test cases
  • EvalCases: Individual test scenarios
  • Invocations: Conversation turns with user/model exchanges
  • SessionInput: Configuration for each test case
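To make the nesting concrete, here is a minimal Python sketch of the four pieces as dataclasses. The class names follow the list above and the field names follow the JSON and YAML examples in this guide; the classes themselves are illustrative and are not Judge LLM's own types. The fields of an invocation are not described here, so conversation turns are kept as opaque dictionaries.

```python
from dataclasses import dataclass, field, asdict
from typing import Any


@dataclass
class SessionInput:
    """Per-case configuration (fields taken from the examples in this guide)."""
    app_name: str
    user_id: str


@dataclass
class EvalCase:
    """One test scenario."""
    eval_id: str
    session_input: SessionInput
    # Assumption: the shape of an invocation is not specified in this guide,
    # so each conversation turn is left as a plain dict.
    conversation: list[dict[str, Any]] = field(default_factory=list)
    creation_timestamp: float = 0.0


@dataclass
class EvalSet:
    """Container holding metadata and the list of test cases."""
    eval_set_id: str
    name: str
    description: str = ""
    creation_timestamp: float = 0.0
    eval_cases: list[EvalCase] = field(default_factory=list)


case = EvalCase(
    eval_id="test_001",
    session_input=SessionInput(app_name="math_tutor", user_id="test_user"),
    creation_timestamp=1704067200.0,
)
eval_set = EvalSet(
    eval_set_id="my_test_set_v1",
    name="My Test Set",
    description="Description of this evaluation set",
    creation_timestamp=1704067000.0,
    eval_cases=[case],
)

# asdict() recurses into nested dataclasses, giving a plain dict
# with the same layout as the dataset files.
eval_set_dict = asdict(eval_set)
```

Converting with asdict() gives a structure that can be serialized to either format without further reshaping.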

JSON Format

{
  "eval_set_id": "my_test_set_v1",
  "name": "My Test Set",
  "description": "Description of this evaluation set",
  "creation_timestamp": 1704067000.0,
  "eval_cases": [
    {
      "eval_id": "test_001",
      "conversation": [],
      "session_input": {
        "app_name": "math_tutor",
        "user_id": "test_user"
      },
      "creation_timestamp": 1704067200.0
    }
  ]
}
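The creation_timestamp values look like Unix epoch seconds stored as floats. That is an assumption, since this guide does not state the unit. Under that assumption, a short Python sketch for converting between the numeric value and a readable UTC date:

```python
from datetime import datetime, timezone

# Assumption: creation_timestamp is seconds since the Unix epoch (UTC).
case_timestamp = 1704067200.0

# Numeric timestamp -> timezone-aware datetime
case_created = datetime.fromtimestamp(case_timestamp, tz=timezone.utc)
case_created_iso = case_created.isoformat()  # '2024-01-01T00:00:00+00:00'

# Datetime -> numeric timestamp, e.g. when writing a new eval case by hand
round_trip = datetime(2024, 1, 1, tzinfo=timezone.utc).timestamp()
```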

YAML Format

eval_set_id: my_test_set_v1
name: My Test Set
description: Description of this evaluation set
creation_timestamp: 1704067000.0
eval_cases:
  - eval_id: test_001
    conversation: []
    session_input:
      app_name: math_tutor
      user_id: test_user
    creation_timestamp: 1704067200.0

Loading Datasets

Single File

JSON:

dataset:
  loader: local_file
  paths:
    - ./tests/math.json

YAML:

dataset:
  loader: local_file
  paths:
    - ./tests/math.yaml

Multiple Files (Mixed Formats)

dataset:
  loader: local_file
  paths:
    - ./tests/math.json
    - ./tests/science.yaml
    - ./tests/history.yml
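Mixing formats in one list works because each file's format can be told from its extension. How Judge LLM does this internally is not documented here. The sketch below is one plausible way to do it, using the extensions from the Supported Formats list; detect_format and FORMAT_BY_SUFFIX are hypothetical names, not part of the tool.

```python
from pathlib import Path

# Extensions listed under "Supported Formats" in this guide.
FORMAT_BY_SUFFIX = {".json": "json", ".yaml": "yaml", ".yml": "yaml"}


def detect_format(path):
    """Return 'json' or 'yaml' based on the file extension (hypothetical helper)."""
    suffix = Path(path).suffix.lower()
    try:
        return FORMAT_BY_SUFFIX[suffix]
    except KeyError:
        raise ValueError(f"Unsupported dataset extension: {suffix!r}") from None


paths = ["./tests/math.json", "./tests/science.yaml", "./tests/history.yml"]
formats = {p: detect_format(p) for p in paths}
```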

Directory Loading

Load all JSON files:

dataset:
  loader: directory
  paths: [./tests]
  pattern: "*.json"

Load all YAML files:

dataset:
  loader: directory
  paths: [./tests]
  pattern: "*.yaml"
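If pattern is a shell-style glob (an assumption; this guide does not define the matching rules), then "*.yaml" selects only files ending in .yaml and skips .yml files. The Python sketch below uses the standard fnmatch module to show which of some sample filenames each pattern would select. The filenames are illustrative.

```python
from fnmatch import fnmatchcase

# Illustrative directory listing
files = ["math.json", "science.yaml", "history.yml", "notes.txt"]


def select(filenames, pattern):
    """Return the filenames matching a shell-style glob pattern."""
    return [name for name in filenames if fnmatchcase(name, pattern)]


json_files = select(files, "*.json")  # ['math.json']
yaml_files = select(files, "*.yaml")  # ['science.yaml']
yml_files = select(files, "*.yml")    # ['history.yml']
```

Under that assumption, a directory holding both .yaml and .yml files may need to be loaded with one pattern per extension, or the files renamed to a single extension.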

Best Practices

Choose Your Format

Use YAML when:

  • Writing datasets by hand
  • You want readability and comments
  • Working with multi-line text

Use JSON when:

  • Generating datasets programmatically
  • Strict schema validation is needed

Examples

See examples/01-gemini-agent/ for complete examples in both JSON and YAML formats.

Validation

Validate your configuration and dataset:

judge-llm validate --config config.yaml