
Examples Overview

Learn by example with comprehensive tutorials covering common Judge LLM use cases. Each example demonstrates specific features and best practices.

Quick Navigation

| Example | Focus | Difficulty | Key Concepts |
| --- | --- | --- | --- |
| Gemini Agent | Basic setup | Beginner | Configuration, providers, evaluators |
| Default Config | Defaults | Beginner | Default config, config merging |
| Custom Evaluator | Custom evaluator | Intermediate | Custom components, registration |
| Safety Evaluation | Multi-turn | Advanced | Long conversations, safety, trajectory |
| Config Override | Config override | Intermediate | Per-test config, thresholds |
| Database Tracking | Database | Intermediate | SQLite, querying, trends |

By Category

Getting Started

Perfect for beginners learning Judge LLM basics:

Custom Components

Learn to extend Judge LLM with custom implementations:

Advanced Configuration

Master configuration patterns and overrides:

Data & Reporting

Store and analyze evaluation results:

Running Examples

Prerequisites

All examples require:

  1. Judge LLM installed:
pip install judge-llm
  2. API keys configured:
# Create .env file
echo "GEMINI_API_KEY=your_key" > .env
echo "OPENAI_API_KEY=your_key" >> .env
  3. Navigate to an example:
cd examples/01-gemini-agent

Running an Example

Each example can be run with:

judge-llm run --config config.yaml

Or using the Python API:

python run_evaluation.py

Expected Output

Typical output looks like:

Starting evaluation...

Evaluation Progress:
test_001: ✓ PASSED (cost: $0.0012, time: 1.2s)
test_002: ✓ PASSED (cost: $0.0015, time: 1.5s)
test_003: ✗ FAILED (cost: $0.0010, time: 0.8s)

Summary:
Total Tests: 3
Passed: 2
Failed: 1
Success Rate: 66.7%
Total Cost: $0.0037
Total Time: 3.5s
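The summary figures follow directly from the per-test lines above. A minimal sketch of that arithmetic, using a hypothetical data structure for the per-test results (the tool's actual internal representation may differ):

```python
# Per-test figures from the sample run above, as a hypothetical structure.
results = [
    {"id": "test_001", "passed": True, "cost": 0.0012, "time": 1.2},
    {"id": "test_002", "passed": True, "cost": 0.0015, "time": 1.5},
    {"id": "test_003", "passed": False, "cost": 0.0010, "time": 0.8},
]

passed = sum(r["passed"] for r in results)
summary = {
    "total": len(results),
    "passed": passed,
    "failed": len(results) - passed,
    "success_rate": round(100 * passed / len(results), 1),  # 66.7
    "total_cost": round(sum(r["cost"] for r in results), 4),  # 0.0037
    "total_time": round(sum(r["time"] for r in results), 1),  # 3.5
}
print(summary)
```

Note that the success rate counts tests, not cost: one failed test out of three gives 66.7% regardless of how cheap or expensive each test was.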

Example Structure

Each example includes:

XX-example-name/
├── README.md # Detailed explanation
├── config.yaml # Configuration file
├── sample.evalset.json # Test cases
├── run.sh # Shell script runner
└── run_evaluation.py # Python API runner

Common Patterns

Basic Configuration

dataset:
  loader: local_file
  paths: [./sample.evalset.json]

providers:
  - type: gemini
    agent_id: my_agent
    model: gemini-2.0-flash-exp

evaluators:
  - type: response_evaluator

reporters:
  - type: console

Using Defaults

# .judge_llm.defaults.yaml
providers:
  - type: gemini
    model: gemini-2.0-flash-exp
    temperature: 0.7

evaluators:
  - type: response_evaluator
  - type: cost_evaluator

# test.yaml (overrides defaults)
dataset:
  loader: local_file
  paths: [./tests.json]

providers:
  - type: gemini
    agent_id: my_agent
    # Inherits model and temperature from defaults
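Layering like this is commonly implemented as a recursive dictionary merge: nested mappings combine key by key, and any key present in the test config replaces the default. The sketch below illustrates that idea in plain Python; the merge semantics Judge LLM actually uses (especially for list entries like `providers`) may differ:

```python
def deep_merge(defaults: dict, overrides: dict) -> dict:
    """Overlay overrides on defaults: nested dicts merge, other values replace."""
    merged = dict(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

# Provider entry as it might appear in .judge_llm.defaults.yaml ...
default_provider = {"type": "gemini", "model": "gemini-2.0-flash-exp", "temperature": 0.7}
# ... merged with the matching entry from test.yaml
test_provider = {"type": "gemini", "agent_id": "my_agent"}

merged = deep_merge(default_provider, test_provider)
print(merged)
```

The merged provider ends up with `agent_id` from the test config while still inheriting `model` and `temperature` from the defaults.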

Custom Components

evaluators:
  - type: custom
    module_path: ./evaluators/safety.py
    class_name: SafetyEvaluator
    config:
      strict_mode: true
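The referenced `./evaluators/safety.py` would contain the `SafetyEvaluator` class. The sketch below shows the general shape such a class might take, assuming evaluators expose a simple `evaluate` method and receive their `config` keys as constructor arguments; the base-class contract Judge LLM actually requires may differ, so consult the Custom Evaluator example for the real interface:

```python
# Hypothetical sketch of ./evaluators/safety.py.
BLOCKED_TERMS = {"password", "ssn"}

class SafetyEvaluator:
    def __init__(self, strict_mode: bool = False):
        # strict_mode mirrors the `config.strict_mode` key in config.yaml
        self.strict_mode = strict_mode

    def evaluate(self, response: str) -> dict:
        text = response.lower()
        hits = [term for term in BLOCKED_TERMS if term in text]
        # Strict mode fails on any violation; lenient mode tolerates one.
        passed = not hits if self.strict_mode else len(hits) < 2
        return {"passed": passed, "violations": hits}

evaluator = SafetyEvaluator(strict_mode=True)
result = evaluator.evaluate("Here is your password reset link")
print(result)
```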

Troubleshooting

API Key Not Found

Error: API key not found for provider: gemini

Solution:

# Set environment variable
export GEMINI_API_KEY=your_key

# Or create .env file
echo "GEMINI_API_KEY=your_key" > .env
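If you need to verify what your `.env` file actually provides, the format is simple enough to parse by hand. This is an illustrative stand-alone parser, not Judge LLM's own loader, and it handles only plain `KEY=value` lines and `#` comments:

```python
import io
import os

def load_dotenv(stream) -> dict:
    """Minimal .env parser: KEY=value lines; blanks and '#' comments ignored."""
    loaded = {}
    for line in stream:
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        loaded[key.strip()] = value.strip()
        # Don't clobber variables already set in the real environment.
        os.environ.setdefault(key.strip(), value.strip())
    return loaded

env = load_dotenv(io.StringIO("# API keys\nGEMINI_API_KEY=your_key\n"))
print(env)
```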

Module Not Found

Error: Module not found: ./evaluators/safety.py

Solution: Ensure you're in the example directory:

pwd
# Should be: /path/to/examples/XX-example-name
cd examples/XX-example-name

Test File Not Found

Error: Test file not found: ./tests.json

Solution: Check that the file exists and the path is correct:

ls sample.evalset.json
# Or check config for correct path
grep paths config.yaml

Permission Denied

Error: Permission denied: ./run.sh

Solution: Make the script executable:

chmod +x run.sh
./run.sh

Modifying Examples

To experiment with an example:

  1. Copy the example:
cp -r examples/01-gemini-agent my-experiment
cd my-experiment
  2. Modify the configuration or tests:
# Edit test cases
vim sample.evalset.json

# Edit configuration
vim config.yaml
  3. Run with changes:
judge-llm run --config config.yaml
  4. Compare results:
diff my-experiment/results.json examples/01-gemini-agent/results.json
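A raw `diff` works, but a structured comparison is often easier to read. A minimal sketch, assuming a hypothetical per-test status mapping extracted from each `results.json` (the file's real schema may differ):

```python
# Hypothetical per-test statuses pulled from two results.json files.
baseline = {"test_001": "PASSED", "test_002": "PASSED", "test_003": "FAILED"}
experiment = {"test_001": "PASSED", "test_002": "FAILED", "test_003": "PASSED"}

# Tests whose outcome changed between the two runs: id -> (before, after).
changed = {
    test_id: (baseline[test_id], experiment.get(test_id))
    for test_id in baseline
    if baseline[test_id] != experiment.get(test_id)
}
print(changed)
```

This surfaces only regressions and fixes, instead of every formatting difference between the two files.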

Learning Path

Beginner

  1. Start with Gemini Agent to understand basic setup
  2. Learn Default Config for reusable configurations
  3. Try Database Tracking for storing results

Intermediate

  1. Build Custom Evaluator for domain-specific checks
  2. Master Config Override for flexible testing
  3. Explore Safety Evaluation for multi-turn scenarios

Advanced

  1. Combine multiple examples into a comprehensive test suite
  2. Create custom providers and reporters
  3. Build CI/CD pipelines with Judge LLM

Next Steps

After completing examples:

Contributing Examples

Want to contribute an example? Follow these steps:

  1. Create a directory:
mkdir examples/XX-your-example
  2. Add the required files:
  • README.md - Description and instructions
  • config.yaml - Configuration
  • sample.evalset.json - Test cases
  • run.sh - Shell runner
  • run_evaluation.py - Python runner
  3. Document thoroughly:
  • Explain what the example demonstrates
  • List key concepts and learning objectives
  • Provide step-by-step instructions
  • Include expected output and troubleshooting
  4. Test completely:
cd examples/XX-your-example
judge-llm run --config config.yaml
python run_evaluation.py
  5. Submit a pull request: include the example in your documentation updates