
Examples Overview

Learn by example with comprehensive tutorials covering common Judge LLM use cases. Each example demonstrates specific features and best practices.

Quick Navigation

| Example | Focus | Difficulty | Key Concepts |
| --- | --- | --- | --- |
| Gemini Agent | Basic setup | Beginner | Configuration, providers, evaluators |
| Default Config | Defaults | Beginner | Default config, config merging |
| Custom Evaluator | Custom evaluator | Intermediate | Custom components, registration |
| Safety Evaluation | Multi-turn | Advanced | Long conversations, safety, trajectory |
| Config Override | Config override | Intermediate | Per-test config, thresholds |
| Database Tracking | Database | Intermediate | SQLite, querying, trends |

By Category

Getting Started

Perfect for beginners learning Judge LLM basics:

Custom Components

Learn to extend Judge LLM with custom implementations:

Advanced Configuration

Master configuration patterns and overrides:

Data & Reporting

Store and analyze evaluation results:

Running Examples

Prerequisites

All examples require:

  1. Judge LLM installed:
pip install judge-llm
  2. API keys configured:
# Create .env file
echo "GEMINI_API_KEY=your_key" > .env
echo "OPENAI_API_KEY=your_key" >> .env
  3. Navigate to an example:
cd examples/01-gemini-agent

Running an Example

Each example can be run with:

judge-llm run --config config.yaml

Or using the Python API:

python run_evaluation.py

Expected Output

Typical output looks like:

Starting evaluation...

Evaluation Progress:
test_001: ✓ PASSED (cost: $0.0012, time: 1.2s)
test_002: ✓ PASSED (cost: $0.0015, time: 1.5s)
test_003: ✗ FAILED (cost: $0.0010, time: 0.8s)

Summary:
Total Tests: 3
Passed: 2
Failed: 1
Success Rate: 66.7%
Total Cost: $0.0037
Total Time: 3.5s
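The summary figures follow directly from the per-test lines above. A minimal sketch of that arithmetic, using a hypothetical data structure for the per-test results (the tool's actual internal representation may differ):

```python
# Per-test figures from the sample run above, as a hypothetical structure.
results = [
    {"id": "test_001", "passed": True, "cost": 0.0012, "time": 1.2},
    {"id": "test_002", "passed": True, "cost": 0.0015, "time": 1.5},
    {"id": "test_003", "passed": False, "cost": 0.0010, "time": 0.8},
]

passed = sum(r["passed"] for r in results)
summary = {
    "total": len(results),
    "passed": passed,
    "failed": len(results) - passed,
    "success_rate": round(100 * passed / len(results), 1),  # 66.7
    "total_cost": round(sum(r["cost"] for r in results), 4),  # 0.0037
    "total_time": round(sum(r["time"] for r in results), 1),  # 3.5
}
print(summary)
```

Note that the success rate counts tests, not cost: one failed test out of three gives 66.7% regardless of how cheap or expensive each test was.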

Example Structure

Each example includes:

XX-example-name/
├── README.md # Detailed explanation
├── config.yaml # Configuration file
├── sample.evalset.json # Test cases
├── run.sh # Shell script runner
└── run_evaluation.py # Python API runner

Common Patterns

Basic Configuration

dataset:
  loader: local_file
  paths: [./sample.evalset.json]

providers:
  - type: gemini
    agent_id: my_agent
    model: gemini-2.0-flash-exp

evaluators:
  - type: response_evaluator

reporters:
  - type: console

Using Defaults

# .judge_llm.defaults.yaml
providers:
  - type: gemini
    model: gemini-2.0-flash-exp
    temperature: 0.7

evaluators:
  - type: response_evaluator
  - type: cost_evaluator

# test.yaml (overrides defaults)
dataset:
  loader: local_file
  paths: [./tests.json]

providers:
  - type: gemini
    agent_id: my_agent
    # Inherits model and temperature from defaults
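Layering like this is commonly implemented as a recursive dictionary merge: nested mappings combine key by key, and any key present in the test config replaces the default. The sketch below illustrates that idea in plain Python; the merge semantics Judge LLM actually uses (especially for list entries like `providers`) may differ:

```python
def deep_merge(defaults: dict, overrides: dict) -> dict:
    """Overlay overrides on defaults: nested dicts merge, other values replace."""
    merged = dict(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

# Provider entry as it might appear in .judge_llm.defaults.yaml ...
default_provider = {"type": "gemini", "model": "gemini-2.0-flash-exp", "temperature": 0.7}
# ... merged with the matching entry from test.yaml
test_provider = {"type": "gemini", "agent_id": "my_agent"}

merged = deep_merge(default_provider, test_provider)
print(merged)
```

The merged provider ends up with `agent_id` from the test config while still inheriting `model` and `temperature` from the defaults.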

Custom Components

evaluators:
  - type: custom
    module_path: ./evaluators/safety.py
    class_name: SafetyEvaluator
    config:
      strict_mode: true
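The referenced `./evaluators/safety.py` would contain the `SafetyEvaluator` class. The sketch below shows the general shape such a class might take, assuming evaluators expose a simple `evaluate` method and receive their `config` keys as constructor arguments; the base-class contract Judge LLM actually requires may differ, so consult the Custom Evaluator example for the real interface:

```python
# Hypothetical sketch of ./evaluators/safety.py.
BLOCKED_TERMS = {"password", "ssn"}

class SafetyEvaluator:
    def __init__(self, strict_mode: bool = False):
        # strict_mode mirrors the `config.strict_mode` key in config.yaml
        self.strict_mode = strict_mode

    def evaluate(self, response: str) -> dict:
        text = response.lower()
        hits = [term for term in BLOCKED_TERMS if term in text]
        # Strict mode fails on any violation; lenient mode tolerates one.
        passed = not hits if self.strict_mode else len(hits) < 2
        return {"passed": passed, "violations": hits}

evaluator = SafetyEvaluator(strict_mode=True)
result = evaluator.evaluate("Here is your password reset link")
print(result)
```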

Troubleshooting

API Key Not Found

Error: API key not found for provider: gemini

Solution:

# Set environment variable
export GEMINI_API_KEY=your_key

# Or create .env file
echo "GEMINI_API_KEY=your_key" > .env
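If you need to verify what your `.env` file actually provides, the format is simple enough to parse by hand. This is an illustrative stand-alone parser, not Judge LLM's own loader, and it handles only plain `KEY=value` lines and `#` comments:

```python
import io
import os

def load_dotenv(stream) -> dict:
    """Minimal .env parser: KEY=value lines; blanks and '#' comments ignored."""
    loaded = {}
    for line in stream:
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        loaded[key.strip()] = value.strip()
        # Don't clobber variables already set in the real environment.
        os.environ.setdefault(key.strip(), value.strip())
    return loaded

env = load_dotenv(io.StringIO("# API keys\nGEMINI_API_KEY=your_key\n"))
print(env)
```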

Module Not Found

Error: Module not found: ./evaluators/safety.py

Solution: Ensure you're in the example directory:

pwd
# Should be: /path/to/examples/XX-example-name
cd examples/XX-example-name

Test File Not Found

Error: Test file not found: ./tests.json

Solution: Check that the file exists and the path is correct:

ls sample.evalset.json
# Or check config for correct path
grep paths config.yaml

Permission Denied

Error: Permission denied: ./run.sh

Solution: Make the script executable:

chmod +x run.sh
./run.sh

Modifying Examples

To experiment with an example:

  1. Copy the example:
cp -r examples/01-gemini-agent my-experiment
cd my-experiment
  2. Modify the configuration or tests:
# Edit test cases
vim sample.evalset.json

# Edit configuration
vim config.yaml
  3. Run with changes:
judge-llm run --config config.yaml
  4. Compare results:
diff my-experiment/results.json examples/01-gemini-agent/results.json
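A raw `diff` works, but a structured comparison is often easier to read. A minimal sketch, assuming a hypothetical per-test status mapping extracted from each `results.json` (the file's real schema may differ):

```python
# Hypothetical per-test statuses pulled from two results.json files.
baseline = {"test_001": "PASSED", "test_002": "PASSED", "test_003": "FAILED"}
experiment = {"test_001": "PASSED", "test_002": "FAILED", "test_003": "PASSED"}

# Tests whose outcome changed between the two runs: id -> (before, after).
changed = {
    test_id: (baseline[test_id], experiment.get(test_id))
    for test_id in baseline
    if baseline[test_id] != experiment.get(test_id)
}
print(changed)
```

This surfaces only regressions and fixes, instead of every formatting difference between the two files.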

Learning Path

Beginner

  1. Start with Gemini Agent to understand basic setup
  2. Learn Default Config for reusable configurations
  3. Try Database Tracking for storing results

Intermediate

  1. Build Custom Evaluator for domain-specific checks
  2. Master Config Override for flexible testing
  3. Explore Safety Evaluation for multi-turn scenarios

Advanced

  1. Combine multiple examples into a comprehensive test suite
  2. Create custom providers and reporters
  3. Build CI/CD pipelines with Judge LLM

Next Steps

After completing examples:

Contributing Examples

Want to contribute an example? Follow these steps:

  1. Create a directory:
mkdir examples/XX-your-example
  2. Add the required files:
  • README.md - Description and instructions
  • config.yaml - Configuration
  • sample.evalset.json - Test cases
  • run.sh - Shell runner
  • run_evaluation.py - Python runner
  3. Document thoroughly:
  • Explain what the example demonstrates
  • List key concepts and learning objectives
  • Provide step-by-step instructions
  • Include expected output and troubleshooting
  4. Test completely:
cd examples/XX-your-example
judge-llm run --config config.yaml
python run_evaluation.py
  5. Submit a pull request: include the example in your documentation updates