# Examples Overview
Learn by example with comprehensive tutorials covering common Judge LLM use cases. Each example demonstrates specific features and best practices.
## Quick Navigation
| Example | Focus | Difficulty | Key Concepts |
|---|---|---|---|
| Gemini Agent | Basic setup | Beginner | Configuration, providers, evaluators |
| Default Config | Defaults | Beginner | Default config, config merging |
| Custom Evaluator | Custom evaluator | Intermediate | Custom components, registration |
| Safety Evaluation | Multi-turn | Advanced | Long conversations, safety, trajectory |
| Config Override | Config override | Intermediate | Per-test config, thresholds |
| Database Tracking | Database | Intermediate | SQLite, querying, trends |
## By Category

### Getting Started
Perfect for beginners learning Judge LLM basics:
- Gemini Agent - Start here! Basic Gemini agent evaluation
- Default Config - Reusable defaults and configuration merging

### Custom Components
Learn to extend Judge LLM with custom implementations:
- Custom Evaluator - Build domain-specific evaluators
- Safety Evaluation - Multi-turn safety checks

### Advanced Configuration
Master configuration patterns and overrides:
- Default Config - Default configuration system
- Config Override - Per-test configuration overrides

### Data & Reporting
Store and analyze evaluation results:
- Database Tracking - SQLite storage and querying
## Running Examples

### Prerequisites
All examples require:

1. Judge LLM installed:

   ```bash
   pip install judge-llm
   ```

2. API keys configured:

   ```bash
   # Create .env file
   echo "GEMINI_API_KEY=your_key" > .env
   echo "OPENAI_API_KEY=your_key" >> .env
   ```

3. Navigating to the example:

   ```bash
   cd examples/01-gemini-agent
   ```
### Running an Example
Each example can be run with:

```bash
judge-llm run --config config.yaml
```

Or using the Python API:

```bash
python run_evaluation.py
```
### Expected Output
Typical output looks like:

```text
Starting evaluation...

Evaluation Progress:
  test_001: ✓ PASSED (cost: $0.0012, time: 1.2s)
  test_002: ✓ PASSED (cost: $0.0015, time: 1.5s)
  test_003: ✗ FAILED (cost: $0.0010, time: 0.8s)

Summary:
  Total Tests: 3
  Passed: 2
  Failed: 1
  Success Rate: 66.7%
  Total Cost: $0.0037
  Total Time: 3.5s
```
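The summary figures follow directly from the per-test lines. As a sanity check, the aggregation can be reproduced in a few lines of plain Python; the tuples below simply restate the sample output and nothing here calls a Judge LLM API:

```python
# Recompute the summary from the per-test results shown in the sample output.
results = [
    # (test name, passed, cost in USD, time in seconds)
    ("test_001", True, 0.0012, 1.2),
    ("test_002", True, 0.0015, 1.5),
    ("test_003", False, 0.0010, 0.8),
]

passed = sum(1 for _, ok, _, _ in results if ok)
failed = len(results) - passed
success_rate = 100 * passed / len(results)
total_cost = sum(cost for _, _, cost, _ in results)
total_time = sum(t for _, _, _, t in results)

print(f"Passed: {passed}, Failed: {failed}")      # Passed: 2, Failed: 1
print(f"Success Rate: {success_rate:.1f}%")       # Success Rate: 66.7%
print(f"Total Cost: ${total_cost:.4f}")           # Total Cost: $0.0037
print(f"Total Time: {total_time:.1f}s")           # Total Time: 3.5s
```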
## Example Structure
Each example includes:

```text
XX-example-name/
├── README.md            # Detailed explanation
├── config.yaml          # Configuration file
├── sample.evalset.json  # Test cases
├── run.sh               # Shell script runner
└── run_evaluation.py    # Python API runner
```
## Common Patterns

### Basic Configuration

```yaml
dataset:
  loader: local_file
  paths: [./sample.evalset.json]

providers:
  - type: gemini
    agent_id: my_agent
    model: gemini-2.0-flash-exp

evaluators:
  - type: response_evaluator

reporters:
  - type: console
```
### Using Defaults

```yaml
# .judge_llm.defaults.yaml
providers:
  - type: gemini
    model: gemini-2.0-flash-exp
    temperature: 0.7

evaluators:
  - type: response_evaluator
  - type: cost_evaluator
```

```yaml
# test.yaml (overrides defaults)
dataset:
  loader: local_file
  paths: [./tests.json]

providers:
  - type: gemini
    agent_id: my_agent
    # Inherits model and temperature from defaults
```
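Judge LLM defines the exact merge semantics itself, but the inheritance shown above can be pictured as a recursive merge in which per-test values win and missing keys fall back to the defaults. The sketch below is illustrative only — `merge_configs` and the match-entries-by-`type` rule for lists are assumptions, not the Judge LLM implementation:

```python
# Illustrative config merge: override values win, nested dicts merge key by
# key, and list entries that share a "type" (e.g. provider blocks) are merged.
# This is a sketch of the behavior shown above, not Judge LLM's actual code.
def merge_configs(defaults, override):
    if isinstance(defaults, dict) and isinstance(override, dict):
        merged = dict(defaults)
        for key, value in override.items():
            merged[key] = merge_configs(defaults[key], value) if key in defaults else value
        return merged
    if isinstance(defaults, list) and isinstance(override, list):
        by_type = {d.get("type"): d for d in defaults if isinstance(d, dict)}
        return [merge_configs(by_type.get(o.get("type"), {}), o) if isinstance(o, dict) else o
                for o in override]
    return override  # scalars: the override simply replaces the default

defaults = {
    "providers": [{"type": "gemini", "model": "gemini-2.0-flash-exp", "temperature": 0.7}],
    "evaluators": [{"type": "response_evaluator"}, {"type": "cost_evaluator"}],
}
test = {
    "dataset": {"loader": "local_file", "paths": ["./tests.json"]},
    "providers": [{"type": "gemini", "agent_id": "my_agent"}],
}

merged = merge_configs(defaults, test)
print(merged["providers"][0])
# The gemini provider keeps model and temperature from the defaults
# and gains agent_id from test.yaml; the evaluators list is inherited intact.
```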
### Custom Components

```yaml
evaluators:
  - type: custom
    module_path: ./evaluators/safety.py
    class_name: SafetyEvaluator
    config:
      strict_mode: true
```
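The file named in `module_path` must expose the class named in `class_name`. The actual base class and method signature come from Judge LLM's evaluator interface (see the Custom Evaluator example); the sketch below assumes a simple `evaluate(response) -> dict` contract and a toy blocklist purely to show the shape such a class might take:

```python
# evaluators/safety.py — illustrative shape only. The real base class,
# method signature, and result format are defined by Judge LLM's
# evaluator interface; the blocklist here is a hypothetical stand-in.
BLOCKED_TERMS = {"password", "ssn"}

class SafetyEvaluator:
    def __init__(self, config=None):
        config = config or {}
        # Mirrors the strict_mode flag passed via the YAML config block above.
        self.strict_mode = config.get("strict_mode", False)

    def evaluate(self, response):
        """Flag responses that mention blocked terms; strict mode also flags 'secret'."""
        text = response.lower()
        violations = [term for term in BLOCKED_TERMS if term in text]
        if self.strict_mode and "secret" in text:
            violations.append("secret")
        return {"passed": not violations, "violations": sorted(violations)}

evaluator = SafetyEvaluator({"strict_mode": True})
print(evaluator.evaluate("Here is the secret password"))
```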
## Troubleshooting

### API Key Not Found

```text
Error: API key not found for provider: gemini
```

Solution:

```bash
# Set environment variable
export GEMINI_API_KEY=your_key

# Or create .env file
echo "GEMINI_API_KEY=your_key" > .env
```
### Module Not Found

```text
Error: Module not found: ./evaluators/safety.py
```

Solution: Ensure you're in the example directory:

```bash
pwd
# Should be: /path/to/examples/XX-example-name
cd examples/XX-example-name
```
### Test File Not Found

```text
Error: Test file not found: ./tests.json
```

Solution: Check that the file exists and the path is correct:

```bash
ls sample.evalset.json
# Or check the config for the correct path
grep paths config.yaml
```
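If a config references several files, you can check them all at once. The helper below is illustrative, not a Judge LLM command; it assumes only the `dataset.paths` layout shown in the Common Patterns section and works on an already-parsed config dict:

```python
from pathlib import Path

# Return every dataset path in a (pre-parsed) config dict that does not
# exist relative to base_dir. Assumes the dataset.paths layout from the
# Common Patterns section; this is a convenience sketch, not part of judge-llm.
def missing_paths(config, base_dir="."):
    paths = config.get("dataset", {}).get("paths", [])
    return [p for p in paths if not (Path(base_dir) / p).exists()]

config = {"dataset": {"loader": "local_file", "paths": ["./sample.evalset.json"]}}
print(missing_paths(config))  # lists any paths missing from the current directory
```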
### Permission Denied

```text
Error: Permission denied: ./run.sh
```

Solution: Make the script executable:

```bash
chmod +x run.sh
./run.sh
```
## Modifying Examples
To experiment with an example:

1. Copy the example:

   ```bash
   cp -r examples/01-gemini-agent my-experiment
   cd my-experiment
   ```

2. Modify the configuration or tests:

   ```bash
   # Edit test cases
   vim sample.evalset.json

   # Edit configuration
   vim config.yaml
   ```

3. Run with your changes:

   ```bash
   judge-llm run --config config.yaml
   ```

4. Compare results:

   ```bash
   diff my-experiment/results.json examples/01-gemini-agent/results.json
   ```
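For a structured comparison rather than a raw `diff`, you can load both results files (e.g. with `json.load`) and compare summary fields side by side. The field names below (`success_rate`, `total_cost`) are assumptions for illustration, not a documented results schema — check your reporter's actual output:

```python
# Compare two results dicts field by field and report only the differences.
# The keys and sample values here are hypothetical; substitute the fields
# your reporter actually writes to results.json.
def compare_results(baseline, experiment, keys=("success_rate", "total_cost")):
    return {k: (baseline.get(k), experiment.get(k)) for k in keys
            if baseline.get(k) != experiment.get(k)}

baseline = {"success_rate": 66.7, "total_cost": 0.0037}
experiment = {"success_rate": 100.0, "total_cost": 0.0041}
print(compare_results(baseline, experiment))
# {'success_rate': (66.7, 100.0), 'total_cost': (0.0037, 0.0041)}
```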
## Learning Path

### Beginner
1. Start with Gemini Agent to understand basic setup
2. Learn Default Config for reusable configurations
3. Try Database Tracking for storing results

### Intermediate
1. Build Custom Evaluator for domain-specific checks
2. Master Config Override for flexible testing
3. Explore Safety Evaluation for multi-turn scenarios

### Advanced
1. Combine multiple examples into a comprehensive test suite
2. Create custom providers and reporters
3. Build CI/CD pipelines with Judge LLM
## Next Steps
After completing examples:
- Read User Guides for comprehensive documentation
- Explore Evaluators for evaluation options
- Learn about Reporters for output formats
- Check Python API for programmatic usage
## Contributing Examples
Want to contribute an example? Follow these steps:

1. Create a directory:

   ```bash
   mkdir examples/XX-your-example
   ```

2. Add the required files:
   - `README.md` - Description and instructions
   - `config.yaml` - Configuration
   - `sample.evalset.json` - Test cases
   - `run.sh` - Shell runner
   - `run_evaluation.py` - Python runner

3. Document thoroughly:
   - Explain what the example demonstrates
   - List key concepts and learning objectives
   - Provide step-by-step instructions
   - Include expected output and troubleshooting

4. Test completely:

   ```bash
   cd examples/XX-your-example
   judge-llm run --config config.yaml
   python run_evaluation.py
   ```

5. Submit a pull request: include the example in documentation updates