Default Configuration
Learn how to use .judge_llm.defaults.yaml to define reusable defaults and keep your test configurations simple and maintainable.
Overview
Location: examples/02-default-config/
Difficulty: Beginner
What You'll Learn:
- Creating default configuration files
- Configuration merging behavior
- Reducing duplication across multiple tests
- Overriding defaults selectively
- Best practices for maintaining defaults
Why Use Defaults?
Problem: Repetitive Configuration
Without defaults, every test needs full configuration:
# test1.yaml
providers:
- type: gemini
model: gemini-2.0-flash-exp
temperature: 0.0
agent_id: test1
evaluators:
- type: response_evaluator
- type: cost_evaluator
max_cost: 0.05
reporters:
- type: console
# test2.yaml (same configuration repeated)
providers:
- type: gemini
model: gemini-2.0-flash-exp
temperature: 0.0
agent_id: test2
# ... repeated configuration
Solution: Default Configuration
With defaults, tests only specify what's unique:
# .judge_llm.defaults.yaml (shared)
providers:
- type: gemini
model: gemini-2.0-flash-exp
temperature: 0.0
evaluators:
- type: response_evaluator
- type: cost_evaluator
max_cost: 0.05
# test1.yaml (simple!)
dataset:
loader: local_file
paths: [./test1.json]
providers:
- agent_id: test1
# test2.yaml (simple!)
dataset:
loader: local_file
paths: [./test2.json]
providers:
- agent_id: test2
Files
02-default-config/
├── .judge_llm.defaults.yaml # Shared defaults
├── config.yaml # Test-specific config
├── sample.evalset.json # Test cases
├── run.sh # Runner script
├── run_evaluation.py # Python runner
└── README.md # Instructions
Configuration
.judge_llm.defaults.yaml
# Default provider: Gemini Flash for cost efficiency
providers:
- type: gemini
model: gemini-2.0-flash-exp
temperature: 0.0 # Deterministic responses
# Standard evaluation criteria
evaluators:
- type: response_evaluator
llm_provider: gemini
config:
similarity_threshold: 0.7
- type: cost_evaluator
config:
max_cost_per_case: 0.05 # Max $0.05 per test
# Default output
reporters:
- type: console
Key Points:
- Defines common settings for all tests
- Should be generic and reusable
- Version controlled with your project
- Can be overridden by test configs
config.yaml
dataset:
loader: local_file
paths:
- ./sample.evalset.json
providers:
- agent_id: test_agent
# type, model, temperature inherited from defaults
Much simpler! The test config only specifies:
- What test data to use
- Agent identifier
Everything else comes from defaults.
How Configuration Merging Works
Judge LLM merges configuration from three sources (in order):
1. Global defaults (~/.judge_llm/defaults.yaml)
2. Project defaults (.judge_llm.defaults.yaml)
3. Test config (config.yaml)
Later values override earlier ones.
Merge Example
Defaults:
providers:
- type: gemini
model: gemini-2.0-flash-exp
temperature: 0.0
evaluators:
- type: response_evaluator
Test Config:
providers:
- agent_id: my_agent
temperature: 0.5 # Override
evaluators:
- type: cost_evaluator # Add
Merged Result:
providers:
- type: gemini # From defaults
model: gemini-2.0-flash-exp # From defaults
temperature: 0.5 # From test (overridden)
agent_id: my_agent # From test
evaluators:
- type: response_evaluator # From defaults
- type: cost_evaluator # From test
Merge Rules
- Scalar values (strings, numbers, booleans): Test overrides defaults
- Lists: Merged (defaults + test items)
- Dictionaries: Deep merge (recursive)
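The three rules above can be sketched in a few lines of Python. This is an illustration of the described behavior, not Judge LLM's actual implementation — note in particular that the provider example earlier suggests corresponding list entries may also be matched and merged, while this sketch only shows the simple append rule:

```python
# Minimal sketch of the merge rules: scalars from the test config win,
# lists are concatenated (defaults first), dictionaries merge recursively.
# Illustration only -- not Judge LLM's actual merge code.
def merge(defaults, overrides):
    if isinstance(defaults, dict) and isinstance(overrides, dict):
        result = dict(defaults)
        for key, value in overrides.items():
            result[key] = merge(defaults[key], value) if key in defaults else value
        return result
    if isinstance(defaults, list) and isinstance(overrides, list):
        return defaults + overrides   # lists: default items, then test items
    return overrides                  # scalars: test config overrides defaults

defaults = {
    "providers": {"type": "gemini", "temperature": 0.0},
    "evaluators": [{"type": "response_evaluator"}],
}
test_config = {
    "providers": {"temperature": 0.5},           # overrides the default scalar
    "evaluators": [{"type": "cost_evaluator"}],  # appended after defaults
}
merged = merge(defaults, test_config)
print(merged["providers"])                        # {'type': 'gemini', 'temperature': 0.5}
print([e["type"] for e in merged["evaluators"]])  # ['response_evaluator', 'cost_evaluator']
```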
Running the Example
cd examples/02-default-config
judge-llm run --config config.yaml
The evaluator:
- Finds .judge_llm.defaults.yaml in the current directory or a parent directory
- Merges it with config.yaml
- Runs the evaluation with the merged configuration
Expected Output
Starting evaluation...
Using defaults from: .judge_llm.defaults.yaml
Evaluation Progress:
test_001: ✓ PASSED (cost: $0.0012, time: 1.2s)
Response: ✓ PASSED (similarity: 0.85)
Cost: ✓ PASSED ($0.0012 < $0.05)
test_002: ✓ PASSED (cost: $0.0015, time: 1.4s)
Response: ✓ PASSED (similarity: 0.91)
Cost: ✓ PASSED ($0.0015 < $0.05)
Summary:
Total Tests: 2
Passed: 2
Failed: 0
Success Rate: 100.0%
Total Cost: $0.0027
Total Time: 2.6s
Benefits of Using Defaults
1. DRY (Don't Repeat Yourself)
Define common settings once, use everywhere.
Before:
- 10 test files × 50 lines = 500 lines of configuration
After:
- 1 defaults file (50 lines) + 10 test files (5 lines each) = 100 lines total
Result: 80% reduction in configuration code!
2. Consistency
All tests use the same:
- Model versions
- Temperature settings
- Evaluation criteria
- Reporting formats
3. Easy Updates
Change model globally:
# Edit .judge_llm.defaults.yaml once
providers:
- type: gemini
model: gemini-2.0-flash-exp # Change this
All tests automatically use the new model!
4. Environment-Specific Defaults
# .judge_llm.defaults.yaml
providers:
- type: gemini
model: ${MODEL:-gemini-2.0-flash-exp}
# Development: fast, cheap model
export MODEL=gemini-1.5-flash
judge-llm run --config test.yaml
# Production: best model
export MODEL=gemini-1.5-pro
judge-llm run --config test.yaml
Overriding Defaults
Test configs can override any default value.
Override Provider Settings
# config.yaml
providers:
- agent_id: my_agent
temperature: 1.0 # Override default 0.0
model: gemini-1.5-pro # Override default model
Override Evaluator Settings
# config.yaml
evaluators:
- type: cost_evaluator
config:
max_cost_per_case: 0.01 # Stricter than default 0.05
Add Additional Components
# config.yaml
reporters:
# Console from defaults
- type: json # Add JSON reporter
output_path: ./results.json
- type: html # Add HTML reporter
output_path: ./report.html
Disable Default Components
# config.yaml
evaluators:
- type: response_evaluator
enabled: false # Disable from defaults
- type: trajectory_evaluator # Add different evaluator
Advanced Patterns
Environment-Specific Defaults
# .judge_llm.defaults.dev.yaml (development)
providers:
- type: gemini
model: gemini-1.5-flash
temperature: 0.0
reporters:
- type: console
# .judge_llm.defaults.prod.yaml (production)
providers:
- type: gemini
model: gemini-1.5-pro
temperature: 0.0
reporters:
- type: database
db_path: /var/lib/judge_llm/results.db
# Use different defaults
judge-llm run --config test.yaml --defaults .judge_llm.defaults.dev.yaml
judge-llm run --config test.yaml --defaults .judge_llm.defaults.prod.yaml
Team Defaults
# .judge_llm.defaults.yaml (team defaults)
providers:
- type: gemini
model: ${TEAM_MODEL:-gemini-2.0-flash-exp}
evaluators:
- type: response_evaluator
- type: cost_evaluator
config:
max_cost_per_case: ${MAX_COST:-0.05}
Each team member can customize via environment variables.
Custom Component Registration
# .judge_llm.defaults.yaml
evaluators:
- type: custom
module_path: ./evaluators/safety.py
class_name: SafetyEvaluator
register_as: safety # Register globally
config:
strict_mode: true
Now all tests can use type: safety without specifying module_path!
Best Practices
1. Keep Defaults Generic
Good:
providers:
- type: gemini
model: gemini-2.0-flash-exp
temperature: 0.0
Bad:
providers:
- type: gemini
agent_id: specific_test_123 # Too specific!
2. Document Your Defaults
# .judge_llm.defaults.yaml
# Default provider configuration
# Using Gemini Flash for cost-effectiveness
# Temperature 0.0 for deterministic results
providers:
- type: gemini
model: gemini-2.0-flash-exp
temperature: 0.0
# Standard evaluation criteria
# Response quality + cost monitoring
evaluators:
- type: response_evaluator
config:
similarity_threshold: 0.7 # 70% similarity required
- type: cost_evaluator
config:
max_cost_per_case: 0.05 # Alert if test > $0.05
3. Version Control Defaults
git add .judge_llm.defaults.yaml
git commit -m "Add project-wide evaluation defaults"
Team members get consistent settings when they clone.
4. Use Environment Variables
providers:
- type: gemini
model: ${MODEL:-gemini-2.0-flash-exp}
temperature: ${TEMPERATURE:-0.0}
Allows customization without editing files.
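The ${VAR:-default} syntax used above follows shell-style substitution: use the environment variable if set, otherwise fall back to the default after :-. A small sketch of that expansion (illustrative; Judge LLM's own substitution code may differ):

```python
# Expand ${NAME:-default} placeholders shell-style: the environment value
# wins if NAME is set, otherwise the default after ":-" is used.
# Illustrative sketch, not Judge LLM's actual substitution code.
import os
import re

PATTERN = re.compile(r"\$\{(\w+)(?::-([^}]*))?\}")

def expand(value):
    return PATTERN.sub(lambda m: os.environ.get(m.group(1), m.group(2) or ""), value)

os.environ.pop("MODEL", None)
print(expand("${MODEL:-gemini-2.0-flash-exp}"))  # gemini-2.0-flash-exp (fallback)
os.environ["MODEL"] = "gemini-1.5-pro"
print(expand("${MODEL:-gemini-2.0-flash-exp}"))  # gemini-1.5-pro (env var wins)
```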
5. Separate by Environment
project/
├── .judge_llm.defaults.yaml # Base defaults
├── .judge_llm.defaults.dev.yaml # Dev overrides
├── .judge_llm.defaults.prod.yaml # Prod overrides
└── .judge_llm.defaults.ci.yaml # CI overrides
6. Test Your Defaults
# Verify merged configuration
judge-llm run --config test.yaml --dry-run --verbose
Shows the final merged configuration before running.
Troubleshooting
Defaults Not Found
Error: Defaults file not found: .judge_llm.defaults.yaml
Solution:
# Check current directory
ls -la .judge_llm.defaults.yaml
# Check parent directories
find . -name ".judge_llm.defaults.yaml"
# Specify explicitly
judge-llm run --config test.yaml --defaults path/to/defaults.yaml
Unexpected Merging
Problem: Test config not overriding defaults
Debug:
# Show merged config
judge-llm run --config test.yaml --show-config
# Show where each value comes from
judge-llm run --config test.yaml --trace-config
Environment Variables Not Resolved
Error: API key not found: ${GEMINI_API_KEY}
Solution:
# Set environment variable
export GEMINI_API_KEY=your_key
# Or use .env file
echo "GEMINI_API_KEY=your_key" > .env
Examples
Multiple Tests, One Default
# .judge_llm.defaults.yaml
providers:
- type: gemini
model: gemini-2.0-flash-exp
evaluators:
- type: response_evaluator
- type: cost_evaluator
# test_accuracy.yaml
dataset:
loader: local_file
paths: [./accuracy_tests.json]
providers:
- agent_id: accuracy_test
# test_performance.yaml
dataset:
loader: local_file
paths: [./performance_tests.json]
providers:
- agent_id: performance_test
evaluators:
- type: latency_evaluator # Additional evaluator
# test_cost.yaml
dataset:
loader: local_file
paths: [./cost_tests.json]
providers:
- agent_id: cost_test
evaluators:
- type: cost_evaluator
config:
max_cost_per_case: 0.01 # Stricter
All three tests share common config but customize as needed.
Next Steps
After mastering defaults:
- Config Override - Per-test evaluator overrides
- Custom Evaluator - Register custom components
- Safety Evaluation - Complex multi-test setups