Default Configurations
Learn how to use default configuration files to define reusable settings and custom component registrations.
Overview
Default configuration files allow you to:
- Define common settings once, use everywhere
- Register custom components globally
- Reduce duplication across test configs
- Maintain consistent configuration across projects
- Share team-wide defaults
Configuration Hierarchy
Judge LLM merges configuration from three sources, listed here from lowest to highest precedence:

1. Global defaults (`~/.judge_llm/defaults.yaml`) - user-wide settings
2. Project defaults (`.judge_llm.defaults.yaml`) - project-specific settings
3. Test config (`test.yaml`) - test-specific settings

Values in later files override earlier ones.
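Conceptually, the merge behaves like a recursive dictionary update in which later sources win. A minimal Python sketch of that idea (illustrative only, not Judge LLM's actual implementation):

```python
def deep_merge(base: dict, override: dict) -> dict:
    """Recursively merge `override` into `base`; `override` wins on conflicts."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

# Later sources take precedence over earlier ones.
global_defaults = {"provider": {"model": "gemini-2.0-flash-exp", "temperature": 0.0}}
project_defaults = {"provider": {"model": "gemini-pro"}}
test_config = {"provider": {"agent_id": "my_test"}}

config = deep_merge(deep_merge(global_defaults, project_defaults), test_config)
print(config["provider"])
# {'model': 'gemini-pro', 'temperature': 0.0, 'agent_id': 'my_test'}
```

Note that only the conflicting keys are replaced; keys set in earlier sources survive unless a later source overrides them.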
Quick Start
1. Create Default Config
Create `.judge_llm.defaults.yaml` in your project root:
```yaml
# .judge_llm.defaults.yaml
providers:
  - type: gemini
    model: gemini-2.0-flash-exp
    temperature: 0.0

evaluators:
  - type: response_evaluator
    llm_provider: gemini
  - type: cost_evaluator
    max_cost: 0.05

reporters:
  - type: console
```
2. Create Simple Test Config
Your test configs become much simpler:
```yaml
# test.yaml
dataset:
  loader: local_file
  paths:
    - ./tests.json

providers:
  - agent_id: my_test  # Other settings come from defaults
```
3. Run Evaluation
```bash
judge-llm run --config test.yaml
```
The configuration is merged automatically!
Project Defaults
Place `.judge_llm.defaults.yaml` in your project root.
Basic Defaults
```yaml
# .judge_llm.defaults.yaml

# Default provider settings
providers:
  - type: gemini
    model: gemini-2.0-flash-exp
    temperature: 0.0
    max_tokens: 1024

# Default evaluators
evaluators:
  - type: response_evaluator
    llm_provider: gemini
  - type: cost_evaluator
    max_cost: 0.05
  - type: latency_evaluator
    max_latency: 5.0

# Default reporters
reporters:
  - type: console
  - type: json
    output_path: ./results/latest.json
```
Using Defaults in Tests
```yaml
# test.yaml
dataset:
  loader: local_file
  paths: [./tests.json]

providers:
  - agent_id: test_agent  # Inherits type, model, temperature from defaults
```
Merged result:
```yaml
dataset:
  loader: local_file
  paths: [./tests.json]

providers:
  - type: gemini                 # From defaults
    model: gemini-2.0-flash-exp  # From defaults
    temperature: 0.0             # From defaults
    max_tokens: 1024             # From defaults
    agent_id: test_agent         # From test config

evaluators:                      # From defaults
  - type: response_evaluator
    llm_provider: gemini
  - type: cost_evaluator
    max_cost: 0.05
  - type: latency_evaluator
    max_latency: 5.0

reporters:                       # From defaults
  - type: console
  - type: json
    output_path: ./results/latest.json
```
Global Defaults
Place `defaults.yaml` in `~/.judge_llm/` for user-wide settings.
Setup
```bash
mkdir -p ~/.judge_llm
vim ~/.judge_llm/defaults.yaml
```
Example Global Defaults
```yaml
# ~/.judge_llm/defaults.yaml

# API keys (if not using .env)
providers:
  - type: gemini
    api_key: ${GEMINI_API_KEY}
    temperature: 0.0
  - type: openai
    api_key: ${OPENAI_API_KEY}
    temperature: 0.0
  - type: anthropic
    api_key: ${ANTHROPIC_API_KEY}
    temperature: 0.0

# Always use console output
reporters:
  - type: console
```
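References like `${GEMINI_API_KEY}` are expanded from environment variables when the config is loaded. The sketch below shows one way such substitution (including the `${VAR:-default}` fallback form used later in this guide) can be implemented; it illustrates the syntax and is not the tool's own loader:

```python
import os
import re

# Matches ${VAR} and ${VAR:-default}
_ENV_REF = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}")

def expand_env(value: str) -> str:
    """Expand ${VAR} and ${VAR:-default} references using os.environ."""
    def replace(match: re.Match) -> str:
        name, default = match.group(1), match.group(2)
        return os.environ.get(name, default if default is not None else "")
    return _ENV_REF.sub(replace, value)

os.environ["GEMINI_API_KEY"] = "sk-example"
print(expand_env("api_key: ${GEMINI_API_KEY}"))         # api_key: sk-example
print(expand_env("db_path: ${DB_PATH:-./results.db}"))  # falls back to ./results.db when DB_PATH is unset
```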
Overriding Defaults
Test configs can override any default value.
Override Provider Model
```yaml
# .judge_llm.defaults.yaml
providers:
  - type: gemini
    model: gemini-2.0-flash-exp
    temperature: 0.0
```

```yaml
# test.yaml
providers:
  - agent_id: test
    model: gemini-pro  # Override default model
```
Override Evaluators
```yaml
# .judge_llm.defaults.yaml
evaluators:
  - type: cost_evaluator
    max_cost: 0.05
```

```yaml
# test.yaml
evaluators:
  - type: cost_evaluator
    max_cost: 0.01  # Stricter cost limit for this test
```
Add to Defaults
```yaml
# .judge_llm.defaults.yaml
reporters:
  - type: console
```

```yaml
# test.yaml
reporters:
  - type: console
  - type: html  # Add an HTML reporter on top of the defaults
    output_path: ./report.html
```
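The examples above suggest that component lists (providers, evaluators, reporters) merge by their `type` key: an entry in the test config that shares a type with a default overrides that default's fields, while entries with new types are appended. A rough Python sketch of that behavior (this merge rule is an assumption inferred from the examples, not a verified description of the tool):

```python
def merge_components(defaults: list, overrides: list) -> list:
    """Merge two lists of component configs, keyed by 'type' (assumed rule).

    An entry whose type matches a default overrides that default's fields;
    entries with a new type are appended."""
    merged = {item["type"]: dict(item) for item in defaults}
    for item in overrides:
        key = item["type"]
        if key in merged:
            merged[key].update(item)  # same type: test config fields win
        else:
            merged[key] = dict(item)  # new type: appended
    return list(merged.values())

defaults = [{"type": "console"}]
overrides = [{"type": "console"},
             {"type": "html", "output_path": "./report.html"}]
print(merge_components(defaults, overrides))
# [{'type': 'console'}, {'type': 'html', 'output_path': './report.html'}]
```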
Custom Component Registration
Register custom components in defaults to use them by name across all tests.
Registering Custom Providers
```yaml
# .judge_llm.defaults.yaml
providers:
  - type: custom
    module_path: ./providers/my_provider.py
    class_name: MyCustomProvider
    register_as: my_provider  # ← Register globally
```
Use by name:
```yaml
# test.yaml
providers:
  - type: my_provider  # ← Use by name
    agent_id: test
```
Registering Custom Evaluators
```yaml
# .judge_llm.defaults.yaml
evaluators:
  - type: custom
    module_path: ./evaluators/safety.py
    class_name: SafetyEvaluator
    register_as: safety
```
Use by name:
```yaml
# test.yaml
evaluators:
  - type: safety
  - type: response_evaluator
```
Registering Custom Reporters
```yaml
# .judge_llm.defaults.yaml
reporters:
  - type: custom
    module_path: ./reporters/slack.py
    class_name: SlackReporter
    register_as: slack
```
Use by name:
```yaml
# test.yaml
reporters:
  - type: slack
    webhook_url: ${SLACK_WEBHOOK_URL}
```
Complete Registration Example
```yaml
# .judge_llm.defaults.yaml

# Register custom provider
providers:
  - type: custom
    module_path: ./providers/custom_provider.py
    class_name: CustomProvider
    register_as: custom_provider

# Register custom evaluators
evaluators:
  - type: custom
    module_path: ./evaluators/safety.py
    class_name: SafetyEvaluator
    register_as: safety
  - type: custom
    module_path: ./evaluators/tone.py
    class_name: ToneEvaluator
    register_as: tone

# Register custom reporters
reporters:
  - type: custom
    module_path: ./reporters/csv_reporter.py
    class_name: CSVReporter
    register_as: csv
  - type: custom
    module_path: ./reporters/slack_reporter.py
    class_name: SlackReporter
    register_as: slack
```
Use everywhere:
```yaml
# test.yaml
dataset:
  loader: local_file
  paths: [./tests.json]

providers:
  - type: custom_provider  # Registered name
    agent_id: test

evaluators:
  - type: safety  # Registered name
  - type: tone    # Registered name

reporters:
  - type: csv     # Registered name
    output_path: ./results.csv
  - type: slack   # Registered name
    webhook_url: ${SLACK_WEBHOOK_URL}
```
Environment-Specific Defaults
Development Defaults
```yaml
# .judge_llm.defaults.yaml (development)
providers:
  - type: gemini
    model: gemini-2.0-flash-exp
    temperature: 0.0

evaluators:
  - type: response_evaluator
  - type: cost_evaluator
    max_cost: 0.1  # More lenient for dev

reporters:
  - type: console
```
Production Defaults
```yaml
# .judge_llm.defaults.yaml (production)
providers:
  - type: gemini
    model: gemini-2.0-flash-exp
    temperature: 0.0

evaluators:
  - type: response_evaluator
  - type: trajectory_evaluator
  - type: cost_evaluator
    max_cost: 0.01  # Stricter for prod
  - type: latency_evaluator
    max_latency: 3.0

reporters:
  - type: console
  - type: database
    db_path: ./prod_results.db
  - type: json
    output_path: ./results/prod-${date}.json
```
Managing Multiple Environments
```bash
# Use different default files
cp .judge_llm.defaults.dev.yaml .judge_llm.defaults.yaml
judge-llm run --config test.yaml

cp .judge_llm.defaults.prod.yaml .judge_llm.defaults.yaml
judge-llm run --config test.yaml
```
Or use an environment variable:

```bash
# Set environment
export JUDGE_LLM_ENV=production
```

```python
# Load environment-specific defaults in Python
import os

env = os.getenv('JUDGE_LLM_ENV', 'development')
defaults_file = f'.judge_llm.defaults.{env}.yaml'

# Your evaluation code...
```
Best Practices
1. Keep Defaults Generic
```yaml
# Good - generic defaults
providers:
  - type: gemini
    model: gemini-2.0-flash-exp
    temperature: 0.0
```

```yaml
# Bad - test-specific values in defaults
providers:
  - type: gemini
    agent_id: specific_test_agent  # Too specific
```
2. Use Test Configs for Specifics
```yaml
# .judge_llm.defaults.yaml - Generic
providers:
  - type: gemini
    model: gemini-2.0-flash-exp
```

```yaml
# test.yaml - Specific
providers:
  - agent_id: math_test_agent
  - agent_id: language_test_agent
```
3. Document Your Defaults
```yaml
# .judge_llm.defaults.yaml

# Default provider configuration
# Uses Gemini Flash for cost efficiency
providers:
  - type: gemini
    model: gemini-2.0-flash-exp
    temperature: 0.0  # Deterministic for testing

# Standard evaluation criteria
evaluators:
  - type: response_evaluator
  - type: cost_evaluator
    max_cost: 0.05  # Maximum $0.05 per test case

# Always output to console for immediate feedback
reporters:
  - type: console
```
4. Version Control Defaults
```bash
git add .judge_llm.defaults.yaml
git commit -m "Add project defaults"
```
5. Separate Custom Components
```
project/
├── .judge_llm.defaults.yaml   # References custom components
├── providers/
│   └── custom_provider.py
├── evaluators/
│   ├── safety.py
│   └── tone.py
└── reporters/
    ├── csv_reporter.py
    └── slack_reporter.py
```
Common Patterns
Shared Team Defaults
```yaml
# .judge_llm.defaults.yaml (checked into git)

# Team-wide provider settings
providers:
  - type: gemini
    model: gemini-2.0-flash-exp
    temperature: 0.0
    api_key: ${GEMINI_API_KEY}  # Each dev sets their own

# Consistent evaluation criteria
evaluators:
  - type: response_evaluator
  - type: cost_evaluator
    max_cost: 0.05

# Standard reporting
reporters:
  - type: console
  - type: database
    db_path: ${DB_PATH:-./results.db}
```
Personal Global Defaults
```yaml
# ~/.judge_llm/defaults.yaml (personal machine)

# Personal API keys
providers:
  - type: gemini
    api_key: ${GEMINI_API_KEY}
  - type: openai
    api_key: ${OPENAI_API_KEY}

# Always include console output
reporters:
  - type: console
```
CI/CD Defaults
```yaml
# .judge_llm.defaults.yaml (for CI/CD)
providers:
  - type: gemini
    model: gemini-2.0-flash-exp
    temperature: 0.0
    api_key: ${GEMINI_API_KEY}  # From CI secrets

evaluators:
  - type: response_evaluator
  - type: cost_evaluator
    max_cost: 0.01
  - type: latency_evaluator
    max_latency: 5.0

reporters:
  - type: console
  - type: json
    output_path: ./ci_results.json
  - type: html
    output_path: ./ci_report.html
```
Troubleshooting
Defaults Not Loading
Issue: The defaults file exists but is not applied.

Solutions:
- Check the filename: `.judge_llm.defaults.yaml` (note the leading dot)
- Verify the file location (project root or `~/.judge_llm/`)
- Check the YAML syntax: `yamllint .judge_llm.defaults.yaml`
Unexpected Values
Issue: Getting unexpected configuration values
Solution: Remember precedence order:
- Global defaults (lowest precedence)
- Project defaults
- Test config (highest precedence)
Check each file to see where value is defined.
Custom Component Not Found
Issue: `Module not found: ./providers/custom.py`

Solutions:
- Verify that `module_path` is correct relative to the project root
- Ensure the file exists: `ls ./providers/custom.py`
- Check the Python path if using absolute imports
Registration Not Working
Issue: A custom component is not available under its registered name.

Solutions:
- Ensure the `register_as` field is present
- Check that the registration happens in a defaults file
- Verify the component is registered before it is used