Default Configuration

Learn how to use .judge_llm.defaults.yaml to define reusable defaults and keep your test configurations simple and maintainable.

Overview

Location: examples/02-default-config/

Difficulty: Beginner

What You'll Learn:

  • Creating default configuration files
  • Configuration merging behavior
  • Reducing duplication across multiple tests
  • Overriding defaults selectively
  • Best practices for maintaining defaults

Why Use Defaults?

Problem: Repetitive Configuration

Without defaults, every test needs full configuration:

# test1.yaml
providers:
  - type: gemini
    model: gemini-2.0-flash-exp
    temperature: 0.0
    agent_id: test1

evaluators:
  - type: response_evaluator
  - type: cost_evaluator
    max_cost: 0.05

reporters:
  - type: console

# test2.yaml (same configuration repeated)
providers:
  - type: gemini
    model: gemini-2.0-flash-exp
    temperature: 0.0
    agent_id: test2
# ... repeated configuration

Solution: Default Configuration

With defaults, tests only specify what's unique:

# .judge_llm.defaults.yaml (shared)
providers:
  - type: gemini
    model: gemini-2.0-flash-exp
    temperature: 0.0

evaluators:
  - type: response_evaluator
  - type: cost_evaluator
    max_cost: 0.05

# test1.yaml (simple!)
dataset:
  loader: local_file
  paths: [./test1.json]

providers:
  - agent_id: test1

# test2.yaml (simple!)
dataset:
  loader: local_file
  paths: [./test2.json]

providers:
  - agent_id: test2

Files

02-default-config/
├── .judge_llm.defaults.yaml # Shared defaults
├── config.yaml # Test-specific config
├── sample.evalset.json # Test cases
├── run.sh # Runner script
├── run_evaluation.py # Python runner
└── README.md # Instructions

Configuration

.judge_llm.defaults.yaml

# Default provider: Gemini Flash for cost efficiency
providers:
  - type: gemini
    model: gemini-2.0-flash-exp
    temperature: 0.0  # Deterministic responses

# Standard evaluation criteria
evaluators:
  - type: response_evaluator
    llm_provider: gemini
    config:
      similarity_threshold: 0.7

  - type: cost_evaluator
    config:
      max_cost_per_case: 0.05  # Max $0.05 per test

# Default output
reporters:
  - type: console

Key Points:

  • Defines common settings for all tests
  • Should be generic and reusable
  • Version controlled with your project
  • Can be overridden by test configs

config.yaml

dataset:
  loader: local_file
  paths:
    - ./sample.evalset.json

providers:
  - agent_id: test_agent
    # type, model, temperature inherited from defaults

Much simpler! The test config only specifies:

  • What test data to use
  • Agent identifier

Everything else comes from defaults.

How Configuration Merging Works

Judge LLM merges configuration from three sources (in order):

  1. Global defaults (~/.judge_llm/defaults.yaml)
  2. Project defaults (.judge_llm.defaults.yaml)
  3. Test config (config.yaml)

Later values override earlier ones.

Merge Example

Defaults:

providers:
  - type: gemini
    model: gemini-2.0-flash-exp
    temperature: 0.0

evaluators:
  - type: response_evaluator

Test Config:

providers:
  - agent_id: my_agent
    temperature: 0.5  # Override

evaluators:
  - type: cost_evaluator  # Add

Merged Result:

providers:
  - type: gemini                   # From defaults
    model: gemini-2.0-flash-exp    # From defaults
    temperature: 0.5               # From test (overridden)
    agent_id: my_agent             # From test

evaluators:
  - type: response_evaluator       # From defaults
  - type: cost_evaluator           # From test

Merge Rules

  1. Scalar values (strings, numbers, booleans): Test overrides defaults
  2. Lists: Merged (defaults + test items)
  3. Dictionaries: Deep merge (recursive)
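
The three rules above can be sketched as a small recursive merge function. This is an illustration of the described behavior only, not judge-llm's actual implementation:

```python
# Sketch of the three merge rules above (illustrative only; not
# judge-llm's actual implementation).

def merge(defaults, override):
    """Merge a test-config value over a defaults value."""
    if isinstance(defaults, dict) and isinstance(override, dict):
        # Rule 3: dictionaries deep-merge recursively.
        merged = dict(defaults)
        for key, value in override.items():
            merged[key] = merge(merged[key], value) if key in merged else value
        return merged
    if isinstance(defaults, list) and isinstance(override, list):
        # Rule 2: lists concatenate (defaults first, then test items).
        return defaults + override
    # Rule 1: scalars from the test config win.
    return override

result = merge(
    {"temperature": 0.0, "config": {"similarity_threshold": 0.7}},
    {"temperature": 0.5, "config": {"max_cost_per_case": 0.05}},
)
# temperature is overridden; the two config dicts are combined
```

Note that the providers example above shows the test's `agent_id` folded into the default provider entry, which suggests the real tool matches list items rather than only concatenating; the sketch implements just the three stated rules.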

Running the Example

cd examples/02-default-config
judge-llm run --config config.yaml

The evaluator:

  1. Finds .judge_llm.defaults.yaml in current or parent directory
  2. Merges it with config.yaml
  3. Runs evaluation with merged configuration
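
The lookup in step 1 amounts to walking up the directory tree. Here is a sketch of that behavior (an assumption about how the search works, not judge-llm's source):

```python
# Sketch of locating .judge_llm.defaults.yaml by walking up from the
# current directory (illustrative; not judge-llm's actual source).
from pathlib import Path


def find_defaults(start=".", name=".judge_llm.defaults.yaml"):
    """Return the first defaults file found in the starting directory
    or any of its parents, or None if no such file exists."""
    directory = Path(start).resolve()
    for candidate_dir in [directory, *directory.parents]:
        candidate = candidate_dir / name
        if candidate.is_file():
            return candidate
    return None
```

Running `find_defaults()` from a nested subdirectory of the project would still pick up the project-level defaults file at the repository root.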

Expected Output

Starting evaluation...

Using defaults from: .judge_llm.defaults.yaml

Evaluation Progress:
  test_001: ✓ PASSED (cost: $0.0012, time: 1.2s)
    Response: ✓ PASSED (similarity: 0.85)
    Cost: ✓ PASSED ($0.0012 < $0.05)

  test_002: ✓ PASSED (cost: $0.0015, time: 1.4s)
    Response: ✓ PASSED (similarity: 0.91)
    Cost: ✓ PASSED ($0.0015 < $0.05)

Summary:
  Total Tests: 2
  Passed: 2
  Failed: 0
  Success Rate: 100.0%
  Total Cost: $0.0027
  Total Time: 2.6s

Benefits of Using Defaults

1. DRY (Don't Repeat Yourself)

Define common settings once, use everywhere.

Before:

  • 10 test files × 50 lines = 500 lines of configuration

After:

  • 1 defaults file (50 lines) + 10 test files (5 lines each) = 100 lines total

Result: 80% reduction in configuration code!

2. Consistency

All tests use the same:

  • Model versions
  • Temperature settings
  • Evaluation criteria
  • Reporting formats

3. Easy Updates

Change model globally:

# Edit .judge_llm.defaults.yaml once
providers:
  - type: gemini
    model: gemini-2.0-flash-exp  # Change this

All tests automatically use the new model!

4. Environment-Specific Defaults

# .judge_llm.defaults.yaml
providers:
  - type: gemini
    model: ${MODEL:-gemini-2.0-flash-exp}

# Development: fast, cheap model
export MODEL=gemini-1.5-flash
judge-llm run --config test.yaml

# Production: best model
export MODEL=gemini-1.5-pro
judge-llm run --config test.yaml

Overriding Defaults

Test configs can override any default value.

Override Provider Settings

# config.yaml
providers:
- agent_id: my_agent
temperature: 1.0 # Override default 0.0
model: gemini-1.5-pro # Override default model

Override Evaluator Settings

# config.yaml
evaluators:
- type: cost_evaluator
config:
max_cost_per_case: 0.01 # Stricter than default 0.05

Add Additional Components

# config.yaml
reporters:
# Console from defaults
- type: json # Add JSON reporter
output_path: ./results.json
- type: html # Add HTML reporter
output_path: ./report.html

Disable Default Components

# config.yaml
evaluators:
- type: response_evaluator
enabled: false # Disable from defaults
- type: trajectory_evaluator # Add different evaluator

Advanced Patterns

Environment-Specific Defaults

# .judge_llm.defaults.dev.yaml (development)
providers:
  - type: gemini
    model: gemini-1.5-flash
    temperature: 0.0

reporters:
  - type: console

# .judge_llm.defaults.prod.yaml (production)
providers:
  - type: gemini
    model: gemini-1.5-pro
    temperature: 0.0

reporters:
  - type: database
    db_path: /var/lib/judge_llm/results.db

# Use different defaults
judge-llm run --config test.yaml --defaults .judge_llm.defaults.dev.yaml
judge-llm run --config test.yaml --defaults .judge_llm.defaults.prod.yaml

Team Defaults

# .judge_llm.defaults.yaml (team defaults)
providers:
- type: gemini
model: ${TEAM_MODEL:-gemini-2.0-flash-exp}

evaluators:
- type: response_evaluator
- type: cost_evaluator
config:
max_cost_per_case: ${MAX_COST:-0.05}

Each team member can customize via environment variables.

Custom Component Registration

# .judge_llm.defaults.yaml
evaluators:
- type: custom
module_path: ./evaluators/safety.py
class_name: SafetyEvaluator
register_as: safety # Register globally
config:
strict_mode: true

Now all tests can use type: safety without specifying module_path!
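
Conceptually, `register_as` maps a short name to a component class in a global registry, so later configs can resolve `type: safety` without a `module_path`. The sketch below illustrates that pattern with hypothetical names (`EVALUATOR_REGISTRY`, `register_evaluator`, `build_evaluator` are not judge-llm's actual internals):

```python
# Hypothetical sketch of a name-to-class component registry,
# illustrating why `register_as: safety` lets later configs use
# `type: safety` directly. Not judge-llm's actual internals.

EVALUATOR_REGISTRY = {}


def register_evaluator(name, cls):
    """Register a component class under a short name."""
    EVALUATOR_REGISTRY[name] = cls


def build_evaluator(spec):
    """Instantiate an evaluator from a config entry such as
    {"type": "safety", "config": {...}}."""
    cls = EVALUATOR_REGISTRY[spec["type"]]
    return cls(**spec.get("config", {}))


class SafetyEvaluator:  # stand-in for the class loaded from safety.py
    def __init__(self, strict_mode=False):
        self.strict_mode = strict_mode


# Registering once (as the defaults file would trigger) ...
register_evaluator("safety", SafetyEvaluator)

# ... lets any later test config reference the short name:
evaluator = build_evaluator({"type": "safety", "config": {"strict_mode": True}})
```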

Best Practices

1. Keep Defaults Generic

Good:

providers:
  - type: gemini
    model: gemini-2.0-flash-exp
    temperature: 0.0

Bad:

providers:
  - type: gemini
    agent_id: specific_test_123  # Too specific!

2. Document Your Defaults

# .judge_llm.defaults.yaml

# Default provider configuration
# Using Gemini Flash for cost-effectiveness
# Temperature 0.0 for deterministic results
providers:
  - type: gemini
    model: gemini-2.0-flash-exp
    temperature: 0.0

# Standard evaluation criteria
# Response quality + cost monitoring
evaluators:
  - type: response_evaluator
    config:
      similarity_threshold: 0.7  # 70% similarity required

  - type: cost_evaluator
    config:
      max_cost_per_case: 0.05  # Alert if test > $0.05

3. Version Control Defaults

git add .judge_llm.defaults.yaml
git commit -m "Add project-wide evaluation defaults"

Team members get consistent settings when they clone.

4. Use Environment Variables

providers:
  - type: gemini
    model: ${MODEL:-gemini-2.0-flash-exp}
    temperature: ${TEMPERATURE:-0.0}

Allows customization without editing files.
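
The `${VAR:-default}` form follows shell-style defaulting: use the environment value if set, otherwise fall back to the default. A minimal sketch of that substitution (illustrative only, not judge-llm's actual resolver):

```python
# Sketch of ${VAR} / ${VAR:-default} substitution with shell-style
# defaulting. Illustrative only; not judge-llm's actual resolver.
import os
import re

PATTERN = re.compile(r"\$\{(\w+)(?::-([^}]*))?\}")


def expand(value, env=os.environ):
    """Replace ${VAR} and ${VAR:-default} with the environment value,
    falling back to the default (or empty string) when VAR is unset."""
    def replace(match):
        name, default = match.group(1), match.group(2)
        return env.get(name, default if default is not None else "")
    return PATTERN.sub(replace, value)


print(expand("${MODEL:-gemini-2.0-flash-exp}", env={}))
# → gemini-2.0-flash-exp  (MODEL unset, so the default applies)
```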

5. Separate by Environment

project/
├── .judge_llm.defaults.yaml # Base defaults
├── .judge_llm.defaults.dev.yaml # Dev overrides
├── .judge_llm.defaults.prod.yaml # Prod overrides
└── .judge_llm.defaults.ci.yaml # CI overrides

6. Test Your Defaults

# Verify merged configuration
judge-llm run --config test.yaml --dry-run --verbose

Shows the final merged configuration before running.

Troubleshooting

Defaults Not Found

Error: Defaults file not found: .judge_llm.defaults.yaml

Solution:

# Check current directory
ls -la .judge_llm.defaults.yaml

# Check parent directories
find . -name ".judge_llm.defaults.yaml"

# Specify explicitly
judge-llm run --config test.yaml --defaults path/to/defaults.yaml

Unexpected Merging

Problem: Test config not overriding defaults

Debug:

# Show merged config
judge-llm run --config test.yaml --show-config

# Show where each value comes from
judge-llm run --config test.yaml --trace-config

Environment Variables Not Resolved

Error: API key not found: ${GEMINI_API_KEY}

Solution:

# Set environment variable
export GEMINI_API_KEY=your_key

# Or use .env file
echo "GEMINI_API_KEY=your_key" > .env

Examples

Multiple Tests, One Default

# .judge_llm.defaults.yaml
providers:
- type: gemini
model: gemini-2.0-flash-exp

evaluators:
- type: response_evaluator
- type: cost_evaluator

# test_accuracy.yaml
dataset:
loader: local_file
paths: [./accuracy_tests.json]
providers:
- agent_id: accuracy_test

# test_performance.yaml
dataset:
loader: local_file
paths: [./performance_tests.json]
providers:
- agent_id: performance_test
evaluators:
- type: latency_evaluator # Additional evaluator

# test_cost.yaml
dataset:
loader: local_file
paths: [./cost_tests.json]
providers:
- agent_id: cost_test
evaluators:
- type: cost_evaluator
config:
max_cost_per_case: 0.01 # Stricter

All three tests share common config but customize as needed.

Next Steps

After mastering defaults:

  1. Config Override - Per-test evaluator overrides
  2. Custom Evaluator - Register custom components
  3. Safety Evaluation - Complex multi-test setups