Cost Evaluator
Control evaluation budgets by setting maximum cost thresholds per test case.
Overview
The Cost Evaluator ensures your evaluation costs stay within budget by:
- Tracking actual API costs per test case
- Comparing against configured maximums
- Failing tests that exceed cost limits
- Monitoring cost efficiency
Key Features:
- Per-test-case cost limits
- Automatic cost tracking from providers
- Currency support
- Cost ratio analysis
- Budget enforcement
Configuration
Basic Configuration
evaluators:
- type: cost_evaluator
config:
max_cost_per_case: 0.10 # $0.10 per test case
Full Configuration
evaluators:
- type: cost_evaluator
enabled: true
config:
max_cost_per_case: 0.10 # Maximum cost in USD
currency: USD # Currency code
Cost Tracking
How Costs are Calculated
Providers automatically track costs:
- Token counting: Count input/output tokens
- Pricing lookup: Apply provider-specific rates
- Cost calculation: tokens × rate
- Total aggregation: Sum across all turns
Example (Gemini):
Input: 100 tokens × $0.00025/1K = $0.000025
Output: 50 tokens × $0.0005/1K = $0.000025
Total: $0.00005
Provider Costs
| Provider | Input (per 1K tokens) | Output (per 1K tokens) |
|---|---|---|
| Gemini Flash | $0.00025 | $0.0005 |
| Gemini Pro | $0.00125 | $0.005 |
| Mock | $0.00 | $0.00 |
| Custom | Varies | Varies |
Setting Cost Limits
By Test Complexity
# Simple Q&A
max_cost_per_case: 0.001 # $0.001
# Multi-turn conversations
max_cost_per_case: 0.01 # $0.01
# Complex reasoning
max_cost_per_case: 0.10 # $0.10
# Large document processing
max_cost_per_case: 1.00 # $1.00
By Budget
# Total budget: $10 for 100 test cases
max_cost_per_case: 0.10 # $10 / 100 = $0.10 per case
Usage Examples
Example 1: Basic Cost Control
# config.yaml
dataset:
loader: local_file
paths: [./tests.json]
providers:
- type: gemini
agent_id: cost_aware_agent
model: gemini-2.0-flash-exp
evaluators:
- type: cost_evaluator
config:
max_cost_per_case: 0.01
reporters:
- type: console
Run evaluation:
judge-llm run --config config.yaml
Example 2: Different Limits for Different Tests
{
"eval_id": "simple_qa_001",
"conversation": [...],
"evaluator_config": {
"CostEvaluator": {
"max_cost_per_case": 0.001 // Simple = cheap
}
}
},
{
"eval_id": "complex_reasoning_001",
"conversation": [...],
"evaluator_config": {
"CostEvaluator": {
"max_cost_per_case": 0.10 // Complex = higher budget
}
}
}
Example 3: Cost Comparison
Compare costs across providers:
providers:
- type: gemini
agent_id: gemini_flash
model: gemini-2.0-flash-exp
- type: gemini
agent_id: gemini_pro
model: gemini-1.5-pro
evaluators:
- type: cost_evaluator
config:
max_cost_per_case: 0.05
- type: response_evaluator
reporters:
- type: html
config:
output_file: cost_comparison.html
View HTML report to compare costs.
Example 4: Programmatic Cost Monitoring
from judge_llm import evaluate
report = evaluate(
dataset={"loader": "local_file", "paths": ["./tests.json"]},
providers=[{
"type": "gemini",
"agent_id": "test",
"model": "gemini-2.0-flash-exp"
}],
evaluators=[{
"type": "cost_evaluator",
"config": {"max_cost_per_case": 0.01}
}],
reporters=[{"type": "console"}]
)
# Check total costs
print(f"Total cost: ${report.total_cost:.4f}")
print(f"Avg cost per case: ${report.total_cost / len(report.test_cases):.4f}")
# Find expensive test cases
for case in report.test_cases:
if case.cost > 0.01:
print(f"{case.eval_id}: ${case.cost:.4f} (over budget!)")
Evaluation Result
The cost evaluator returns:
{
"evaluator_name": "CostEvaluator",
"evaluator_type": "cost_evaluator",
"passed": True,
"score": 1.0,
"threshold": 0.10,
"success": True,
"details": {
"actual_cost": 0.000234,
"max_cost": 0.10,
"currency": "USD",
"cost_ratio": 0.00234 # actual / max
}
}
Pass criteria: actual_cost ≤ max_cost_per_case
Failed Example
{
"passed": False,
"score": 0.0,
"details": {
"actual_cost": 0.15,
"max_cost": 0.10,
"currency": "USD",
"cost_ratio": 1.5 # 50% over budget
}
}
Cost Optimization Tips
1. Use Cheaper Models
# Expensive
model: gemini-1.5-pro
# Cheaper (5x less)
model: gemini-2.0-flash-exp
2. Limit Response Length
providers:
- type: gemini
model: gemini-2.0-flash-exp
max_tokens: 512 # Limit output tokens
3. Reduce Temperature
Lower temperature = more focused = fewer tokens:
temperature: 0.3 # More focused, cheaper
4. Use Mock for Development
# Development - Free
providers:
- type: mock
# Production - Real costs
providers:
- type: gemini
5. Batch Similar Tests
Group similar tests to benefit from caching (if provider supports it).
Best Practices
1. Set Realistic Limits
Start generous, then tighten:
# Week 1: Learn costs
max_cost_per_case: 1.00
# Week 2: Optimize based on data
max_cost_per_case: 0.10
# Week 3: Production limits
max_cost_per_case: 0.05
2. Monitor Trends
Track costs over time:
- Average cost per test case
- Total evaluation costs
- Cost by test case type
- Cost by provider
3. Use with Response Evaluator
Balance cost and quality:
evaluators:
- type: cost_evaluator
config: {max_cost_per_case: 0.01}
- type: response_evaluator
config: {similarity_threshold: 0.8}
Find the sweet spot: good quality at reasonable cost.
4. Budget Alerts
Set up alerts for cost overruns:
if report.total_cost > MONTHLY_BUDGET:
send_alert(f"Budget exceeded: ${report.total_cost}")
Troubleshooting
All Tests Failing Cost Check
Issue: Every test exceeds cost limit
Solutions:
-
Check limit is reasonable:
max_cost_per_case: 0.000001 # Too low!
max_cost_per_case: 0.01 # Better -
Use cheaper model:
model: gemini-2.0-flash-exp # Cheapest -
Reduce max_tokens:
max_tokens: 512 # Limit response length
Cost is Always $0.00
Issue: Provider not tracking costs
Solutions:
-
Check provider type:
- Mock provider: Always $0 (expected)
- Gemini provider: Should track costs
-
Verify provider implementation: Custom providers must calculate costs in
execute()
Unexpected High Costs
Issue: Costs higher than expected
Causes:
- Long conversations (many turns)
- Large input contexts
- High temperature (verbose responses)
- Wrong model selected
Debug:
# Check token usage
for case in report.test_cases:
print(f"{case.eval_id}:")
print(f" Tokens: {case.token_usage.total_tokens}")
print(f" Cost: ${case.cost:.4f}")
Related Documentation
API Reference
For implementation details, see the CostEvaluator API Reference.