Providers Overview

Providers are the LLM backends that Judge LLM evaluates. Each provider implements the interface to a specific LLM service or agent framework, allowing you to test and compare different models systematically.

What is a Provider?

A provider is responsible for:

  • Executing test cases against an LLM or agent
  • Managing conversations with the model
  • Tracking metrics like cost, latency, and token usage
  • Returning results in a standardized format for evaluation

Available Providers

Judge LLM includes several built-in providers and supports custom implementations:

Built-in Providers

| Provider | Description | Use Case |
|----------|-------------|----------|
| Gemini | Google's Gemini models | Production LLM evaluation |
| Mock | Test provider (no API calls) | Testing & development |
| Google ADK | Google Agent Development Kit | Local AI agents with tool use |
| ADK HTTP | Remote ADK HTTP endpoints | Deployed agents via HTTP/SSE |
| Custom | Your own implementation | Any LLM service |

Provider Comparison

| Feature | Gemini | Mock | Google ADK | ADK HTTP | Custom |
|---------|--------|------|------------|----------|--------|
| API Calls | ✅ Yes | ❌ No | ✅ Yes | ✅ Yes | Depends |
| Cost Tracking | ✅ Yes | ✅ Simulated | ✅ Yes | ✅ Yes | Optional |
| Tool Calling | ✅ Yes | ❌ No | ✅ Yes | ✅ Yes | Optional |
| Authentication | ✅ API Key | ❌ None | ✅ API Key | ✅ Multiple | Depends |
| Multi-turn | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Yes | Optional |
| Multi-agent | ❌ No | ❌ No | ✅ Yes | ✅ Yes | Optional |
| SSE Streaming | ❌ No | ❌ No | ❌ No | ✅ Yes | Optional |

Quick Start

Basic Configuration

```yaml
providers:
  - type: gemini
    agent_id: my_gemini_agent
    model: gemini-2.0-flash-exp
```

Multiple Providers (A/B Testing)

```yaml
providers:
  - type: gemini
    agent_id: gemini_flash
    model: gemini-2.0-flash-exp

  - type: gemini
    agent_id: gemini_pro
    model: gemini-1.5-pro

  - type: mock
    agent_id: baseline
```

Provider Interface

All providers implement the same interface:

```python
from judge_llm.providers.base import BaseProvider
from judge_llm.core.models import EvalCase, ProviderResult

class MyProvider(BaseProvider):
    def execute(self, eval_case: EvalCase) -> ProviderResult:
        """Execute evaluation case and return results."""
        pass

    def cleanup(self):
        """Cleanup resources after evaluation."""
        pass
```

Common Configuration

Required Fields

All providers require these fields:

```yaml
providers:
  - type: <provider_type>   # Provider identifier
    agent_id: <unique_id>   # Your unique name for this configuration
```

Optional Metadata

Pass custom configuration to providers:

```yaml
providers:
  - type: gemini
    agent_id: my_agent
    # Custom fields passed to provider
    temperature: 0.7
    max_tokens: 2048
    custom_setting: value
```

Provider Registry

Using Registered Providers

Register providers once in .judge_llm.defaults.yaml:

```yaml
providers:
  - type: custom
    module_path: ./my_providers/anthropic.py
    class_name: AnthropicProvider
    register_as: anthropic   # ← Register globally

# Then use by name in test configs
providers:
  - type: anthropic          # ← Use registered provider
    agent_id: claude
```

Programmatic Registration

```python
from judge_llm import register_provider
from my_providers import MyProvider

register_provider("my_provider", MyProvider)
```

Choosing a Provider

For Development & Testing

Use the Mock provider. It makes no API calls, so you can validate your evaluation setup quickly and at zero cost.

For Production LLM Testing

Use the Gemini provider to evaluate Google's Gemini models against real API responses.

For Local AI Agents with Tools

Use the Google ADK provider to evaluate locally running agents built with the Google Agent Development Kit, including tool use.

For Remote/Deployed Agents

Use the ADK HTTP provider to evaluate agents deployed behind HTTP/SSE endpoints.

For Other LLM Services

Implement a Custom provider against the `BaseProvider` interface for any other LLM service.

Cost & Performance

Cost Tracking

Providers automatically track costs when available:

```python
report = evaluate(config="config.yaml")
print(f"Total cost: ${report.total_cost:.4f}")
```

Performance Metrics

All providers track execution time:

```python
for run in report.execution_runs:
    print(f"Provider: {run.provider_type}")
    print(f"Time: {run.provider_result.time_taken:.2f}s")
    print(f"Cost: ${run.provider_result.cost:.4f}")
```

Provider Results

Standard Output Format

All providers return results in the same format:

```python
ProviderResult(
    conversation_history=[...],   # Multi-turn conversations
    cost=0.0234,                  # API cost in dollars
    time_taken=1.23,              # Execution time in seconds
    token_usage={                 # Token counts
        "prompt_tokens": 150,
        "completion_tokens": 50,
        "total_tokens": 200,
    },
    metadata={},                  # Provider-specific metadata
    success=True,                 # Execution status
    error=None,                   # Error message if failed
)
```
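Because every provider returns the same shape, downstream reporting can aggregate results uniformly. A small sketch of that idea, using plain dicts with made-up numbers in place of real `ProviderResult` objects:

```python
# Aggregate cost, time, and tokens across uniformly shaped results.
results = [
    {"cost": 0.0234, "time_taken": 1.23, "token_usage": {"total_tokens": 200}},
    {"cost": 0.0110, "time_taken": 0.87, "token_usage": {"total_tokens": 140}},
]

total_cost = sum(r["cost"] for r in results)
total_tokens = sum(r["token_usage"]["total_tokens"] for r in results)
avg_time = sum(r["time_taken"] for r in results) / len(results)

print(f"cost=${total_cost:.4f} tokens={total_tokens} avg_time={avg_time:.2f}s")
```

This uniformity is what makes A/B comparisons across different providers meaningful: the metrics are computed the same way regardless of backend.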

Best Practices

1. Use Consistent Agent IDs

```yaml
# Good - Clear, descriptive IDs
providers:
  - type: gemini
    agent_id: gemini_flash_2025
  - type: gemini
    agent_id: gemini_pro_2025

# Avoid - Generic IDs
providers:
  - type: gemini
    agent_id: agent1
  - type: gemini
    agent_id: agent2
```

2. Environment Variables for API Keys

```yaml
providers:
  - type: gemini
    agent_id: my_agent
    api_key: ${GOOGLE_API_KEY}   # ✅ From .env file
    # api_key: "hardcoded"       # ❌ Never hardcode keys
```

3. Provider-Specific Configs

```yaml
# Separate configs for different models
providers:
  - type: gemini
    agent_id: fast_cheap
    model: gemini-2.0-flash-exp
    temperature: 0.0
    max_tokens: 512

  - type: gemini
    agent_id: high_quality
    model: gemini-1.5-pro
    temperature: 0.7
    max_tokens: 4096
```

4. Start with Mock Provider

```yaml
# Test your evaluation logic first
providers:
  - type: mock
    agent_id: test_baseline

# Then switch to real provider
# providers:
#   - type: gemini
#     agent_id: production
```

Next Steps