# Providers Overview

Providers are the LLM backends that Judge LLM evaluates. Each provider implements the interface to a specific LLM service or agent framework, allowing you to test and compare different models systematically.
## What is a Provider?

A provider is responsible for:

- Executing test cases against an LLM or agent
- Managing conversations with the model
- Tracking metrics like cost, latency, and token usage
- Returning results in a standardized format for evaluation
## Available Providers

Judge LLM includes several built-in providers and supports custom implementations:
### Built-in Providers
| Provider | Description | Use Case |
|---|---|---|
| Gemini | Google's Gemini models | Production LLM evaluation |
| Mock | Test provider (no API calls) | Testing & development |
| Google ADK | Google Agent Development Kit | Local AI agents with tool use |
| ADK HTTP | Remote ADK HTTP endpoints | Deployed agents via HTTP/SSE |
| Custom | Your own implementation | Any LLM service |
### Provider Comparison
| Feature | Gemini | Mock | Google ADK | ADK HTTP | Custom |
|---|---|---|---|---|---|
| API Calls | ✅ Yes | ❌ No | ✅ Yes | ✅ Yes | Depends |
| Cost Tracking | ✅ Yes | ✅ Simulated | ✅ Yes | ✅ Yes | Optional |
| Tool Calling | ✅ Yes | ❌ No | ✅ Yes | ✅ Yes | Optional |
| Authentication | ✅ API Key | ❌ None | ✅ API Key | ✅ Multiple | Depends |
| Multi-turn | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Yes | Optional |
| Multi-agent | ❌ No | ❌ No | ✅ Yes | ✅ Yes | Optional |
| SSE Streaming | ❌ No | ❌ No | ❌ No | ✅ Yes | Optional |
## Quick Start

### Basic Configuration

```yaml
providers:
  - type: gemini
    agent_id: my_gemini_agent
    model: gemini-2.0-flash-exp
```
### Multiple Providers (A/B Testing)

```yaml
providers:
  - type: gemini
    agent_id: gemini_flash
    model: gemini-2.0-flash-exp
  - type: gemini
    agent_id: gemini_pro
    model: gemini-1.5-pro
  - type: mock
    agent_id: baseline
```
## Provider Interface

All providers implement the same interface:

```python
from judge_llm.providers.base import BaseProvider
from judge_llm.core.models import EvalCase, ProviderResult


class MyProvider(BaseProvider):
    def execute(self, eval_case: EvalCase) -> ProviderResult:
        """Execute the evaluation case and return results."""
        pass

    def cleanup(self):
        """Clean up resources after evaluation."""
        pass
```
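To make the interface concrete, here is a minimal toy provider in the same shape. The `EvalCase` and `ProviderResult` classes below are simplified local stand-ins for the sketch (real code would import the full versions from `judge_llm.core.models`), and `EchoProvider` is a hypothetical example, not part of the library:

```python
import time
from dataclasses import dataclass, field
from typing import Optional


# Simplified local stand-ins for the judge_llm.core.models types
# (illustrative only; the real classes carry more fields).
@dataclass
class EvalCase:
    case_id: str
    prompt: str


@dataclass
class ProviderResult:
    conversation_history: list
    cost: float
    time_taken: float
    token_usage: dict
    metadata: dict = field(default_factory=dict)
    success: bool = True
    error: Optional[str] = None


class EchoProvider:
    """Toy provider that echoes the prompt instead of calling an LLM."""

    def execute(self, eval_case: EvalCase) -> ProviderResult:
        start = time.monotonic()
        reply = f"echo: {eval_case.prompt}"
        return ProviderResult(
            conversation_history=[
                {"role": "user", "content": eval_case.prompt},
                {"role": "assistant", "content": reply},
            ],
            cost=0.0,  # no API call, no cost
            time_taken=time.monotonic() - start,
            token_usage={"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0},
        )

    def cleanup(self):
        pass  # nothing to release in this toy example
```

The same pattern scales to a real backend: `execute` makes the API call, measures time and cost, and packs everything into a `ProviderResult`.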
## Common Configuration

### Required Fields

All providers require these fields:

```yaml
providers:
  - type: <provider_type>   # Provider identifier
    agent_id: <unique_id>   # Your unique name for this configuration
```
### Optional Metadata

Pass custom configuration to providers:

```yaml
providers:
  - type: gemini
    agent_id: my_agent
    # Custom fields passed to the provider
    temperature: 0.7
    max_tokens: 2048
    custom_setting: value
```
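How those extra fields reach a provider is an internal detail of the framework, but the general pattern is to separate the reserved keys from the pass-through metadata. A minimal sketch of that split (`split_provider_config` is a hypothetical helper for illustration, not a Judge LLM API):

```python
# Hypothetical helper, not part of judge_llm: splits one provider entry
# into its reserved core fields and the custom metadata passed through.
RESERVED_KEYS = {"type", "agent_id"}


def split_provider_config(config: dict) -> tuple:
    """Return (core fields, custom metadata) for one provider entry."""
    core = {k: v for k, v in config.items() if k in RESERVED_KEYS}
    extra = {k: v for k, v in config.items() if k not in RESERVED_KEYS}
    return core, extra
```

Given the YAML entry above, the core would be `type`/`agent_id` and the metadata would be `temperature`, `max_tokens`, and `custom_setting`.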
## Provider Registry

### Using Registered Providers

Register providers once in `.judge_llm.defaults.yaml`:

```yaml
providers:
  - type: custom
    module_path: ./my_providers/anthropic.py
    class_name: AnthropicProvider
    register_as: anthropic   # ← Register globally
```

Then use the provider by name in test configs:

```yaml
providers:
  - type: anthropic   # ← Use the registered provider
    agent_id: claude
```
### Programmatic Registration

```python
from judge_llm import register_provider
from my_providers import MyProvider

register_provider("my_provider", MyProvider)
```
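Conceptually, registration is a name-to-class lookup table. The sketch below illustrates that register-then-resolve pattern with hypothetical stand-ins (`register_provider` and `resolve_provider` here are local demo functions, not the library's internals):

```python
# A minimal registry sketch illustrating the pattern; in real code,
# use judge_llm's register_provider rather than rolling your own.
_PROVIDER_REGISTRY = {}


def register_provider(name, provider_cls):
    """Map a provider type name to its implementing class."""
    if name in _PROVIDER_REGISTRY:
        raise ValueError(f"provider {name!r} already registered")
    _PROVIDER_REGISTRY[name] = provider_cls


def resolve_provider(name):
    """Look up the class for a provider type named in a config."""
    try:
        return _PROVIDER_REGISTRY[name]
    except KeyError:
        raise ValueError(f"unknown provider type: {name!r}") from None


class MyProvider:  # placeholder class for the demo
    pass


register_provider("my_provider", MyProvider)
```

After registration, a config entry with `type: my_provider` can be resolved back to `MyProvider` by name.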
## Choosing a Provider

### For Development & Testing

- Use the **Mock** provider - no API costs, instant execution

### For Production LLM Testing

- Use the **Gemini** provider - Google's production models

### For Local AI Agents with Tools

- Use the **Google ADK** provider - full agent capabilities with local code

### For Remote/Deployed Agents

- Use the **ADK HTTP** provider - connect to agents via HTTP/SSE

### For Other LLM Services

- Implement a **Custom** provider - OpenAI, Anthropic, etc.
## Cost & Performance

### Cost Tracking

Providers automatically track costs when available:

```python
from judge_llm import evaluate

report = evaluate(config="config.yaml")
print(f"Total cost: ${report.total_cost:.4f}")
```
### Performance Metrics

All providers track execution time:

```python
for run in report.execution_runs:
    print(f"Provider: {run.provider_type}")
    print(f"Time: {run.provider_result.time_taken:.2f}s")
    print(f"Cost: ${run.provider_result.cost:.4f}")
```
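When comparing several providers, per-run prints quickly get noisy; aggregating totals per provider is often more useful. A small sketch, using plain dicts in place of the real run objects (which expose the same values via `run.provider_type` and `run.provider_result`):

```python
# Aggregates per-provider totals from a list of run records.
# The dict records here stand in for report.execution_runs entries.
def summarize_runs(runs):
    """Return {provider_type: {"total_cost", "total_time", "runs"}}."""
    summary = {}
    for run in runs:
        stats = summary.setdefault(
            run["provider_type"],
            {"total_cost": 0.0, "total_time": 0.0, "runs": 0},
        )
        stats["total_cost"] += run["cost"]
        stats["total_time"] += run["time_taken"]
        stats["runs"] += 1
    return summary
```

This makes A/B comparisons (e.g. `gemini_flash` vs. `gemini_pro`) a one-line lookup per provider.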
## Provider Results

### Standard Output Format

All providers return results in the same format:

```python
ProviderResult(
    conversation_history=[...],   # Multi-turn conversation messages
    cost=0.0234,                  # API cost in dollars
    time_taken=1.23,              # Execution time in seconds
    token_usage={                 # Token counts
        "prompt_tokens": 150,
        "completion_tokens": 50,
        "total_tokens": 200,
    },
    metadata={},                  # Provider-specific metadata
    success=True,                 # Execution status
    error=None,                   # Error message if failed
)
```
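If you implement a custom provider for a service without built-in cost tracking, you can derive `cost` from `token_usage` yourself. A sketch of that arithmetic (the default per-million-token rates below are made-up placeholders for illustration, not real pricing for any model):

```python
# Illustrative cost estimate from a token_usage dict. The rates are
# placeholder values; substitute your provider's actual pricing.
def estimate_cost(token_usage,
                  prompt_rate_per_m=0.10,
                  completion_rate_per_m=0.40):
    """Return an estimated dollar cost for one result's token usage."""
    prompt_cost = token_usage["prompt_tokens"] / 1_000_000 * prompt_rate_per_m
    completion_cost = token_usage["completion_tokens"] / 1_000_000 * completion_rate_per_m
    return prompt_cost + completion_cost
```

A custom provider would call this in `execute` and place the value in the `cost` field of its `ProviderResult`.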
## Best Practices

### 1. Use Consistent Agent IDs

```yaml
# Good - clear, descriptive IDs
providers:
  - type: gemini
    agent_id: gemini_flash_2025
  - type: gemini
    agent_id: gemini_pro_2025
```

```yaml
# Avoid - generic IDs
providers:
  - type: gemini
    agent_id: agent1
  - type: gemini
    agent_id: agent2
```
### 2. Environment Variables for API Keys

```yaml
providers:
  - type: gemini
    agent_id: my_agent
    api_key: ${GOOGLE_API_KEY}   # ✅ From .env file
    # api_key: "hardcoded"       # ❌ Never hardcode keys
```
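The exact mechanism Judge LLM uses to resolve `${VAR}` placeholders is framework-specific, but the general idea can be shown with the standard library's environment-variable expansion:

```python
import os


def resolve_placeholders(value):
    """Expand ${VAR} references from the current environment.

    Sketch only: os.path.expandvars leaves unset variables as-is,
    so config loaders usually add their own missing-variable checks.
    """
    return os.path.expandvars(value)
```

With `GOOGLE_API_KEY` set in the environment (for example, loaded from a `.env` file), `resolve_placeholders("${GOOGLE_API_KEY}")` returns the actual key, so the secret never appears in the config file.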
### 3. Provider-Specific Configs

```yaml
# Separate configs for different models
providers:
  - type: gemini
    agent_id: fast_cheap
    model: gemini-2.0-flash-exp
    temperature: 0.0
    max_tokens: 512
  - type: gemini
    agent_id: high_quality
    model: gemini-1.5-pro
    temperature: 0.7
    max_tokens: 4096
```
### 4. Start with the Mock Provider

```yaml
# Test your evaluation logic first
providers:
  - type: mock
    agent_id: test_baseline

# Then switch to a real provider
# providers:
#   - type: gemini
#     agent_id: production
```
## Next Steps

- Gemini Provider - Configure Google's Gemini models
- Mock Provider - Set up the test provider
- Google ADK Provider - Build local AI agents
- ADK HTTP Provider - Connect to remote agents
- Custom Providers - Implement your own
## Related Documentation

- Configuration Guide - Complete config reference
- Python API - Programmatic usage
- Examples - Working examples