Mock Provider

The Mock Provider is a built-in test provider that returns expected responses without making any API calls, making it ideal for testing, development, and CI/CD environments where you want fast feedback without API costs.

Overview

Type: mock

Purpose: Test evaluation logic and framework functionality without external API dependencies.

Key Features:

  • ✅ Zero API calls - instant execution
  • ✅ No authentication required
  • ✅ Mock cost and token tracking
  • ✅ Supports all evaluation types
  • ✅ Perfect for development and testing
  • ✅ No rate limits
  • ✅ Deterministic results

Quick Start

Basic Configuration

providers:
  - type: mock
    agent_id: test_agent

That's it! No API keys or additional configuration needed.

Run Evaluation

judge-llm run --config config.yaml
# Completes instantly with no API costs

How It Works

The Mock Provider:

  1. Reads expected responses from your test cases
  2. Returns them as-is (no LLM call)
  3. Calculates mock costs based on text length
  4. Tracks execution time (near-zero)

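The four steps above can be sketched in a few lines of Python. This is an illustrative model of the provider's behavior, not the actual judge-llm internals; the function and field names are hypothetical, and the cost rate matches the mock formula described later in this page.

```python
import time

COST_PER_TOKEN = 0.00001  # mock rate used for simulated cost tracking


def run_mock_provider(eval_case):
    """Echo each turn's expected response with simulated cost and latency."""
    start = time.perf_counter()
    results = []
    for turn in eval_case["conversation"]:
        prompt = " ".join(p["text"] for p in turn["user_content"]["parts"])
        # Steps 1-2: read the expected response and return it as-is (no LLM call).
        response = " ".join(p["text"] for p in turn["final_response"]["parts"])
        # Step 3: mock cost is derived from text length, not real token counts.
        total_tokens = len(prompt) + len(response)
        results.append({
            "invocation_id": turn["invocation_id"],
            "response": response,
            "cost": total_tokens * COST_PER_TOKEN,
        })
    # Step 4: execution time is near-zero since nothing leaves the process.
    elapsed = time.perf_counter() - start
    return results, elapsed


case = {
    "conversation": [{
        "invocation_id": "turn_1",
        "user_content": {"parts": [{"text": "What is 2+2?"}]},
        "final_response": {"parts": [{"text": "The answer is 4"}]},
    }]
}
results, elapsed = run_mock_provider(case)
print(results[0]["response"])  # The answer is 4
```
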
Example Flow

Input (evalset.yaml):

eval_cases:
  - eval_id: test_1
    conversation:
      - invocation_id: turn_1
        user_content:
          parts:
            - text: "What is 2+2?"
        final_response:
          parts:
            - text: "The answer is 4"  # Expected response

Output:

✓ test_1: Response matches expected
Cost: $0.0001 (mocked)
Time: 0.001s

The Mock Provider returns "The answer is 4" exactly as specified in the test case.

Configuration Options

Minimal Configuration

providers:
  - type: mock
    agent_id: baseline

With Metadata (Optional)

providers:
  - type: mock
    agent_id: test_baseline
    metadata:
      description: "Test baseline for development"
      version: "1.0"

Configuration Reference

Option     Type     Default   Description
---------  -------  --------  -----------------
type       string   -         Must be mock
agent_id   string   -         Unique identifier
metadata   object   {}        Optional metadata

Use Cases

1. Development & Testing

Test your evaluation logic before using real LLMs:

# Start with the mock provider
providers:
  - type: mock
    agent_id: dev_baseline

# Later, switch to a real provider:
# providers:
#   - type: gemini
#     agent_id: production

2. CI/CD Pipelines

Run tests in CI without API costs:

# ci-config.yaml (referenced from .github/workflows/test.yml)
agent:
  fail_on_threshold_violation: true  # Fail CI on quality issues

providers:
  - type: mock
    agent_id: ci_baseline

evaluators:
  - type: response_evaluator
    config:
      similarity_threshold: 1.0  # Exact match required

3. Framework Development

Test Judge LLM features:

providers:
  - type: mock
    agent_id: feature_test

# Test parallel execution
agent:
  parallel_execution: true
  max_workers: 8

4. Evaluator Development

Test custom evaluators without API calls:

providers:
  - type: mock
    agent_id: evaluator_test

evaluators:
  - type: custom
    module_path: ./my_evaluators/new_evaluator.py
    class_name: NewEvaluator

5. Baseline Comparison

Compare real LLM against expected responses:

providers:
  # Mock provider as baseline (expected responses)
  - type: mock
    agent_id: expected_baseline

  # Real LLM for comparison
  - type: gemini
    agent_id: actual_model

Mock Cost Calculation

The Mock Provider simulates costs for testing:

# Mock cost formula
total_tokens = len(prompt_text) + len(response_text)
mock_cost = total_tokens * 0.00001 # $0.00001 per token
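Applying this formula to the earlier example (character counts stand in for tokens in the mock):

```python
prompt_text = "What is 2+2?"
response_text = "The answer is 4"

total_tokens = len(prompt_text) + len(response_text)  # 12 + 15 = 27
mock_cost = total_tokens * 0.00001

print(f"${mock_cost:.5f}")  # $0.00027
```

The exact figure scales with the text lengths in your test cases, so longer prompts and responses produce proportionally larger mock costs.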

Viewing Mock Costs

from judge_llm import evaluate

report = evaluate(config="config.yaml")

# Total mock cost
print(f"Mock cost: ${report.total_cost:.6f}")

# Per-case costs
for run in report.execution_runs:
    if run.provider_type == "mock":
        print(f"{run.eval_case_id}: ${run.provider_result.cost:.6f}")

Mock Token Usage

Token counts are simulated based on text length:

for run in report.execution_runs:
    tokens = run.provider_result.token_usage
    print(f"Prompt: {tokens['prompt_tokens']}")
    print(f"Completion: {tokens['completion_tokens']}")
    print(f"Total: {tokens['total_tokens']}")
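
One plausible derivation of these fields, consistent with the length-based mock cost formula above (the actual split judge-llm uses may differ; `mock_token_usage` is an illustrative helper, not part of the library):

```python
def mock_token_usage(prompt_text: str, response_text: str) -> dict:
    """Simulate token counts from text length, one 'token' per character."""
    prompt_tokens = len(prompt_text)
    completion_tokens = len(response_text)
    return {
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion_tokens,
        "total_tokens": prompt_tokens + completion_tokens,
    }


usage = mock_token_usage("What is 2+2?", "The answer is 4")
print(usage)  # {'prompt_tokens': 12, 'completion_tokens': 15, 'total_tokens': 27}
```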

Multi-turn Conversations

Mock Provider supports multi-turn conversations:

eval_cases:
  - eval_id: multi_turn
    conversation:
      # Turn 1
      - invocation_id: turn_1
        user_content:
          parts:
            - text: "Hi"
        final_response:
          parts:
            - text: "Hello!"

      # Turn 2
      - invocation_id: turn_2
        user_content:
          parts:
            - text: "How are you?"
        final_response:
          parts:
            - text: "I'm doing well, thanks!"

Each turn is returned exactly as specified.
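
Because the provider keeps no conversation state, echoing a multi-turn case is just an independent pass over the turns. A minimal sketch, using the example case above (the dict layout mirrors the YAML; this is not the library's internal representation):

```python
# Each turn is echoed independently -- the mock maintains no context
# between invocations.
conversation = [
    {"invocation_id": "turn_1",
     "user_content": {"parts": [{"text": "Hi"}]},
     "final_response": {"parts": [{"text": "Hello!"}]}},
    {"invocation_id": "turn_2",
     "user_content": {"parts": [{"text": "How are you?"}]},
     "final_response": {"parts": [{"text": "I'm doing well, thanks!"}]}},
]

for turn in conversation:
    expected = turn["final_response"]["parts"][0]["text"]
    print(f"{turn['invocation_id']}: {expected}")
```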

Performance

Execution Speed

# Mock Provider: ~0.001s per test case
# Gemini Provider: ~1-3s per test case

# 100 test cases:
# Mock: ~0.1s total
# Gemini: ~100-300s total

Parallel Execution

Mock Provider works great with parallel execution:

agent:
  parallel_execution: true
  max_workers: 16  # High parallelism possible (no rate limits)

providers:
  - type: mock
    agent_id: parallel_test

Examples

Example 1: Quick Test

# config.yaml
dataset:
  loader: local_file
  paths: [./tests.yaml]

providers:
  - type: mock
    agent_id: quick_test

evaluators:
  - type: response_evaluator

# Runs instantly:
judge-llm run --config config.yaml

Example 2: CI/CD Integration

# ci-config.yaml
agent:
  fail_on_threshold_violation: true  # Fail build on errors
  parallel_execution: true
  max_workers: 8

dataset:
  loader: local_file
  paths: [./tests/*.yaml]

providers:
  - type: mock
    agent_id: ci_test

evaluators:
  - type: response_evaluator
    config:
      similarity_threshold: 1.0  # Require exact match

  - type: trajectory_evaluator
    config:
      sequence_match_type: exact

reporters:
  - type: json
    output_path: ./test-results.json

Example 3: Baseline vs Real LLM

providers:
  # Expected responses (baseline)
  - type: mock
    agent_id: expected

  # Actual LLM output
  - type: gemini
    agent_id: gemini_flash
    model: gemini-2.0-flash-exp

# Compare both against expected responses
evaluators:
  - type: response_evaluator
    config:
      similarity_threshold: 0.85

Limitations

What Mock Provider Doesn't Do

No Real LLM Calls

  • Returns expected responses only
  • No actual model inference

No Tool Calling

  • Doesn't simulate function calls
  • Returns static responses

No Variability

  • Always returns the same response
  • No temperature/randomness

No Context Building

  • Doesn't maintain conversation state
  • Each turn is independent

When NOT to Use Mock Provider

Don't use Mock Provider for:

  • Production Testing - Use real LLM providers
  • Response Quality - Can't test actual LLM behavior
  • Prompt Engineering - No real model to test prompts
  • Tool/Function Testing - Use Google ADK or real providers

Testing Strategy

  1. Start with Mock - Validate test cases and evaluation logic
  2. Switch to Real - Test actual LLM behavior
  3. Use Both - Compare expected vs actual

# Phase 1: Validate tests with mock
providers:
  - type: mock
    agent_id: validation

# Phase 2: Test the real LLM
# providers:
#   - type: gemini
#     agent_id: real_test

# Phase 3: Compare both
# providers:
#   - type: mock
#     agent_id: expected
#   - type: gemini
#     agent_id: actual

Advantages

✅ Speed

  • Instant execution (no API latency)
  • High parallelism (no rate limits)
  • Fast iteration cycles

✅ Cost

  • Zero API costs
  • Unlimited test runs
  • Perfect for CI/CD

✅ Reliability

  • Deterministic results
  • No network issues
  • No service outages

✅ Development

  • Test framework features
  • Validate test cases
  • Debug evaluation logic

Next Steps

  • Use Mock Provider to validate your test cases
  • Switch to Gemini Provider for real testing
  • Combine both for baseline comparisons
  • Implement Custom Providers for other LLMs