Gemini Provider

The Gemini Provider integrates Google's Gemini models into Judge LLM, allowing you to evaluate and compare different Gemini model variants systematically.

Overview

Type: gemini

Purpose: Execute evaluations using Google's Gemini family of models via the official Gemini API.

Key Features:

  • ✅ Multiple Gemini models supported
  • ✅ Cost tracking per request
  • ✅ Token usage monitoring
  • ✅ Multi-turn conversations
  • ✅ Tool calling support
  • ✅ Streaming support
  • ✅ Safety settings configuration

Quick Start

Basic Configuration

providers:
  - type: gemini
    agent_id: my_gemini_agent
    model: gemini-2.0-flash-exp
    api_key: ${GOOGLE_API_KEY}

Environment Setup

  1. Get API Key: Create an API key in Google AI Studio.

  2. Add to .env:

    GOOGLE_API_KEY=your-api-key-here

  3. Run Evaluation:

    judge-llm run --config config.yaml
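Before running, it can help to confirm the key is actually visible to the process. A minimal sketch (plain `os.environ`; the variable name matches the `.env` entry above):

```python
import os

def api_key_present(env=os.environ):
    """Return True if GOOGLE_API_KEY is set to a non-empty value."""
    return bool(env.get("GOOGLE_API_KEY"))

# Example against an explicit mapping (os.environ is the default):
assert api_key_present({"GOOGLE_API_KEY": "my-key"})
```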

Configuration Options

Complete Configuration

providers:
  - type: gemini
    agent_id: gemini_agent

    # Model Selection
    model: gemini-2.0-flash-exp

    # Generation Parameters
    temperature: 1.0
    max_tokens: 8192
    top_p: 0.95
    top_k: 40

    # Authentication
    api_key: ${GOOGLE_API_KEY}

Configuration Reference

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| type | string | - | Must be `gemini` |
| agent_id | string | - | Unique identifier for this configuration |
| model | string | gemini-2.0-flash-exp | Gemini model to use |
| temperature | float | 1.0 | Sampling temperature (0.0-2.0) |
| max_tokens | int | 8192 | Maximum response tokens |
| top_p | float | 0.95 | Top-p (nucleus) sampling |
| top_k | int | 40 | Top-k sampling |
| api_key | string | ${GOOGLE_API_KEY} | Google API key |

Supported Models

Production Models

# Latest experimental model (recommended)
model: gemini-2.0-flash-exp

# Stable flash model (fast & cost-effective)
model: gemini-1.5-flash

# High capability model
model: gemini-1.5-pro

# Ultra-fast, lightweight model
model: gemini-1.5-flash-8b

Model Comparison

| Model | Speed | Cost | Capability | Best For |
| --- | --- | --- | --- | --- |
| gemini-2.0-flash-exp | ⚡⚡⚡ Fast | 💰 Low | 🎯 High | Latest features, experimentation |
| gemini-1.5-flash | ⚡⚡⚡ Fast | 💰 Low | 🎯 Good | Production, high throughput |
| gemini-1.5-pro | ⚡⚡ Medium | 💰💰 Medium | 🎯🎯 Excellent | Complex tasks, high quality |
| gemini-1.5-flash-8b | ⚡⚡⚡⚡ Ultra Fast | 💰 Very Low | 🎯 Basic | Simple tasks, extreme throughput |

Generation Parameters

Temperature

Controls randomness in responses:

# Deterministic (good for testing)
temperature: 0.0

# Balanced (default)
temperature: 1.0

# Creative
temperature: 1.5

Recommendations:

  • Use 0.0 for consistent evaluation results
  • Use 0.7-1.0 for balanced responses
  • Use 1.0-2.0 for creative tasks
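Conceptually, temperature rescales the model's output logits before sampling: probabilities become softmax(logits / T), so T near 0 concentrates mass on the top token while T above 1 flattens the distribution. A toy illustration of that effect (not the provider's actual sampling code):

```python
import math

def softmax_with_temperature(logits, temperature):
    """Toy illustration: temperature-scaled softmax over raw logits."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Low temperature sharpens the distribution; high temperature flattens it.
probs_cold = softmax_with_temperature([2.0, 1.0, 0.5], 0.1)
probs_hot = softmax_with_temperature([2.0, 1.0, 0.5], 2.0)
```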

Max Tokens

Limits response length:

# Short responses
max_tokens: 512

# Default
max_tokens: 8192

# Maximum (model dependent)
max_tokens: 32768 # For pro models

Top-P and Top-K

Control response diversity:

# More focused
top_p: 0.8
top_k: 20

# Default (balanced)
top_p: 0.95
top_k: 40

# More diverse
top_p: 1.0
top_k: 64
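Top-k keeps only the k most likely tokens, while top-p keeps the smallest set of tokens whose cumulative probability reaches p; sampling then happens over the surviving, renormalized set. A toy sketch of that filtering step (illustrative only, not the API's internals):

```python
def filter_top_k_top_p(probs, top_k, top_p):
    """Toy sketch: keep tokens surviving both the top-k and top-p cutoffs,
    then renormalize. `probs` maps token -> probability."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, p in ranked[:top_k]:   # top-k cutoff
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:       # top-p (nucleus) cutoff
            break
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}
```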

Cost Tracking

The Gemini Provider automatically tracks API costs:

Viewing Costs

from judge_llm import evaluate

report = evaluate(config="config.yaml")

# Total cost across all runs
print(f"Total cost: ${report.total_cost:.4f}")

# Per-run costs
for run in report.execution_runs:
    cost = run.provider_result.cost
    tokens = run.provider_result.token_usage
    print(f"{run.eval_case_id}: ${cost:.4f} ({tokens['total_tokens']} tokens)")

Cost Optimization

evaluators:
  - type: cost_evaluator
    config:
      max_cost_per_case: 0.05  # Alert if cost > $0.05 per test

Token Usage

Monitor token consumption:

for run in report.execution_runs:
    tokens = run.provider_result.token_usage
    print(f"Prompt tokens: {tokens.get('prompt_tokens', 0)}")
    print(f"Completion tokens: {tokens.get('completion_tokens', 0)}")
    print(f"Total tokens: {tokens.get('total_tokens', 0)}")

Multi-turn Conversations

The Gemini Provider supports multi-turn conversations with context:

# evalset.yaml
eval_cases:
  - eval_id: multi_turn_test
    conversation:
      # Turn 1
      - invocation_id: turn_1
        user_content:
          parts:
            - text: "What is the capital of France?"
        final_response:
          parts:
            - text: "The capital of France is Paris."

      # Turn 2 (with context from turn 1)
      - invocation_id: turn_2
        user_content:
          parts:
            - text: "What is its population?"
        final_response:
          parts:
            - text: "Paris has a population of approximately 2.2 million."
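Under the hood, each earlier turn becomes part of the chat history sent with the next request. A hedged sketch of that mapping (the dict layout mirrors the evalset above; the `user`/`model` role names follow Gemini's chat convention, but this is not Judge LLM's internal code):

```python
def conversation_to_history(conversation):
    """Flatten eval-case turns into an alternating user/model history."""
    history = []
    for turn in conversation:
        history.append({"role": "user", "parts": turn["user_content"]["parts"]})
        history.append({"role": "model", "parts": turn["final_response"]["parts"]})
    return history
```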

Advanced Configuration

Comparing Different Models

providers:
  # Fast model for high volume
  - type: gemini
    agent_id: gemini_flash
    model: gemini-2.0-flash-exp
    temperature: 0.0
    max_tokens: 512

  # High quality for complex tasks
  - type: gemini
    agent_id: gemini_pro
    model: gemini-1.5-pro
    temperature: 0.7
    max_tokens: 4096

Different Temperatures

providers:
  # Deterministic baseline
  - type: gemini
    agent_id: baseline_temp0
    temperature: 0.0

  # Creative variant
  - type: gemini
    agent_id: creative_temp1_5
    temperature: 1.5

Error Handling

The provider handles common errors gracefully:

# Errors are captured in provider results
for run in report.execution_runs:
    if not run.provider_result.success:
        print(f"Error in {run.eval_case_id}:")
        print(f"  {run.provider_result.error}")

Common errors:

  • API Key Invalid - Check GOOGLE_API_KEY in .env
  • Rate Limit - Add delays between requests
  • Model Not Found - Check model name spelling
  • Quota Exceeded - Check Google Cloud quotas
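For transient rate-limit errors, wrapping the call in an exponential backoff usually suffices. A minimal sketch (the retried callable is a placeholder, not part of Judge LLM's API):

```python
import time

def call_with_backoff(fn, max_attempts=5, base_delay=1.0):
    """Retry `fn` with exponential backoff; re-raise after the last attempt."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
```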

Performance Tips

1. Use Parallel Execution

agent:
  parallel_execution: true
  max_workers: 4  # Adjust based on rate limits

2. Choose Appropriate Models

# For simple classification
model: gemini-1.5-flash-8b

# For complex reasoning
model: gemini-1.5-pro

3. Optimize Token Usage

# Limit response length
max_tokens: 512 # Instead of default 8192

# Use lower temperature for consistency
temperature: 0.0

Examples

Example 1: Basic Evaluation

# config.yaml
agent:
  num_runs: 1

dataset:
  loader: local_file
  paths: [./tests.yaml]

providers:
  - type: gemini
    agent_id: gemini_flash
    model: gemini-2.0-flash-exp

evaluators:
  - type: response_evaluator
    config:
      similarity_threshold: 0.85

Example 2: Model Comparison

providers:
  # Compare flash vs pro
  - type: gemini
    agent_id: flash
    model: gemini-2.0-flash-exp

  - type: gemini
    agent_id: pro
    model: gemini-1.5-pro

# Both will run on the same test cases

Example 3: Temperature Tuning

providers:
  - type: gemini
    agent_id: temp_0_0
    temperature: 0.0

  - type: gemini
    agent_id: temp_0_7
    temperature: 0.7

  - type: gemini
    agent_id: temp_1_5
    temperature: 1.5
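Since all three variants run over the same test cases, their results can be compared per `agent_id` afterwards. A sketch of that grouping step (the `agent_id` and `score` field names here are illustrative, not Judge LLM's exact report schema):

```python
from collections import defaultdict

def mean_score_by_agent(runs):
    """Average a per-run score for each provider agent_id."""
    buckets = defaultdict(list)
    for run in runs:
        buckets[run["agent_id"]].append(run["score"])
    return {agent: sum(scores) / len(scores) for agent, scores in buckets.items()}
```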

Troubleshooting

API Key Not Found

# Check .env file exists
ls -la .env

# Verify GOOGLE_API_KEY is set
echo $GOOGLE_API_KEY

# Load .env manually if needed
export $(cat .env | xargs)

Rate Limiting

# Reduce parallel workers
agent:
  parallel_execution: true
  max_workers: 2  # Reduced from 4

High Costs

# Add cost evaluator
evaluators:
  - type: cost_evaluator
    config:
      max_cost_per_case: 0.01  # Alert on expensive tests

API Reference

See the Gemini API documentation for:

  • Model capabilities
  • Pricing information
  • Rate limits
  • API updates

Next Steps