CLI Reference

Complete command-line interface reference for Judge LLM.

Installation

pip install judge-llm

Verify installation:

judge-llm --version

Commands

run

Execute evaluations from a configuration file.

judge-llm run --config <path> [options]

Arguments:

Argument              Short  Description                                                Required
--config              -c     Path to YAML configuration file                            Yes
--report              -r     Reporter type(s) to use                                    No
--output              -o     Output path for report                                     No
--db-path                    Database path (for database reporter)                      No
--no-validate                Skip configuration validation                              No
--telemetry           -t     Enable OpenTelemetry tracing                               No
--telemetry-exporter         Exporter type: console, otlp, phoenix (default: console)  No

Examples:

Basic usage:

judge-llm run --config test.yaml

With specific reporter:

judge-llm run --config test.yaml --report json --output results.json

Multiple reporters:

judge-llm run --config test.yaml \
  --report console \
  --report html --output report.html \
  --report database --db-path results.db

Skip validation (faster):

judge-llm run --config test.yaml --no-validate

With telemetry (console output):

judge-llm run --config test.yaml --telemetry

With telemetry (OTLP exporter):

judge-llm run --config test.yaml --telemetry --telemetry-exporter otlp

With telemetry (Arize Phoenix):

judge-llm run --config test.yaml --telemetry --telemetry-exporter phoenix

Note: Telemetry requires optional dependencies. Install with pip install judge-llm[telemetry] or pip install judge-llm[phoenix].

list

List available providers, evaluators, or reporters.

judge-llm list <entity>

Arguments:

Argument  Description             Values
entity    Component type to list  providers, evaluators, reporters

Examples:

List providers:

judge-llm list providers

Output:

Available Providers:
- anthropic
- gemini
- openai

List evaluators:

judge-llm list evaluators

Output:

Available Evaluators:
- response_evaluator
- trajectory_evaluator
- cost_evaluator
- latency_evaluator

List reporters:

judge-llm list reporters

Output:

Available Reporters:
- console
- database
- html
- json

validate

Validate a configuration file without running evaluations.

judge-llm validate --config <path>

Arguments:

Argument  Short  Description                      Required
--config  -c     Path to YAML configuration file  Yes

Examples:

judge-llm validate --config test.yaml

Output if valid:

✓ Configuration is valid

Output if invalid:

✗ Configuration validation failed:
- Missing required field: dataset.loader
- Invalid provider type: invalid_provider

Global Options

These options work with all commands:

Option     Short  Description
--help     -h     Show help message
--version  -v     Show version number
--verbose         Enable verbose logging
--quiet    -q     Suppress all output except errors

Examples:

# Show help
judge-llm --help
judge-llm run --help

# Show version
judge-llm --version

# Verbose output
judge-llm run --config test.yaml --verbose

# Quiet mode
judge-llm run --config test.yaml --quiet

Configuration File

The --config argument accepts YAML configuration files.

Basic Structure

dataset:
  loader: local_file
  paths:
    - ./tests.json

providers:
  - type: gemini
    agent_id: test_agent

evaluators:
  - type: response_evaluator

reporters:
  - type: console

Environment Variables

Use ${VAR_NAME} syntax to reference environment variables:

providers:
  - type: gemini
    agent_id: ${AGENT_ID}
    api_key: ${GEMINI_API_KEY}

reporters:
  - type: database
    db_path: ${DB_PATH:-./results.db}  # Default value
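The ${VAR:-default} form mirrors shell parameter expansion: the default applies only when the variable is unset or empty. Assuming judge-llm follows the usual shell semantics, the behavior can be checked directly in the shell using the DB_PATH example above:

```shell
# Unset: the fallback after :- is used
unset DB_PATH
echo "${DB_PATH:-./results.db}"    # prints ./results.db

# Set: the variable's value wins
export DB_PATH=./prod_results.db
echo "${DB_PATH:-./results.db}"    # prints ./prod_results.db
```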

Load from .env file:

# .env
AGENT_ID=my_agent
GEMINI_API_KEY=your_api_key
DB_PATH=./prod_results.db

judge-llm run --config test.yaml
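If judge-llm is started from a shell that does not load .env automatically, the variables can be exported first with set -a. A self-contained sketch (the scratch directory, .env contents, and the echo stand-in for judge-llm are illustrative):

```shell
# Work in a scratch directory and write a sample .env
cd "$(mktemp -d)"
cat > .env <<'EOF'
AGENT_ID=my_agent
GEMINI_API_KEY=your_api_key
EOF

set -a        # auto-export every assignment that follows
. ./.env
set +a

# A child process (here a stand-in for judge-llm) now sees the values
sh -c 'echo "agent: $AGENT_ID"'    # prints agent: my_agent
```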

Default Configuration

Create .judge_llm.defaults.yaml in your project root or ~/.judge_llm/defaults.yaml for global defaults:

# .judge_llm.defaults.yaml
providers:
  - type: gemini
    model: gemini-2.0-flash-exp
    temperature: 0.0

reporters:
  - type: console
  - type: json
    output_path: ./results/latest.json

These settings are merged with your test config.

Reporter Options

Console Reporter

judge-llm run --config test.yaml --report console

No additional options needed.

JSON Reporter

judge-llm run --config test.yaml \
  --report json \
  --output ./results.json

Options:

  • --output: Path to JSON file (required)

HTML Reporter

judge-llm run --config test.yaml \
  --report html \
  --output ./report.html

Options:

  • --output: Path to HTML file (required)

Database Reporter

judge-llm run --config test.yaml \
  --report database \
  --db-path ./results.db

Options:

  • --db-path: Path to SQLite database (required)

Multiple Reporters

Combine multiple reporters in a single run:

judge-llm run --config test.yaml \
  --report console \
  --report json --output results.json \
  --report html --output report.html \
  --report database --db-path results.db

Exit Codes

Code  Meaning
0     Success - all tests passed
1     Failure - some tests failed or an error occurred
2     Invalid configuration
3     Missing required arguments

Example Usage in Scripts:

#!/bin/bash

judge-llm run --config test.yaml
EXIT_CODE=$?

if [ $EXIT_CODE -eq 0 ]; then
  echo "All tests passed!"
  # Deploy or continue
elif [ $EXIT_CODE -eq 1 ]; then
  echo "Tests failed!"
  exit 1
else
  echo "Configuration error!"
  exit 2
fi

CI/CD Integration

GitHub Actions

name: Evaluate LLM

on: [push, pull_request]

jobs:
  evaluate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.10'

      - name: Install dependencies
        run: |
          pip install judge-llm

      - name: Run evaluations
        env:
          GEMINI_API_KEY: ${{ secrets.GEMINI_API_KEY }}
        run: |
          judge-llm run --config test.yaml \
            --report console \
            --report html --output report.html

      - name: Upload report
        if: always()
        uses: actions/upload-artifact@v3
        with:
          name: evaluation-report
          path: report.html

      # No explicit failure step is needed: the job fails automatically
      # when judge-llm exits with a nonzero status.

GitLab CI

evaluate:
  image: python:3.10
  script:
    - pip install judge-llm
    - judge-llm run --config test.yaml --report json --output results.json
  artifacts:
    paths:
      - results.json
    when: always
  only:
    - main
    - merge_requests

Jenkins

pipeline {
    agent any

    environment {
        GEMINI_API_KEY = credentials('gemini-api-key')
    }

    stages {
        stage('Install') {
            steps {
                sh 'pip install judge-llm'
            }
        }

        stage('Evaluate') {
            steps {
                sh '''
                    judge-llm run --config test.yaml \
                        --report console \
                        --report html --output report.html
                '''
            }
        }
    }

    post {
        always {
            archiveArtifacts artifacts: 'report.html', allowEmptyArchive: true
        }
    }
}

Advanced Usage

Custom Reporters via CLI

While you can't register custom reporters directly via CLI, you can specify them in your config:

# test.yaml
reporters:
  - type: custom
    module_path: ./reporters/slack_reporter.py
    class_name: SlackReporter
    webhook_url: ${SLACK_WEBHOOK_URL}

judge-llm run --config test.yaml

Combining with Other Tools

With jq for JSON processing:

judge-llm run --config test.yaml --report json --output results.json

# Extract success rate
jq '.success_rate' results.json

# Filter failed tests
jq '.test_cases[] | select(.passed == false)' results.json

With sqlite3 for database queries:

judge-llm run --config test.yaml --report database --db-path results.db

# Query results
sqlite3 results.db "SELECT eval_id, passed, cost FROM test_cases ORDER BY cost DESC LIMIT 10"

Batch Processing

Run multiple configurations:

#!/bin/bash

for config in configs/*.yaml; do
  echo "Running $config..."
  judge-llm run --config "$config" \
    --report json \
    --output "results/$(basename "$config" .yaml).json"
done

Parallel Execution

# Run multiple configs in parallel
for config in configs/*.yaml; do
  judge-llm run --config "$config" &
done
wait

echo "All evaluations complete"
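The loop above launches every configuration at once. To cap concurrency (for example, to stay under provider rate limits), xargs -P is a common alternative; the sketch below uses echo as a stand-in for the real command, and the config file names are hypothetical:

```shell
# Run at most 4 jobs at a time; replace echo with the actual
# judge-llm invocation to execute the runs
printf '%s\n' configs/a.yaml configs/b.yaml configs/c.yaml |
  xargs -P4 -I{} echo judge-llm run --config {}
```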

Troubleshooting

Command Not Found

Issue: judge-llm: command not found

Solutions:

  • Ensure installed: pip install judge-llm
  • Check PATH: which judge-llm
  • Use full path: python -m judge_llm.cli run --config test.yaml

Configuration Not Found

Issue: Configuration file not found: test.yaml

Solutions:

  • Use absolute path: judge-llm run --config /full/path/to/test.yaml
  • Check current directory: ls test.yaml

API Key Not Found

Issue: API key not found for provider: gemini

Solutions:

  • Set environment variable: export GEMINI_API_KEY=your_key
  • Use .env file in project root
  • Specify in config (not recommended for production)

Invalid Configuration

Issue: Configuration validation errors

Solution: Use validate command first:

judge-llm validate --config test.yaml

Fix reported issues, then run again.