Console Reporter

Real-time terminal output with rich formatting and colored status indicators for monitoring evaluation progress.

Overview

The Console Reporter displays evaluation results directly in your terminal using rich formatting, tables, and color-coded status. Perfect for development, debugging, and real-time monitoring.

Key Features:

  • Live progress updates during execution
  • Colored pass/fail indicators (green ✓, red ✗)
  • Summary statistics (success rate, costs, latency)
  • Detailed test case breakdowns
  • Per-evaluator results with scores
  • Directory-based grouping when using the directory loader (results organized by source folder)

Configuration

Basic Usage

reporters:
- type: console

That's it! No configuration needed.

CLI Usage

# Console is the default reporter
judge-llm run --config test.yaml

# Explicitly specify console
judge-llm run --config test.yaml --report console

Python API

from judge_llm import evaluate

report = evaluate(
    dataset={"loader": "local_file", "paths": ["./tests.json"]},
    providers=[{"type": "gemini", "agent_id": "test"}],
    evaluators=[{"type": "response_evaluator"}],
    reporters=[{"type": "console"}]
)

Output Format

Execution Summary

================================================================================
EVALUATION CONFIGURATION
================================================================================
┌─────────────────────┬────────┬──────┬──────────┬────────────┬─────┐
│ Agent ID / Provider │ Type   │ Runs │ Parallel │ Datasets   │ ... │
├─────────────────────┼────────┼──────┼──────────┼────────────┼─────┤
│ my_agent            │ gemini │ 1    │ No       │ tests.json │ ... │
└─────────────────────┴────────┴──────┴──────────┴────────────┴─────┘
================================================================================

▶ Starting evaluation...

Test Results

┏━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━┓
┃ Eval ID    ┃ Provider ┃ Status  ┃ Cost   ┃ Latency   ┃
┡━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━┩
│ test_001   │ gemini   │ ✓ PASS  │ $0.001 │ 1.23s     │
│ test_002   │ gemini   │ ✗ FAIL  │ $0.002 │ 2.45s     │
│ test_003   │ gemini   │ ✓ PASS  │ $0.001 │ 1.56s     │
└────────────┴──────────┴─────────┴────────┴───────────┘

Overall: 2/3 passed (66.7%)
Total Cost: $0.004
Total Time: 5.24s
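
The summary lines are simple aggregates over the per-test results. As an illustration only (the result dicts below are hypothetical sample data, not the library's internal representation), the figures above can be reproduced like this:

```python
# Illustrative aggregation of per-test results into the summary shown above.
# The dict layout is hypothetical sample data, not the reporter's internals.
results = [
    {"eval_id": "test_001", "passed": True,  "cost": 0.001, "latency": 1.23},
    {"eval_id": "test_002", "passed": False, "cost": 0.002, "latency": 2.45},
    {"eval_id": "test_003", "passed": True,  "cost": 0.001, "latency": 1.56},
]

passed = sum(r["passed"] for r in results)
rate = 100 * passed / len(results)
total_cost = sum(r["cost"] for r in results)
total_time = sum(r["latency"] for r in results)

print(f"Overall: {passed}/{len(results)} passed ({rate:.1f}%)")
print(f"Total Cost: ${total_cost:.3f}")
print(f"Total Time: {total_time:.2f}s")
```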

Per-Evaluator Results

Evaluator: ResponseEvaluator
test_001: ✓ PASS (score: 0.92 / threshold: 0.80)
test_002: ✗ FAIL (score: 0.65 / threshold: 0.80)
test_003: ✓ PASS (score: 0.88 / threshold: 0.80)

Evaluator: CostEvaluator
test_001: ✓ PASS ($0.001 / max: $0.010)
test_002: ✓ PASS ($0.002 / max: $0.010)
test_003: ✓ PASS ($0.001 / max: $0.010)
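
The pass/fail rules implied by the output above are simple comparisons: a response evaluator passes when the score meets or exceeds its threshold, and a cost evaluator passes when the cost stays at or below its cap. A minimal sketch of that logic (illustrative, not the library's code):

```python
# Sketch of the pass criteria implied by the per-evaluator output above
# (illustrative only; the real evaluators may apply additional rules).
def response_pass(score: float, threshold: float) -> bool:
    return score >= threshold

def cost_pass(cost: float, max_cost: float) -> bool:
    return cost <= max_cost

print(response_pass(0.92, 0.80))  # test_001 -> True
print(response_pass(0.65, 0.80))  # test_002 -> False
print(cost_pass(0.002, 0.010))    # test_002 -> True
```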

Use Cases

Development & Debugging

# Quick feedback loop
judge-llm run --config dev.yaml

# Watch progress in real-time
# See failures immediately
# Debug test cases interactively

CI/CD Logs

# .github/workflows/eval.yml
- name: Run LLM Evaluations
  run: judge-llm run --config ci.yaml --report console

Console output appears in GitHub Actions logs with proper formatting.

Local Testing

# Quick validation before committing
from judge_llm import evaluate

report = evaluate(
    config="./my-test.yaml",
    reporters=[{"type": "console"}]
)

if not report.overall_success:
    print("❌ Tests failed! Fix before committing.")
    exit(1)

Features

Color Coding

  • ✓ Green - Test passed
  • ✗ Red - Test failed
  • Yellow - Warnings
  • Cyan - Info/headers
  • Dim - Secondary information

Progress Indicators

Real-time progress as tests execute:

▶ Starting evaluation...
[1/10] test_001 ✓
[2/10] test_002 ✗
[3/10] test_003 ✓
...
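
The `[i/N]` progress format is straightforward to reproduce. A minimal sketch with sample statuses (illustrative only):

```python
# Minimal sketch of the progress format shown above; statuses are sample data.
statuses = {"test_001": True, "test_002": False, "test_003": True}
total = len(statuses)

lines = [
    f"[{i}/{total}] {eval_id} {'✓' if ok else '✗'}"
    for i, (eval_id, ok) in enumerate(statuses.items(), start=1)
]
for line in lines:
    print(line)
```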

Directory Grouping

When using the directory loader, execution results are automatically grouped by source directory:

                    Execution Details — basic (2/2 passed)
┏━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━┳━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Exec ID   ┃ Eval Case  ┃ Source         ┃ Run ┃ Provide ┃ Status ┃ Time    ┃ Evaluators ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━╇━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━┩
│ a1b2c3d4  │ greetings  │ greetings.json │ 1   │ gemini  │ ✓      │ 1.23    │ 2/2        │
│ e5f6g7h8  │ math       │ math.json      │ 1   │ gemini  │ ✓      │ 0.98    │ 2/2        │
└───────────┴────────────┴────────────────┴─────┴─────────┴────────┴─────────┴────────────┘

Execution Details — advanced (1/2 passed)
┏━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━┳━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━┓
...

Each source directory gets its own table with a title showing the directory name and pass/fail count.
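
The grouping itself amounts to bucketing executions by the parent directory of their source file. A sketch with hypothetical sample data (the filenames under `advanced/` are invented for illustration; this is not the loader's actual implementation):

```python
# Illustrative grouping of executions by source directory.
# The execution dicts and the advanced/* filenames are hypothetical.
from collections import defaultdict
from pathlib import PurePosixPath

executions = [
    {"source": "basic/greetings.json", "passed": True},
    {"source": "basic/math.json", "passed": True},
    {"source": "advanced/tools.json", "passed": False},
    {"source": "advanced/memory.json", "passed": True},
]

groups = defaultdict(list)
for ex in executions:
    groups[str(PurePosixPath(ex["source"]).parent)].append(ex)

for directory, items in groups.items():
    passed = sum(e["passed"] for e in items)
    print(f"Execution Details — {directory} ({passed}/{len(items)} passed)")
```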

Summary Statistics

Quick overview at the end:

  • Success rate percentage
  • Total cost across all tests
  • Total execution time
  • Pass/fail counts per evaluator

Best Practices

1. Console + File Reporter

Use console for monitoring, file for archiving:

reporters:
- type: console # Watch progress
- type: html # Save results
  output_path: ./report.html

2. Adjust Log Level

Control verbosity:

agent:
  log_level: INFO # Standard output
  # log_level: DEBUG # Verbose (shows all details)
  # log_level: WARNING # Quiet (only warnings/errors)

reporters:
- type: console

3. Redirect Output

Capture console output to file:

# Save output to file
judge-llm run --config test.yaml > results.txt 2>&1

# Show on screen and save to file
judge-llm run --config test.yaml | tee results.txt

Combining with Other Reporters

Pattern 1: Dev Workflow

reporters:
- type: console # Immediate feedback
- type: html # Detailed analysis
  output_path: ./dev-report.html

Pattern 2: CI Pipeline

reporters:
- type: console # CI logs
- type: json # Artifact storage
  output_path: ./results.json

Pattern 3: Production Monitoring

reporters:
- type: console # CloudWatch/logs
- type: database # Historical data
  db_path: ./prod-evals.db

Troubleshooting

Colors Not Showing

Issue: Output shows escape codes instead of colors

Cause: The terminal (or the pipe being written to) doesn't advertise ANSI color support

Solutions:

# Force color output
FORCE_COLOR=1 judge-llm run --config test.yaml

# Disable colors
NO_COLOR=1 judge-llm run --config test.yaml
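
`FORCE_COLOR` and `NO_COLOR` are widely adopted conventions that many terminal libraries (including Rich-based tools) honor. A sketch of the typical decision logic (illustrative; the reporter's exact precedence rules may differ):

```python
# Sketch of the common FORCE_COLOR / NO_COLOR decision logic used by many
# CLI tools (illustrative; this is not the reporter's actual implementation).
import os
import sys

def use_color(stream=sys.stdout) -> bool:
    if os.environ.get("NO_COLOR"):     # NO_COLOR wins when set to anything
        return False
    if os.environ.get("FORCE_COLOR"):  # force color even when piped to a file
        return True
    return stream.isatty()             # default: color only on a real terminal
```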

Output Cut Off

Issue: Long output truncated

Cause: Terminal buffer limit

Solutions:

  • Increase terminal buffer size
  • Redirect to file: judge-llm run --config test.yaml > output.txt
  • Use HTML reporter for full details

Unicode Errors

Issue: UnicodeEncodeError in console output

Cause: Terminal encoding doesn't support Unicode

Solutions:

# Set UTF-8 encoding
export PYTHONIOENCODING=utf-8
judge-llm run --config test.yaml
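
The error occurs because glyphs like ✓ simply have no representation in legacy codecs such as ASCII, so Python raises `UnicodeEncodeError` when writing them to a non-UTF-8 stream. A quick demonstration:

```python
# Demonstrates why a non-UTF-8 terminal fails on "✓": the glyph cannot be
# encoded in ASCII, while UTF-8 handles it without issue.
try:
    "✓ PASS".encode("ascii")
except UnicodeEncodeError as e:
    print(f"ascii cannot encode: {e.reason}")

print("✓ PASS".encode("utf-8"))  # UTF-8 encodes the check mark fine
```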

API Reference

For implementation details, see ConsoleReporter API.