Console Reporter

Real-time terminal output with rich formatting and colored status indicators for monitoring evaluation progress.

Overview

The Console Reporter displays evaluation results directly in your terminal using rich formatting, tables, and color-coded status. Perfect for development, debugging, and real-time monitoring.

Key Features:

  • Live progress updates during execution
  • Colored pass/fail indicators (green ✓, red ✗)
  • Summary statistics (success rate, costs, latency)
  • Detailed test case breakdowns
  • Per-evaluator results with scores
  • Directory-based grouping when using the directory loader (results organized by source folder)

Configuration

Basic Usage

reporters:
- type: console

That's it! No configuration needed.

CLI Usage

# Console is the default reporter
judge-llm run --config test.yaml

# Explicitly specify console
judge-llm run --config test.yaml --report console

Python API

from judge_llm import evaluate

report = evaluate(
    dataset={"loader": "local_file", "paths": ["./tests.json"]},
    providers=[{"type": "gemini", "agent_id": "test"}],
    evaluators=[{"type": "response_evaluator"}],
    reporters=[{"type": "console"}]
)

Output Format

Execution Summary

================================================================================
EVALUATION CONFIGURATION
================================================================================
┌─────────────────────┬────────┬──────┬──────────┬────────────┬─────┐
│ Agent ID / Provider │ Type   │ Runs │ Parallel │ Datasets   │ ... │
├─────────────────────┼────────┼──────┼──────────┼────────────┼─────┤
│ my_agent            │ gemini │ 1    │ No       │ tests.json │ ... │
└─────────────────────┴────────┴──────┴──────────┴────────────┴─────┘
================================================================================

▶ Starting evaluation...

Test Results

┏━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━┓
┃ Eval ID    ┃ Provider ┃ Status  ┃ Cost   ┃ Latency   ┃
┡━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━┩
│ test_001   │ gemini   │ ✓ PASS  │ $0.001 │ 1.23s     │
│ test_002   │ gemini   │ ✗ FAIL  │ $0.002 │ 2.45s     │
│ test_003   │ gemini   │ ✓ PASS  │ $0.001 │ 1.56s     │
└────────────┴──────────┴─────────┴────────┴───────────┘

Overall: 2/3 passed (66.7%)
Total Cost: $0.004
Total Time: 5.24s
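
The summary lines are simple aggregates over the per-test results. As an illustration only (the result dicts below are hypothetical sample data, not the library's internal representation), the figures above can be reproduced like this:

```python
# Illustrative aggregation of per-test results into the summary shown above.
# The dict layout is hypothetical sample data, not the reporter's internals.
results = [
    {"eval_id": "test_001", "passed": True,  "cost": 0.001, "latency": 1.23},
    {"eval_id": "test_002", "passed": False, "cost": 0.002, "latency": 2.45},
    {"eval_id": "test_003", "passed": True,  "cost": 0.001, "latency": 1.56},
]

passed = sum(r["passed"] for r in results)
rate = 100 * passed / len(results)
total_cost = sum(r["cost"] for r in results)
total_time = sum(r["latency"] for r in results)

print(f"Overall: {passed}/{len(results)} passed ({rate:.1f}%)")
print(f"Total Cost: ${total_cost:.3f}")
print(f"Total Time: {total_time:.2f}s")
```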

Per-Evaluator Results

Evaluator: ResponseEvaluator
test_001: ✓ PASS (score: 0.92 / threshold: 0.80)
test_002: ✗ FAIL (score: 0.65 / threshold: 0.80)
test_003: ✓ PASS (score: 0.88 / threshold: 0.80)

Evaluator: CostEvaluator
test_001: ✓ PASS ($0.001 / max: $0.010)
test_002: ✓ PASS ($0.002 / max: $0.010)
test_003: ✓ PASS ($0.001 / max: $0.010)
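
The pass/fail rules implied by the output above are simple comparisons: a response evaluator passes when the score meets or exceeds its threshold, and a cost evaluator passes when the cost stays at or below its cap. A minimal sketch of that logic (illustrative, not the library's code):

```python
# Sketch of the pass criteria implied by the per-evaluator output above
# (illustrative only; the real evaluators may apply additional rules).
def response_pass(score: float, threshold: float) -> bool:
    return score >= threshold

def cost_pass(cost: float, max_cost: float) -> bool:
    return cost <= max_cost

print(response_pass(0.92, 0.80))  # test_001 -> True
print(response_pass(0.65, 0.80))  # test_002 -> False
print(cost_pass(0.002, 0.010))    # test_002 -> True
```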

Use Cases

Development & Debugging

# Quick feedback loop
judge-llm run --config dev.yaml

# Watch progress in real-time
# See failures immediately
# Debug test cases interactively

CI/CD Logs

# .github/workflows/eval.yml
- name: Run LLM Evaluations
  run: judge-llm run --config ci.yaml --report console

Console output appears in GitHub Actions logs with proper formatting.

Local Testing

# Quick validation before committing
from judge_llm import evaluate

report = evaluate(
    config="./my-test.yaml",
    reporters=[{"type": "console"}]
)

if not report.overall_success:
    print("❌ Tests failed! Fix before committing.")
    exit(1)

Features

Color Coding

  • ✓ Green - Test passed
  • ✗ Red - Test failed
  • Yellow - Warnings
  • Cyan - Info/headers
  • Dim - Secondary information

Progress Indicators

Real-time progress as tests execute:

▶ Starting evaluation...
[1/10] test_001 ✓
[2/10] test_002 ✗
[3/10] test_003 ✓
...
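
The `[i/N]` progress format is straightforward to reproduce. A minimal sketch with sample statuses (illustrative only):

```python
# Minimal sketch of the progress format shown above; statuses are sample data.
statuses = {"test_001": True, "test_002": False, "test_003": True}
total = len(statuses)

lines = [
    f"[{i}/{total}] {eval_id} {'✓' if ok else '✗'}"
    for i, (eval_id, ok) in enumerate(statuses.items(), start=1)
]
for line in lines:
    print(line)
```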

Directory Grouping

When using the directory loader, execution results are automatically grouped by source directory:

                    Execution Details — basic (2/2 passed)
┏━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━┳━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Exec ID   ┃ Eval Case  ┃ Source         ┃ Run ┃ Provide ┃ Status ┃ Time    ┃ Evaluators ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━╇━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━┩
│ a1b2c3d4  │ greetings  │ greetings.json │ 1   │ gemini  │ ✓      │ 1.23    │ 2/2        │
│ e5f6g7h8  │ math       │ math.json      │ 1   │ gemini  │ ✓      │ 0.98    │ 2/2        │
└───────────┴────────────┴────────────────┴─────┴─────────┴────────┴─────────┴────────────┘

Execution Details — advanced (1/2 passed)
┏━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━┳━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━┓
...

Each source directory gets its own table with a title showing the directory name and pass/fail count.
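
The grouping itself amounts to bucketing executions by the parent directory of their source file. A sketch with hypothetical sample data (the filenames under `advanced/` are invented for illustration; this is not the loader's actual implementation):

```python
# Illustrative grouping of executions by source directory.
# The execution dicts and the advanced/* filenames are hypothetical.
from collections import defaultdict
from pathlib import PurePosixPath

executions = [
    {"source": "basic/greetings.json", "passed": True},
    {"source": "basic/math.json", "passed": True},
    {"source": "advanced/tools.json", "passed": False},
    {"source": "advanced/memory.json", "passed": True},
]

groups = defaultdict(list)
for ex in executions:
    groups[str(PurePosixPath(ex["source"]).parent)].append(ex)

for directory, items in groups.items():
    passed = sum(e["passed"] for e in items)
    print(f"Execution Details — {directory} ({passed}/{len(items)} passed)")
```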

Summary Statistics

Quick overview at the end:

  • Success rate percentage
  • Total cost across all tests
  • Total execution time
  • Pass/fail counts per evaluator

Best Practices

1. Console + File Reporter

Use console for monitoring, file for archiving:

reporters:
- type: console # Watch progress
- type: html # Save results
  output_path: ./report.html

2. Adjust Log Level

Control verbosity:

agent:
  log_level: INFO # Standard output
  # log_level: DEBUG # Verbose (shows all details)
  # log_level: WARNING # Quiet (only warnings/errors)

reporters:
- type: console

3. Redirect Output

Capture console output to file:

# Save output to file
judge-llm run --config test.yaml > results.txt 2>&1

# Show on screen and save to file
judge-llm run --config test.yaml | tee results.txt

Combining with Other Reporters

Pattern 1: Dev Workflow

reporters:
- type: console # Immediate feedback
- type: html # Detailed analysis
  output_path: ./dev-report.html

Pattern 2: CI Pipeline

reporters:
- type: console # CI logs
- type: json # Artifact storage
  output_path: ./results.json

Pattern 3: Production Monitoring

reporters:
- type: console # CloudWatch/logs
- type: database # Historical data
  db_path: ./prod-evals.db

Troubleshooting

Colors Not Showing

Issue: Output shows escape codes instead of colors

Cause: The terminal (or the pipe being written to) doesn't advertise ANSI color support

Solutions:

# Force color output
FORCE_COLOR=1 judge-llm run --config test.yaml

# Disable colors
NO_COLOR=1 judge-llm run --config test.yaml
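
`FORCE_COLOR` and `NO_COLOR` are widely adopted conventions that many terminal libraries (including Rich-based tools) honor. A sketch of the typical decision logic (illustrative; the reporter's exact precedence rules may differ):

```python
# Sketch of the common FORCE_COLOR / NO_COLOR decision logic used by many
# CLI tools (illustrative; this is not the reporter's actual implementation).
import os
import sys

def use_color(stream=sys.stdout) -> bool:
    if os.environ.get("NO_COLOR"):     # NO_COLOR wins when set to anything
        return False
    if os.environ.get("FORCE_COLOR"):  # force color even when piped to a file
        return True
    return stream.isatty()             # default: color only on a real terminal
```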

Output Cut Off

Issue: Long output truncated

Cause: Terminal buffer limit

Solutions:

  • Increase terminal buffer size
  • Redirect to file: judge-llm run --config test.yaml > output.txt
  • Use HTML reporter for full details

Unicode Errors

Issue: UnicodeEncodeError in console output

Cause: Terminal encoding doesn't support Unicode

Solutions:

# Set UTF-8 encoding
export PYTHONIOENCODING=utf-8
judge-llm run --config test.yaml
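
The error occurs because glyphs like ✓ simply have no representation in legacy codecs such as ASCII, so Python raises `UnicodeEncodeError` when writing them to a non-UTF-8 stream. A quick demonstration:

```python
# Demonstrates why a non-UTF-8 terminal fails on "✓": the glyph cannot be
# encoded in ASCII, while UTF-8 handles it without issue.
try:
    "✓ PASS".encode("ascii")
except UnicodeEncodeError as e:
    print(f"ascii cannot encode: {e.reason}")

print("✓ PASS".encode("utf-8"))  # UTF-8 encodes the check mark fine
```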

API Reference

For implementation details, see ConsoleReporter API.