How to Benchmark Search Performance¶

This guide shows how to measure and analyze search performance in the Juddges App using the automated benchmark tool.

Prerequisites¶

Backend API running (development port 8004 or production port 8002)
Valid API key configured in environment or passed as argument
Docker (recommended) or Python 3.12+ with required dependencies

Quick Start¶

Using Docker (Recommended)¶

# Run with default queries and settings
scripts/run_benchmark.sh

# Run with custom backend URL
scripts/run_benchmark.sh --backend-url http://localhost:8002

# Run with more iterations for precise results
scripts/run_benchmark.sh --iterations 5 --output data/benchmark_results.json

Using Python Directly¶

# Install dependencies (only needed once)
pip install -r scripts/requirements.txt

# Run benchmark with default settings
python scripts/benchmark_search.py

# Run with custom configuration
python scripts/benchmark_search.py --backend-url http://localhost:8002 \
  --iterations 5 --output data/benchmark_results.json

Configuration Options¶

Option	Default	Description
`--backend-url`	`http://localhost:8004`	Backend API base URL
`--api-key`	`$BACKEND_API_KEY`	API authentication key
`--iterations`	`3`	Number of test runs per query
`--warmup`	`2`	Warmup queries to prime cache
`--queries-file`	Built-in queries	Custom JSON query file
`--output`	Console only	Save detailed JSON results

Custom Queries¶

Create a JSON file with custom test queries:

Detailed Format¶

[
  {
    "query": "umowa o pracę rozwiązanie",
    "category": "labor",
    "language": "pl"
  },
  {
    "query": "contract termination notice",
    "category": "labor",
    "language": "en"
  }
]

Simple Format¶

["contract law", "prawo umów", "podatek dochodowy"]

Then run:

python scripts/benchmark_search.py --queries-file custom_queries.json

Understanding Results¶

Performance Metrics¶

P50: Median response time (50% of requests faster)
P95: 95th percentile (95% of requests faster)
P99: 99th percentile (99% of requests faster)
Results: Average number of documents returned

Search Types Tested¶

Hybrid (thinking mode, α=0.5): AI-enhanced hybrid search
Keyword (rabbit mode, α=0.0): Pure BM25/text search
Vector (rabbit mode, α=1.0): Pure semantic/embedding search

Performance Targets¶

Search Type	P95 Target	Purpose
Keyword	< 150ms	Fast text matching
Vector	< 200ms	Semantic similarity
Hybrid	< 300ms	Best relevance with AI enhancement

Sample Output¶

                Search Performance Benchmark Results
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━━━┳━━━━━━━━┓
┃ Query                      ┃ Type     ┃   P50 ┃   P95 ┃   P99 ┃ Results ┃ Status ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━━━╇━━━━━━━━┩
│ kredyty frankowe           │ hybrid   │ 120ms │ 145ms │ 160ms │    10.0 │ ✓ PASS │
│ kredyty frankowe           │ keyword  │  85ms │  95ms │ 110ms │    10.0 │ ✓ PASS │
│ kredyty frankowe           │ vector   │ 105ms │ 125ms │ 140ms │    10.0 │ ✓ PASS │
└────────────────────────────┴──────────┴───────┴───────┴───────┴─────────┴────────┘

Performance Summary
╭─ HYBRID (300ms target): P95=145ms, 8/8 queries passed - PASS
├─ KEYWORD (150ms target): P95=95ms, 8/8 queries passed - PASS
├─ VECTOR (200ms target): P95=125ms, 8/8 queries passed - PASS
│
└─ ✓ BENCHMARK PASSED

Troubleshooting¶

Backend Connection Issues¶

# Check if backend is running
curl -f http://localhost:8004/health

# Check Docker containers
docker compose ps

# Start development environment
docker compose -f docker-compose.dev.yml up backend

Authentication Errors¶

# Check API key is set
echo $BACKEND_API_KEY

# Test API key manually
curl -H "X-API-Key: $BACKEND_API_KEY" http://localhost:8004/health/status

Performance Issues¶

Slow queries: Check database indexes and vector search configuration
High latency: Verify backend resources and embedding provider response times
Inconsistent results: Increase --iterations for more stable measurements

Custom Query Validation¶

# Validate JSON format
python -m json.tool custom_queries.json

# Test single query
python scripts/benchmark_search.py \
  --queries-file custom_queries.json \
  --iterations 1 \
  --backend-url http://localhost:8004

Best Practices¶

Baseline establishment: Run benchmarks on a known-good system first
Environment consistency: Use the same backend configuration for comparable results
Load isolation: Run benchmarks when backend is not under other load
Multiple iterations: Use --iterations 5 or higher for production validation
Result archival: Save results with --output for trend analysis
Query diversity: Include both Polish and English queries representing real usage

Production Benchmarking¶

For production environment testing:

# Connect to production backend
scripts/run_benchmark.sh \
  --backend-url https://your-production-api.com \
  --api-key $PROD_API_KEY \
  --iterations 5 \
  --output data/prod_benchmark_$(date +%Y%m%d_%H%M%S).json

Monitor results over time to detect performance regressions or improvements.