How to Benchmark Search Performance¶
This guide shows how to measure and analyze search performance in the Juddges App using the automated benchmark tool.
Prerequisites¶
- Backend API running (development port 8004 or production port 8002)
- Valid API key configured in environment or passed as argument
- Docker (recommended) or Python 3.12+ with required dependencies
Quick Start¶
Using Docker (Recommended)¶
# Run with default queries and settings
scripts/run_benchmark.sh
# Run with custom backend URL
scripts/run_benchmark.sh --backend-url http://localhost:8002
# Run with more iterations for precise results
scripts/run_benchmark.sh --iterations 5 --output data/benchmark_results.json
Using Python Directly¶
# Install dependencies (only needed once)
pip install -r scripts/requirements.txt
# Run benchmark with default settings
python scripts/benchmark_search.py
# Run with custom configuration
python scripts/benchmark_search.py --backend-url http://localhost:8002 \
--iterations 5 --output data/benchmark_results.json
Configuration Options¶
| Option | Default | Description |
|---|---|---|
--backend-url |
http://localhost:8004 |
Backend API base URL |
--api-key |
$BACKEND_API_KEY |
API authentication key |
--iterations |
3 |
Number of test runs per query |
--warmup |
2 |
Warmup queries to prime cache |
--queries-file |
Built-in queries | Custom JSON query file |
--output |
Console only | Save detailed JSON results |
Custom Queries¶
Create a JSON file with custom test queries:
Detailed Format¶
[
{
"query": "umowa o pracę rozwiązanie",
"category": "labor",
"language": "pl"
},
{
"query": "contract termination notice",
"category": "labor",
"language": "en"
}
]
Simple Format¶
Then run:
Understanding Results¶
Performance Metrics¶
- P50: Median response time (50% of requests faster)
- P95: 95th percentile (95% of requests faster)
- P99: 99th percentile (99% of requests faster)
- Results: Average number of documents returned
Search Types Tested¶
- Hybrid (thinking mode, α=0.5): AI-enhanced hybrid search
- Keyword (rabbit mode, α=0.0): Pure BM25/text search
- Vector (rabbit mode, α=1.0): Pure semantic/embedding search
Performance Targets¶
| Search Type | P95 Target | Purpose |
|---|---|---|
| Keyword | < 150ms | Fast text matching |
| Vector | < 200ms | Semantic similarity |
| Hybrid | < 300ms | Best relevance with AI enhancement |
Sample Output¶
Search Performance Benchmark Results
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━━━┳━━━━━━━━┓
┃ Query ┃ Type ┃ P50 ┃ P95 ┃ P99 ┃ Results ┃ Status ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━━━╇━━━━━━━━┩
│ kredyty frankowe │ hybrid │ 120ms │ 145ms │ 160ms │ 10.0 │ ✓ PASS │
│ kredyty frankowe │ keyword │ 85ms │ 95ms │ 110ms │ 10.0 │ ✓ PASS │
│ kredyty frankowe │ vector │ 105ms │ 125ms │ 140ms │ 10.0 │ ✓ PASS │
└────────────────────────────┴──────────┴───────┴───────┴───────┴─────────┴────────┘
Performance Summary
╭─ HYBRID (300ms target): P95=145ms, 8/8 queries passed - PASS
├─ KEYWORD (150ms target): P95=95ms, 8/8 queries passed - PASS
├─ VECTOR (200ms target): P95=125ms, 8/8 queries passed - PASS
│
└─ ✓ BENCHMARK PASSED
Troubleshooting¶
Backend Connection Issues¶
# Check if backend is running
curl -f http://localhost:8004/health
# Check Docker containers
docker compose ps
# Start development environment
docker compose -f docker-compose.dev.yml up backend
Authentication Errors¶
# Check API key is set
echo $BACKEND_API_KEY
# Test API key manually
curl -H "X-API-Key: $BACKEND_API_KEY" http://localhost:8004/health/status
Performance Issues¶
- Slow queries: Check database indexes and vector search configuration
- High latency: Verify backend resources and embedding provider response times
- Inconsistent results: Increase
--iterationsfor more stable measurements
Custom Query Validation¶
# Validate JSON format
python -m json.tool custom_queries.json
# Test single query
python scripts/benchmark_search.py \
--queries-file custom_queries.json \
--iterations 1 \
--backend-url http://localhost:8004
Best Practices¶
- Baseline establishment: Run benchmarks on a known-good system first
- Environment consistency: Use the same backend configuration for comparable results
- Load isolation: Run benchmarks when backend is not under other load
- Multiple iterations: Use
--iterations 5or higher for production validation - Result archival: Save results with
--outputfor trend analysis - Query diversity: Include both Polish and English queries representing real usage
Production Benchmarking¶
For production environment testing:
# Connect to production backend
scripts/run_benchmark.sh \
--backend-url https://your-production-api.com \
--api-key $PROD_API_KEY \
--iterations 5 \
--output data/prod_benchmark_$(date +%Y%m%d_%H%M%S).json
Monitor results over time to detect performance regressions or improvements.