Juddges App - Project Summary¶
🎯 Project Overview¶
Juddges App is a fork of the JuDDGES platform, specialized for searching and analyzing judicial decisions from Poland and the United Kingdom. The app provides semantic search capabilities across 6,000+ sampled court judgments using modern AI and vector database technology.
📊 Key Features¶
- Multi-Jurisdiction Support: Polish and UK court decisions in one platform
- Semantic Search: Vector-based similarity search using OpenAI embeddings
- Full-Text Search: PostgreSQL-powered text search with ranking
- Structured Data: Comprehensive metadata including judges, dates, keywords
- Modern Stack: Next.js 15, FastAPI, Supabase, pgvector
- Scalable Architecture: Ready to scale from 6,000 to 200,000+ judgments
🗂️ Repository Structure¶
juddges-app/
├── README.md # Main project documentation
├── SETUP_GUIDE.md # Step-by-step setup instructions
├── DATA_INGESTION_GUIDE.md # Data ingestion documentation
├── SUPABASE_MCP_GUIDE.md # Supabase MCP tools reference
├── PROJECT_SUMMARY.md # This file
├── .env.example # Environment variable template
├── .gitignore # Git ignore rules
│
├── supabase/ # Supabase database configuration
│   └── migrations/
│       └── 20260209000001_create_judgments_table.sql
│
└── scripts/ # Data ingestion scripts
    ├── ingest_judgments.py # Main ingestion pipeline
    └── requirements.txt # Python dependencies
📚 Documentation Files¶
1. README.md¶
Purpose: Main project documentation and quick start guide
Contents:
- Project overview and features
- Technology stack details
- Quick start instructions
- API endpoint documentation
- Development commands
Target Audience: Developers getting started with the project
2. SETUP_GUIDE.md¶
Purpose: Complete step-by-step setup instructions
Contents:
- Prerequisites checklist
- Supabase project creation
- Environment configuration
- Database migration steps
- Data ingestion walkthrough
- Troubleshooting guide
Target Audience: First-time users setting up the project
Key Sections:
- ✅ 5-minute quick start
- 🔧 Detailed setup steps
- 🐛 Troubleshooting
- 📝 Setup completion checklist
3. DATA_INGESTION_GUIDE.md¶
Purpose: Comprehensive guide to the data ingestion process
Contents:
- Data source descriptions (HuggingFace datasets)
- Ingestion pipeline architecture
- Performance metrics and costs
- Data quality checks
- Advanced usage patterns
Target Audience: Developers managing data ingestion
Key Sections:
- 📊 Dataset statistics and structure
- 🔄 Pipeline architecture
- 📈 Performance benchmarks
- 💰 Cost estimates (OpenAI API)
- 🛠️ Troubleshooting common issues
4. SUPABASE_MCP_GUIDE.md¶
Purpose: Guide for using Supabase MCP tools in Claude Code
Contents:
- Available MCP tools reference
- Common query examples
- Security and access control
- Performance monitoring
- Development workflow (branches)
Target Audience: Developers using Claude Code with Supabase
Key Sections:
- 🔧 MCP tool catalog
- 📝 SQL query examples
- 🔐 Security configuration
- 📊 Monitoring and analytics
- 🚀 Advanced queries
5. .env.example¶
Purpose: Environment variable template
Contents:
- Supabase credentials
- OpenAI API keys
- Backend configuration
- Feature flags
- Service URLs
Usage: Copy to .env and fill in actual values
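As a minimal sketch of how a script might consume these values, the snippet below fails fast when a required setting is missing. The variable names (`SUPABASE_URL`, `SUPABASE_SERVICE_ROLE_KEY`, `OPENAI_API_KEY`) and the `load_settings` helper are assumptions for illustration; the authoritative list is whatever `.env.example` defines.

```python
import os

# Hypothetical variable names; check .env.example for the real ones.
REQUIRED_VARS = ("SUPABASE_URL", "SUPABASE_SERVICE_ROLE_KEY", "OPENAI_API_KEY")


def load_settings(env=None):
    """Return the required settings, raising if any is missing or empty."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

Passing a plain dictionary instead of `os.environ` makes the check easy to exercise in tests without touching the real environment.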
🗄️ Database Schema¶
Main Table: judgments¶
Purpose: Store all court judgments from Poland and UK
Key Columns:
- id: UUID primary key
- case_number: Unique case identifier
- jurisdiction: 'PL' or 'UK'
- court_name: Name of the court
- decision_date: Date of judgment
- title: Case title
- summary: Brief summary
- full_text: Complete judgment text
- judges: JSONB array of judges
- keywords: Text array for filtering
- embedding: vector(768) for semantic search
- metadata: JSONB for flexible data
Indexes:
- B-tree: jurisdiction, decision_date, case_number
- GIN: full-text search, keywords, JSONB
- HNSW: vector similarity (cosine distance)
Search Functions:
1. search_judgments_by_embedding(): Semantic search with vectors
2. search_judgments_by_text(): Full-text search with ranking
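To illustrate what ranking by cosine distance means, here is a brute-force sketch in plain Python. It is not the implementation of `search_judgments_by_embedding()`, which runs inside PostgreSQL; the helper names and the toy 3-dimensional vectors are assumptions for illustration. An HNSW index approximates this ordering without scanning every row.

```python
import math


def cosine_distance(a, b):
    """Cosine distance = 1 - cosine similarity (0 means same direction).

    Assumes both vectors are non-zero and of equal length.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def rank_by_embedding(query, rows, limit=10):
    """Exhaustively rank (case_number, embedding) pairs by distance to the query."""
    scored = [(cosine_distance(query, emb), case) for case, emb in rows]
    scored.sort()  # smallest distance first
    return [case for _, case in scored[:limit]]


# Toy data: made-up case numbers and 3-dimensional embeddings.
rows = [("A-1", [1.0, 0.0, 0.0]), ("B-2", [0.0, 1.0, 0.0]), ("C-3", [0.9, 0.1, 0.0])]
top = rank_by_embedding([1.0, 0.0, 0.0], rows, limit=2)
```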
📦 Data Sources¶
Polish Judgments¶
Dataset: JuDDGES/pl-appealcourt-criminal
DOI: 10.57967/hf/8772
Description: Multi-jurisdiction case law dataset with standardized format
Polish Coverage:
- ~50,000+ Polish court decisions
- Multiple court levels (Supreme, Appeal, District)
- Various case types (Civil, Criminal, Administrative)
Ingestion Command:
UK Judgments¶
Dataset: JuDDGES/en-appealcourt
DOI: 10.57967/hf/8773
Description: Complete collection of England & Wales Court of Appeal (Criminal Division) judgments
Coverage:
- 6,154 judgments total
- Date range: Up to May 15, 2024
- Court: Court of Appeal (Criminal Division)
- Format: Structured XML with metadata
Ingestion Command:
🔄 Data Ingestion Pipeline¶
Pipeline Flow¶
1. Download from HuggingFace
↓
2. Transform to unified schema
↓
3. Generate embeddings (OpenAI)
↓
4. Insert into Supabase
↓
5. Verify data quality
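The transform step (step 2) can be sketched as follows. The raw field names (`docket`, `court`, `date`, `text`) are hypothetical, since this summary does not list the HuggingFace dataset columns; only the output keys come from the `judgments` schema described above. It is an illustration, not the code in `ingest_judgments.py`.

```python
from datetime import date

VALID_JURISDICTIONS = {"PL", "UK"}


def to_unified(raw, jurisdiction):
    """Map one raw dataset record onto the judgments columns.

    The raw keys used here are placeholders, not the real dataset fields.
    """
    if jurisdiction not in VALID_JURISDICTIONS:
        raise ValueError(f"Unsupported jurisdiction: {jurisdiction!r}")
    return {
        "case_number": raw["docket"].strip(),
        "jurisdiction": jurisdiction,
        "court_name": raw.get("court"),
        "decision_date": date.fromisoformat(raw["date"]),
        "full_text": raw["text"],
        "judges": list(raw.get("judges", [])),
        "keywords": list(raw.get("keywords", [])),
        "metadata": {},
    }


# Made-up record for illustration only.
row = to_unified(
    {"docket": " EXAMPLE-001 ", "court": "Example Court", "date": "2020-03-01", "text": "..."},
    "PL",
)
```

Rejecting unknown jurisdictions at this stage keeps invalid rows out of the database before any embedding cost is incurred.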
Performance Metrics¶
| Operation | Time (100 cases) | Time (3000 cases) | Cost (3000) |
|---|---|---|---|
| Download | ~30 seconds | ~5 minutes | Free |
| Transform | ~1 minute | ~15 minutes | Free |
| Embeddings | ~5-7 minutes | ~2-3 hours | ~$12 |
| Insert | ~1 minute | ~10 minutes | Free |
| Total | ~8-10 minutes | ~3-4 hours | ~$12 |
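The embedding cost in this table can be cross-checked against the per-token price quoted under Key Design Decisions (~$0.0001 per 1K tokens). The sketch below assumes an average of about 40,000 tokens per judgment; that figure is back-derived from the table, not measured, and the helper name is illustrative.

```python
PRICE_PER_1K_TOKENS = 0.0001  # USD, as quoted in this document


def embedding_cost(num_judgments, avg_tokens_per_judgment=40_000):
    """Estimate the one-time embedding cost in USD."""
    total_tokens = num_judgments * avg_tokens_per_judgment
    return total_tokens / 1000 * PRICE_PER_1K_TOKENS


cost_3k = embedding_cost(3_000)  # roughly 12 USD
cost_6k = embedding_cost(6_000)  # roughly 24 USD
```

If real token counts per judgment differ, substituting the measured average gives a better estimate than the table.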
Storage Requirements¶
- 100 judgments: ~2-3 MB
- 1,000 judgments: ~20-30 MB
- 6,000 judgments: ~120-180 MB (current target)
- 10,000 judgments: ~200-300 MB
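These figures scale linearly at roughly 20-30 KB per judgment (2-3 MB divided by 100). A small sketch of that extrapolation, using a hypothetical helper name:

```python
def storage_range_mb(num_judgments, kb_low=20, kb_high=30):
    """Return the (low, high) storage estimate in MB, using 1 MB = 1000 KB."""
    return (num_judgments * kb_low / 1000, num_judgments * kb_high / 1000)


low, high = storage_range_mb(6_000)  # matches the ~120-180 MB line above
```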
🚀 Next Steps¶
Phase 1: Initial Setup ✅¶
- Create repository structure
- Write documentation
- Create database migration
- Build ingestion script
Phase 2: Copy Boilerplate ✅¶
- Copy frontend from JuDDGES
- Copy backend from JuDDGES
- Update branding (JuDDGES → Juddges)
- Configure environment variables
- Test local development setup
Phase 3: Customize for Judgments ✅¶
- Create judgment search UI
- Implement semantic search endpoint
- Add jurisdiction filters
- Create judgment detail page
- Add keyword-based navigation
Phase 4: Data Ingestion ✅¶
- Run Supabase migrations
- Ingest Polish judgments
- Ingest UK judgments
- Verify data quality
- Test search functionality
Phase 5: Features (In Progress)¶
- Add RAG-based chat for legal Q&A
- Create analytics dashboard
- Implement judgment comparison
- Add citation tracking
- Build export functionality
Phase 6: Deployment ✅¶
- Set up production Supabase project
- Configure CI/CD pipeline
- Deploy to production (Docker Hub + deploy scripts)
- Set up monitoring (Langfuse)
- Create user documentation
Phase 7: Search Quality & Data Scale (In Progress)¶
- Curate 6K+ Polish judgment dataset with topic coverage (issue #12)
- Topic analysis of 6K UK judgments (issue #10)
- Cross-jurisdictional search query generation (issue #11)
- Search quality evaluation framework (issue #13)
- Query classification and alpha routing
- Cross-encoder reranking with Cohere API
- Polish text search with unaccent + per-document language detection
🛠️ Technology Stack¶
Frontend¶
- Framework: Next.js 15 (App Router)
- UI: React 19 + Radix UI
- Styling: Tailwind CSS 4
- State: Zustand + React Query
- Forms: React Hook Form + Zod
Backend¶
- Framework: FastAPI (Python 3.12+)
- Database: PostgreSQL (via Supabase)
- Vector DB: pgvector extension
- Auth: Supabase Auth
- AI: OpenAI API (embeddings + chat)
Infrastructure¶
- Hosting: Supabase (database + auth)
- Deployment: Docker + Docker Compose
- Monitoring: Langfuse (optional)
- CI/CD: GitHub Actions
💡 Key Design Decisions¶
1. Why Supabase?¶
- Managed PostgreSQL: No server management
- Built-in Auth: Reduces development time
- Vector Support: pgvector for semantic search
- Real-time: WebSocket support for live updates
- Free Tier: Generous limits for development
2. Why pgvector over Weaviate?¶
- Simplicity: Single database for all data
- Cost: No separate vector DB instance
- Performance: Good enough for <1M vectors
- Integration: Native PostgreSQL features
3. Why OpenAI embeddings?¶
- Quality: State-of-the-art semantic understanding
- Stability: Production-ready API
- Cost: Reasonable at ~$0.0001 per 1K tokens
- Compatibility: Standard 768-dim vectors
4. Why Target 6K+ Documents?¶
- Search quality: Sufficient corpus for meaningful semantic search evaluation
- Coverage: ~3K Polish + ~3K UK ensures balanced cross-jurisdictional coverage
- Topic diversity: Enables representative topic analysis across legal domains
- Cost: ~$24 for embeddings (manageable one-time cost)
- Scalability: Architecture supports 200K+ with HNSW indexes
📊 Expected Performance¶
Search Performance¶
| Search Type | Latency | Quality |
|---|---|---|
| Exact Match | <50ms | Exact |
| Full-Text | <100ms | Good |
| Semantic | <200ms | Excellent |
| Hybrid | <250ms | Best |
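One common way to build a hybrid ranking is to blend normalized full-text and semantic scores with a weight `alpha`. The sketch below assumes that convention; this summary does not specify how the app combines scores, so the function names, the min-max normalization, and the default `alpha` are all illustrative.

```python
def min_max(scores):
    """Scale a non-empty {doc_id: score} mapping to the 0-1 range."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def hybrid_rank(text_scores, semantic_scores, alpha=0.5):
    """Blend scores: alpha weights semantic, (1 - alpha) weights full-text."""
    text_n, sem_n = min_max(text_scores), min_max(semantic_scores)
    docs = set(text_n) | set(sem_n)
    blended = {
        doc: alpha * sem_n.get(doc, 0.0) + (1 - alpha) * text_n.get(doc, 0.0)
        for doc in docs
    }
    # Highest blended score first; ties broken by document id for stability.
    return sorted(blended, key=lambda doc: (-blended[doc], doc))


order = hybrid_rank(
    {"a": 2.0, "b": 1.0, "c": 0.0},   # full-text scores (higher is better)
    {"a": 0.1, "b": 0.9, "c": 0.5},   # semantic similarities (higher is better)
    alpha=0.5,
)
```

Setting `alpha` to 0 reduces this to pure full-text ranking and setting it to 1 to pure semantic ranking, which is one way a query router could switch between modes.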
Scalability Targets¶
| Dataset Size | Search Time | Storage |
|---|---|---|
| 6,000 cases | <150ms | ~120 MB |
| 20,000 cases | <300ms | ~500 MB |
| 50,000 cases | <500ms | ~1.5 GB |
| 200,000 cases | <1s | ~5 GB |
🔐 Security Considerations¶
- API Keys: Never commit the .env file
- Service Role: Only use server-side
- Row Level Security: Enable for production
- Rate Limiting: Protect API endpoints
- Input Validation: Sanitize user queries
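As a minimal sketch of the input-validation point, the helper below normalizes a search query before it reaches the database or the embedding API. The 500-character limit and the function name are assumptions, not values taken from the app. Cleaning input does not replace parameterized queries, which are still needed to prevent SQL injection.

```python
import re

MAX_QUERY_LENGTH = 500  # hypothetical limit, tune for the real API
_CONTROL_CHARS = re.compile(r"[\x00-\x1f\x7f]")


def clean_query(raw):
    """Normalize a user search query or raise ValueError if it is unusable."""
    if not isinstance(raw, str):
        raise ValueError("Query must be a string")
    text = _CONTROL_CHARS.sub(" ", raw)  # replace control characters with spaces
    text = " ".join(text.split())        # collapse runs of whitespace
    if not text:
        raise ValueError("Query is empty")
    if len(text) > MAX_QUERY_LENGTH:
        raise ValueError("Query is too long")
    return text
```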
📞 Support & Resources¶
Documentation¶
- Project Docs: See README.md and guides
- Supabase Docs: https://supabase.com/docs
- Next.js Docs: https://nextjs.org/docs
- FastAPI Docs: https://fastapi.tiangolo.com
Data Sources¶
- JuDDGES/pl-appealcourt-criminal (Polish): https://huggingface.co/datasets/JuDDGES/pl-appealcourt-criminal — DOI: 10.57967/hf/8772
- JuDDGES/en-appealcourt (UK): https://huggingface.co/datasets/JuDDGES/en-appealcourt — DOI: 10.57967/hf/8773
Community¶
- Issues: GitHub Issues (when repo is created)
- Discussions: GitHub Discussions
- Email: [Your contact email]
📝 License¶
MIT License - See LICENSE file for details
🎉 Conclusion¶
The Juddges App foundation is now complete! You have:
✅ Database schema ready for Polish and UK judgments
✅ Ingestion pipeline to load data from HuggingFace
✅ Comprehensive documentation for setup and usage
✅ MCP tools guide for database management
✅ Scalable architecture ready for growth
Next action: Follow the SETUP_GUIDE.md to set up your Supabase project and ingest your first judgments!
Last updated: March 9, 2026
Version: 2.0.0