API Reference¶
Welcome to the JuDDGES API reference documentation. This section provides comprehensive documentation for all public APIs, classes, functions, and modules in the JuDDGES project.
Documentation Structure¶
The API documentation is organized by module functionality and belongs to the Reference category of the Diátaxis framework: it provides technical specifications and detailed information about the codebase.
Core Modules¶
Core Configuration¶
Configuration management using Hydra and OmegaConf for flexible, hierarchical configuration.
Key Components:
- LLMConfig - Large language model configuration
- EmbeddingConfig - Embedding model and dataset configuration
- Configuration loaders and validators
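To illustrate what "hierarchical" configuration means here, the sketch below merges a set of overrides into nested defaults. It is a stdlib-only stand-in and does not use the Hydra or OmegaConf APIs; `deep_merge` and every key and value in the example are hypothetical.

```python
from typing import Any


def deep_merge(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict in which override values replace base values, key by key."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are nested sections: merge them recursively.
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


# Hypothetical defaults and overrides, for illustration only.
defaults = {
    "llm": {"name": "base-model", "max_tokens": 512},
    "embedding": {"batch_size": 32},
}
overrides = {"llm": {"max_tokens": 1024}}

config = deep_merge(defaults, overrides)
```

Only the overridden leaf changes; sibling keys and untouched sections keep their default values, and the `defaults` dict itself is not modified.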
Settings¶
Application-wide settings using Pydantic for validation.
Key Components:
- Environment variable management
- API keys and credentials
- Database connection settings
Data Models¶
Core data structures used throughout the application.
Key Components:
- Document models
- Judgment models
- Extraction schemas
Schema¶
Weaviate schema definitions and validation.
Data Management¶
Data Loaders¶
Dataset loading utilities for Weaviate ingestion.
Key Functions:
- DatasetLoader.load_chunk_dataset() - Load chunk embeddings
- DatasetLoader.load_document_dataset() - Load document embeddings with column remapping
- Dataset column mapping configurations
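Column remapping can be pictured as a rename pass over each record before ingestion. The following is a minimal sketch, assuming records are plain dicts; `remap_columns`, `COLUMN_MAPPING`, and the column names are hypothetical and are not taken from `DatasetLoader`.

```python
from typing import Any


def remap_columns(
    rows: list[dict[str, Any]], mapping: dict[str, str]
) -> list[dict[str, Any]]:
    """Rename keys in each row according to mapping; unmapped keys are kept as-is."""
    return [
        {mapping.get(key, key): value for key, value in row.items()}
        for row in rows
    ]


# Hypothetical source-to-target column names.
COLUMN_MAPPING = {"id": "document_id", "text": "full_text"}

rows = [{"id": "a1", "text": "Sample judgment", "embedding": [0.1, 0.2]}]
remapped = remap_columns(rows, COLUMN_MAPPING)
```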
Weaviate Base Database¶
Base class for Weaviate database operations.
Key Features:
- Connection management
- Collection creation and management
- Batch operations
- Error handling
Judgments Database¶
Weaviate database operations for court judgments.
Key Components:
- WeaviateJudgmentsDatabase - Main database class
- Judgment and chunk collection management
- Schema definitions with 50+ fields
- UMAP coordinate support
Documents Database¶
Weaviate database operations for generic documents.
Dataset Factory¶
Factory for creating and managing datasets.
Dataset Mapper¶
Utilities for mapping between different dataset schemas.
Stream Ingester¶
Production-grade streaming ingestion pipeline.
Key Features:
- Batch processing
- Error handling and retry logic
- Progress tracking
- Memory-efficient streaming
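The features listed above (batching, retries, and streaming without loading the whole dataset) can be sketched with generators. This is an illustrative outline only, not the Stream Ingester's implementation; `batched`, `ingest_stream`, and the `write_batch` callback are hypothetical names, and a real writer would target Weaviate.

```python
import time
from itertools import islice
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], batch_size: int) -> Iterator[list[T]]:
    """Yield lists of up to batch_size items without materializing the input."""
    iterator = iter(items)
    while batch := list(islice(iterator, batch_size)):
        yield batch


def ingest_stream(
    items: Iterable[T],
    write_batch: Callable[[list[T]], None],
    batch_size: int = 100,
    max_retries: int = 3,
    backoff_seconds: float = 0.0,
) -> int:
    """Write items batch by batch, retrying failed batches; return items written."""
    written = 0
    for batch in batched(items, batch_size):
        for attempt in range(1, max_retries + 1):
            try:
                write_batch(batch)
                break
            except Exception:
                if attempt == max_retries:
                    raise  # Give up after the final attempt.
                time.sleep(backoff_seconds * attempt)  # Linear backoff.
        written += len(batch)
    return written


# Usage with a writer that fails once, simulating a transient error.
stored: list[int] = []
attempts: list[int] = []


def flaky_writer(batch: list[int]) -> None:
    attempts.append(len(batch))
    if len(attempts) == 1:
        raise ConnectionError("simulated transient failure")
    stored.extend(batch)


total = ingest_stream(range(5), flaky_writer, batch_size=2)
```

Only one batch is held in memory at a time, and a failed batch is retried as a unit, so a transient error does not restart the whole run.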
LLM Operations¶
LLM Factory¶
Factory for creating and configuring language models.
Supported Models:
- Llama 3.1/3.2
- Mistral/Nemo
- Phi-4
- Bielik (Polish)
Key Functions:
- get_llm() - Create model from configuration
- get_llama_3() - Llama-specific setup
- get_mistral() - Mistral-specific setup
- Model quantization (4-bit, 8-bit)
- PEFT/LoRA adapter loading
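A factory of this kind typically dispatches from a model name to a family-specific builder. The sketch below shows that pattern with a small registry; it loads no real model, and `ModelSpec`, `register`, `build_model`, and the example model name are all hypothetical rather than the signatures of `get_llm()` and its helpers.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class ModelSpec:
    """Hypothetical stand-in for a loaded model object."""

    family: str
    name: str
    quantization: Optional[str] = None


Builder = Callable[[str, Optional[str]], ModelSpec]
_BUILDERS: dict[str, Builder] = {}


def register(family: str) -> Callable[[Builder], Builder]:
    """Register a builder function under a model-family keyword."""

    def decorator(builder: Builder) -> Builder:
        _BUILDERS[family] = builder
        return builder

    return decorator


@register("llama")
def build_llama(name: str, quantization: Optional[str]) -> ModelSpec:
    return ModelSpec("llama", name, quantization)


@register("mistral")
def build_mistral(name: str, quantization: Optional[str]) -> ModelSpec:
    return ModelSpec("mistral", name, quantization)


def build_model(name: str, quantization: Optional[str] = None) -> ModelSpec:
    """Pick the first registered family whose keyword appears in the model name."""
    for family, builder in _BUILDERS.items():
        if family in name.lower():
            return builder(name, quantization)
    raise ValueError(f"Unsupported model: {name}")


spec = build_model("Llama-3.1-8B", quantization="4bit")
```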
Prediction¶
LLM prediction utilities.
Key Functions:
- predict_with_llm() - Batch prediction with progress tracking
- DataLoader integration
- Performance metrics
Information Extraction¶
Gemini Chain¶
LangChain extraction chain using Gemini 2.5 Pro/Flash.
Key Components:
- GeminiExtractionChain - Main extraction class
- ExtractionSchema - Schema definition
- DocumentType - Document type enum
Key Features:
- Structured output parsing
- SQLite caching
- Langfuse observability integration
- Batch extraction support
- Automatic text truncation
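The value of SQLite caching is that repeated extraction requests for the same text do not trigger a second model call. The sketch below shows the idea with the standard library only; it is not the caching layer `GeminiExtractionChain` uses, and `ExtractionCache` and `fake_extract` are hypothetical.

```python
import hashlib
import json
import sqlite3
from typing import Callable


class ExtractionCache:
    """Cache JSON-serializable results keyed by a SHA-256 hash of the input text."""

    def __init__(self, path: str = ":memory:") -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS cache (key TEXT PRIMARY KEY, value TEXT NOT NULL)"
        )

    def get_or_compute(self, text: str, compute: Callable[[str], dict]) -> dict:
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        row = self._conn.execute(
            "SELECT value FROM cache WHERE key = ?", (key,)
        ).fetchone()
        if row is not None:
            return json.loads(row[0])  # Cache hit: skip the expensive call.
        result = compute(text)
        self._conn.execute(
            "INSERT INTO cache (key, value) VALUES (?, ?)", (key, json.dumps(result))
        )
        self._conn.commit()
        return result


# Usage with a stand-in for the model call.
calls: list[str] = []


def fake_extract(text: str) -> dict:
    calls.append(text)
    return {"length": len(text)}


cache = ExtractionCache()
first = cache.get_or_compute("Sample judgment text", fake_extract)
second = cache.get_or_compute("Sample judgment text", fake_extract)
```

The second call returns the stored result, so `fake_extract` runs only once.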
Preprocessing¶
Text Chunker¶
Text chunking utilities for document segmentation.
Key Components:
- TextChunker - Main chunking class
- Recursive character splitting
- Token-based chunking
- Configurable overlap
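Configurable overlap means that consecutive chunks share some trailing and leading characters, so context is not lost at chunk boundaries. The sketch below uses fixed-size character windows, which is simpler than the recursive character splitting listed above; `chunk_text` is a hypothetical helper, not the `TextChunker` API.

```python
def chunk_text(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Split text into windows of chunk_size characters sharing overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start : start + chunk_size])
        if start + chunk_size >= len(text):
            break  # The last window already reaches the end of the text.
    return chunks


chunks = chunk_text("abcdefghij", chunk_size=4, overlap=1)
# chunks == ["abcd", "defg", "ghij"]
```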
Text Encoder¶
Text encoding and tokenization utilities.
Context Truncator¶
Context window management for LLMs.
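Context window management usually comes down to a token budget: the model's context size minus whatever is reserved for the output. The sketch below approximates tokens with whitespace-separated words, which a real tokenizer would not do; `truncate_context` and its parameters are hypothetical and do not describe the Context Truncator's interface.

```python
def truncate_context(
    text: str, max_context_tokens: int, reserved_output_tokens: int
) -> str:
    """Keep only as many leading whitespace tokens as fit the input budget."""
    budget = max_context_tokens - reserved_output_tokens
    if budget <= 0:
        raise ValueError("reserved_output_tokens must be smaller than max_context_tokens")

    tokens = text.split()
    if len(tokens) <= budget:
        return text  # Already fits: return the text unchanged.
    return " ".join(tokens[:budget])


short = truncate_context("a b c d e", max_context_tokens=4, reserved_output_tokens=1)
# short == "a b c"
```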
Formatters¶
Text formatting utilities for legal documents.
Parser Base¶
Base class for document parsers.
PL Court Parser¶
Parser for Polish court documents.
Evaluation¶
Metrics¶
Evaluation metrics for information extraction.
Key Functions:
- evaluate_date() - Date field evaluation with parsing
- evaluate_number() - Numeric field evaluation with tolerance
- evaluate_string_rouge() - ROUGE scores for text fields
- evaluate_enum() - Enum classification with hallucination detection
- evaluate_list_greedy() - List matching with precision/recall/F1
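Two of the ideas above, numeric comparison with a tolerance and greedy list matching scored by precision, recall, and F1, can be sketched as follows. The function names, the relative-tolerance default, and the use of exact string equality for matching are assumptions made for illustration; the actual `evaluate_number()` and `evaluate_list_greedy()` may differ.

```python
import math


def number_matches(predicted: float, expected: float, rel_tol: float = 0.01) -> bool:
    """Treat two numbers as equal when they differ by at most rel_tol (relative)."""
    return math.isclose(predicted, expected, rel_tol=rel_tol)


def greedy_list_scores(predicted: list[str], expected: list[str]) -> dict[str, float]:
    """Match each predicted item to at most one unused expected item, in order."""
    remaining = list(expected)
    matched = 0
    for item in predicted:
        if item in remaining:
            remaining.remove(item)  # Each expected item can be matched only once.
            matched += 1

    precision = matched / len(predicted) if predicted else 0.0
    recall = matched / len(expected) if expected else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


scores = greedy_list_scores(["a", "b", "x"], ["a", "b", "c", "d"])
```

In this example two of three predictions match two of four expected items, giving a precision of 2/3, a recall of 1/2, and an F1 of 4/7.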
Extraction Evaluation¶
End-to-end extraction evaluation pipeline.
LLM as Judge¶
Base¶
Base classes for LLM-as-judge evaluation.
Judge¶
Single-document LLM judge implementation.
Batched Judge¶
Batch processing LLM judge.
Data Model¶
Data models for LLM judge evaluation.
Result Loading¶
Utilities for loading and processing judge results.
Retrieval¶
Mongo Hybrid Search¶
Hybrid search combining semantic and keyword search.
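This page does not say how the semantic and keyword results are combined. One common option is reciprocal rank fusion (RRF), sketched below as an assumption rather than a description of this module; `reciprocal_rank_fusion`, the constant `k=60`, and the document IDs are illustrative.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked ID lists: each list contributes 1 / (k + rank) per document."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending fused score; break ties by ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


semantic_hits = ["d1", "d2", "d3"]  # Hypothetical vector-search ranking.
keyword_hits = ["d2", "d4", "d1"]  # Hypothetical keyword-search ranking.

fused = reciprocal_rank_fusion([semantic_hits, keyword_hits])
# fused == ["d2", "d1", "d4", "d3"]
```

Documents that appear in both rankings rise to the top, because RRF uses only rank positions and needs no score normalization between the two retrievers.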
Mongo Term-Based Search¶
Traditional keyword-based search.
Utilities¶
Config Utils¶
Configuration utilities and helpers.
Logging¶
Logging configuration using loguru.
Pipeline¶
Pipeline utilities for DVC workflows.
HuggingFace Utils¶
HuggingFace Hub utilities for dataset and model management.
Date Utils¶
Date parsing and formatting utilities.
Misc¶
Miscellaneous utilities.
Quick Navigation¶
By Use Case¶
Data Ingestion:
- Data Loaders - Load datasets
- Stream Ingester - Ingest to Weaviate
- Judgments Database - Database operations
Information Extraction:
- Gemini Chain - Extract with Gemini
- Metrics - Evaluate extractions
- LLM as Judge - LLM-based evaluation
Model Training & Inference:
- LLM Factory - Create models
- Prediction - Generate predictions
- Preprocessing - Prepare data
By Module Type¶
Configuration:
Data Access:
Models:
Evaluation:
Documentation Conventions¶
Docstring Style¶
All modules use Google-style docstrings:
def function_name(arg1: str, arg2: int) -> bool:
    """Brief description of function.

    Longer description with more details about what the function does,
    its purpose, and how it should be used.

    Args:
        arg1: Description of first argument
        arg2: Description of second argument

    Returns:
        Description of return value

    Raises:
        ValueError: When input is invalid

    Example:
        >>> result = function_name("test", 42)
        >>> print(result)
        True
    """
Type Annotations¶
All public APIs include comprehensive type annotations following PEP 484.
Code Examples¶
Most functions include usage examples in docstrings.
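Docstring examples written in the `>>>` style can be executed with the standard-library `doctest` module, which helps keep them from drifting out of date. The sketch below runs the examples of a single function; `is_positive` is a hypothetical function, and nothing here implies the project runs doctests this way.

```python
import doctest


def is_positive(value: int) -> bool:
    """Return True when value is greater than zero.

    Example:
        >>> is_positive(3)
        True
        >>> is_positive(-1)
        False
    """
    return value > 0


# Parse the docstring into a DocTest and run it with an explicit namespace.
parser = doctest.DocTestParser()
test = parser.get_doctest(
    is_positive.__doc__, {"is_positive": is_positive}, "is_positive", None, 0
)
runner = doctest.DocTestRunner(verbose=False)
results = runner.run(test)
```

`results.failed` counts examples whose output differed from the docstring, and `results.attempted` counts the examples that were run.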
Contributing to API Documentation¶
Adding Documentation¶
- Update Docstrings: Add or improve docstrings in source code
- Regenerate Docs: Run ./scripts/docs/generate_api_docs.sh
- Review: Check generated documentation in docs/reference/api/
- Commit: Include both source and generated docs in commit
Documentation Standards¶
Follow the Style Guide for:
- Docstring formatting
- Type annotation conventions
- Example code standards
- Cross-referencing guidelines
Automation¶
API documentation is automatically generated from source code docstrings using:
- MkDocs: Static site generator
- mkdocstrings: Python documentation plugin
- Material for MkDocs: Modern theme
Related Documentation¶
- Tutorials - Learn by doing
- How-To Guides - Solve specific problems
- Explanation - Understand concepts
- Style Guide - Documentation standards
Need Help?¶
- Missing Documentation: Report an issue
- Unclear API: Request clarification in GitHub Discussions
- Contributing: See Contributing Guide
Last Updated: 2025-10-11
Coverage: ~60% of public APIs documented
Target: 100% coverage by 2025-11-01