JuDDGES Documentation
End-to-end codebase for acquiring, embedding, annotating, and analysing Polish legal judgments, and for fine-tuning and evaluating Large Language Models on them.
What JuDDGES does
JuDDGES is a research codebase that covers the full processing chain for Polish legal judgments. It acquires raw documents from the Polish Common Courts API and the National Administrative Court (NSA), generates multilingual legal embeddings with sdadas/mmlw-roberta-large and stores them in Weaviate, and runs schema-driven information extraction, with Pydantic schemas defining the target fields, on fine-tuned Large Language Models (Llama 3.1/3.2, Mistral, Bielik for Polish, Phi-4) trained via PEFT/LoRA with Unsloth. A Human-in-the-Loop annotation toolkit built on Label Studio supports iterative dataset curation. Every pipeline stage (preprocessing, embedding, supervised fine-tuning, prediction, and evaluation) is tracked with DVC for reproducibility. All datasets, code, and trained models are openly published: see the GitHub repository and the Hugging Face organisation.
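To make the idea of schema-driven extraction concrete, here is a minimal sketch of validating a model's JSON answer against a Pydantic schema. The class name, the field names, and the `parse_llm_output` helper are hypothetical and chosen only for illustration; they are not taken from the JuDDGES codebase, whose real schemas may look quite different.

```python
import json
from typing import List, Optional

from pydantic import BaseModel, ValidationError


class JudgmentExtraction(BaseModel):
    """Hypothetical extraction schema; field names are illustrative only."""

    court_name: str
    judgment_date: str  # kept as plain text here for simplicity
    legal_bases: List[str]
    verdict_summary: Optional[str] = None


def parse_llm_output(raw: str) -> Optional[JudgmentExtraction]:
    """Validate a model's JSON answer against the schema.

    Returns None when the text is not valid JSON, is not a JSON object,
    or does not satisfy the schema.
    """
    try:
        return JudgmentExtraction(**json.loads(raw))
    except (json.JSONDecodeError, ValidationError, TypeError):
        return None
```

In a setup like this, the schema acts as the contract between the language model and downstream code: answers that do not fit are rejected instead of silently propagating malformed fields.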
Quick start
Clone the repository and install the package into a fresh virtual environment.
git clone https://github.com/pwr-ai/JuDDGES.git
cd JuDDGES
uv venv .venv && source .venv/bin/activate
uv pip install -e .
Run the full code-quality and test sweep with make all.
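As a quick sanity check after the editable install, the following sketch asks Python whether the package can be located in the active environment. It assumes the import name is `juddges`, inferred from the library directory of the same name; the helper `is_importable` is illustrative and not part of the project.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if Python can locate a top-level module without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # "juddges" is an assumed import name, based on the juddges/ directory.
    status = "found" if is_importable("juddges") else "not found"
    print(f"juddges package: {status}")
```

If the package is reported as not found, the most likely cause is that the virtual environment created in the steps above is not activated in the current shell.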
Documentation map
| Section | Use this when |
|---|---|
| Tutorials | learning JuDDGES from scratch |
| How-to guides | accomplishing a specific task (ingesting a dataset, running fine-tuning, exporting annotations) |
| API reference | looking up a specific function, class, or configuration field |
| Explanation | understanding the architecture, data-flow, or research motivation |
| Open Science | citation, licensing, reproducibility, FAIR4RS compliance |
Project structure (at a glance)
- juddges/: library code
- scripts/: CLI entry points (data ingestion, training, evaluation)
- configs/: Hydra configurations (datasets, models, pipelines)
- dvc.yaml: pipeline definition
- label_studio_toolkit/: HITL annotation toolkit (Pydantic schemas + Label Studio integration)
- tests/: pytest suite
- docs/: this documentation site
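The layout above can be checked against a local checkout with a short script. This is a sketch only: the `EXPECTED_ENTRIES` list simply mirrors the entries named in this section, and the `missing_entries` helper is hypothetical rather than something shipped with the repository.

```python
from pathlib import Path
from typing import List

# Top-level entries named in the project structure list above.
EXPECTED_ENTRIES = [
    "juddges",
    "scripts",
    "configs",
    "dvc.yaml",
    "label_studio_toolkit",
    "tests",
    "docs",
]


def missing_entries(repo_root: Path) -> List[str]:
    """Return the expected entries that do not exist under repo_root."""
    return [name for name in EXPECTED_ENTRIES if not (repo_root / name).exists()]


if __name__ == "__main__":
    # Run from the repository root; an empty list means nothing is missing.
    print(missing_entries(Path(".")))
```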
Citation
If you use JuDDGES in academic work, please cite the BAIL 2026 paper (Bridging AI and Law: A Scalable Multi-Agent Platform for Quantitative Legal Analytics Across Millions of Documents). The full BibTeX, CITATION.cff, and codemeta.json metadata are documented in Open Science → Recognition.
Contributing
Contributions are welcome — see CONTRIBUTING.md for the development workflow. The project follows the Contributor Covenant v2.1. For vulnerability disclosure, see SECURITY.md.
License
Code: Apache 2.0 — Documentation: CC BY 4.0.