JuDDGES Documentation

End-to-end codebase for acquiring, embedding, and annotating Polish legal judgments, and for fine-tuning and evaluating Large Language Models on them.

What JuDDGES does

JuDDGES is a research codebase for processing Polish legal judgments end to end. It acquires raw documents from the Polish Common Courts API and the National Administrative Court (NSA), generates multilingual legal embeddings with sdadas/mmlw-roberta-large and stores them in Weaviate, and runs schema-driven information extraction, using Pydantic schemas, on top of fine-tuned Large Language Models (Llama 3.1/3.2, Mistral, Bielik for Polish, Phi-4) trained via PEFT/LoRA with Unsloth. A Human-in-the-Loop (HITL) annotation toolkit built on Label Studio supports iterative dataset curation. Every pipeline stage (preprocessing, embedding, supervised fine-tuning, prediction, and evaluation) is tracked with DVC for reproducibility. All datasets, code, and trained models are openly published: see the GitHub repository and the Hugging Face organisation.
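To make "schema-driven information extraction" concrete, here is a minimal sketch of how a Pydantic schema can validate the JSON a language model returns. The class name, field names, and the parse_llm_output helper are hypothetical and chosen only for illustration; they are not taken from the JuDDGES codebase. The sketch assumes Pydantic v2.

```python
from datetime import date
from typing import Optional

from pydantic import BaseModel, Field, ValidationError


class JudgmentExtraction(BaseModel):
    """Hypothetical extraction schema; the fields are illustrative only."""

    court_name: str = Field(description="Name of the court that issued the judgment")
    judgment_date: date = Field(description="Date of the judgment in ISO 8601 format")
    case_number: str = Field(description="Case signature as printed in the document")
    legal_bases: list[str] = Field(
        default_factory=list, description="Provisions cited as the legal basis"
    )
    appeal_dismissed: Optional[bool] = Field(
        default=None, description="Left as None when the text does not say"
    )


def parse_llm_output(raw_json: str) -> Optional[JudgmentExtraction]:
    """Validate a model's JSON answer; return None if it does not fit the schema."""
    try:
        return JudgmentExtraction.model_validate_json(raw_json)
    except ValidationError:
        # Malformed JSON and missing or mistyped fields all end up here.
        return None


if __name__ == "__main__":
    # Made-up sample answer, not real data.
    sample = (
        '{"court_name": "Example Court", "judgment_date": "2023-05-17", '
        '"case_number": "I C 1/23", "legal_bases": ["art. 415 k.c."]}'
    )
    print(parse_llm_output(sample))
```

Declaring the target fields once in a schema means the same definition can describe the expected output to the model and reject answers that do not conform, so downstream evaluation only sees well-typed records.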

Quick start

Clone the repository and install the package into a fresh virtual environment.

git clone https://github.com/pwr-ai/JuDDGES.git
cd JuDDGES
uv venv .venv && source .venv/bin/activate
uv pip install -e .

Then run the full set of quality checks and tests with make all.
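If the sweep fails immediately, it can help to confirm that the editable install is visible to the active interpreter. The snippet below is a small sketch using only the standard library. It assumes the import name is juddges, matching the juddges/ directory in the repository; the is_installed helper is hypothetical.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located by this interpreter."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # "juddges" is an assumed import name, inferred from the juddges/ directory.
    print("juddges importable:", is_installed("juddges"))
```

A False result usually means the virtual environment is not activated or the install step was run with a different interpreter.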

Documentation map

Section	Use this when
Tutorials	learning JuDDGES from scratch
How-to guides	accomplishing a specific task (ingesting a dataset, running fine-tuning, exporting annotations)
API reference	looking up a specific function, class, or configuration field
Explanation	understanding the architecture, data-flow, or research motivation
Open Science	citation, licensing, reproducibility, FAIR4RS compliance

Project structure (at a glance)

juddges/ — library code
scripts/ — CLI entry points (data ingestion, training, evaluation)
configs/ — Hydra configurations (datasets, models, pipelines)
dvc.yaml — pipeline definition
label_studio_toolkit/ — HITL annotation toolkit (Pydantic schemas + Label Studio integration)
tests/ — pytest suite
docs/ — this documentation site
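As a quick sanity check that a checkout matches the layout listed above, the following sketch reports any top-level entries that are missing. The entry names come from the list; the missing_entries helper itself is hypothetical and not part of the repository.

```python
from pathlib import Path

# Top-level entries taken from the project structure list.
EXPECTED_ENTRIES = [
    "juddges",
    "scripts",
    "configs",
    "dvc.yaml",
    "label_studio_toolkit",
    "tests",
    "docs",
]


def missing_entries(repo_root: Path) -> list[str]:
    """Return the expected top-level entries that are absent from repo_root."""
    return [name for name in EXPECTED_ENTRIES if not (repo_root / name).exists()]


if __name__ == "__main__":
    # Run from the repository root.
    missing = missing_entries(Path.cwd())
    print("layout looks complete" if not missing else f"missing: {missing}")
```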

Citation

If you use JuDDGES in academic work, please cite the BAIL 2026 paper (Bridging AI and Law: A Scalable Multi-Agent Platform for Quantitative Legal Analytics Across Millions of Documents). The full BibTeX, CITATION.cff, and codemeta.json metadata are documented in Open Science → Recognition.

Contributing

Contributions are welcome — see CONTRIBUTING.md for the development workflow. The project follows the Contributor Covenant v2.1. For vulnerability disclosure, see SECURITY.md.

License

Code: Apache 2.0 — Documentation: CC BY 4.0.