JuDDGES Documentation

End-to-end codebase for acquiring, embedding, fine-tuning, annotating, and evaluating Polish legal judgments with Large Language Models.

What JuDDGES does

JuDDGES is a research codebase that processes Polish legal judgments end-to-end. It acquires raw documents from the Polish Common Courts API and the National Administrative Court (NSA), generates multilingual legal embeddings with sdadas/mmlw-roberta-large for storage in Weaviate, and runs schema-driven information extraction, defined by Pydantic schemas, on top of fine-tuned Large Language Models (Llama 3.1/3.2, Mistral, Bielik for Polish, Phi-4) trained via PEFT/LoRA with Unsloth. A Human-in-the-Loop annotation toolkit built on Label Studio supports iterative dataset curation. The entire pipeline (preprocessing, embedding, supervised fine-tuning, prediction, and evaluation) is tracked reproducibly with DVC. All datasets, code, and trained models are openly published: see the GitHub repository and the Hugging Face organisation.
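To make "schema-driven information extraction" concrete, here is a minimal sketch of validating an LLM's JSON output against a Pydantic schema. The schema name, its fields, and the parse_model_output helper are hypothetical and chosen only for illustration; the schemas actually shipped with JuDDGES may look quite different.

```python
import json
from typing import List, Optional

from pydantic import BaseModel, ValidationError


class JudgmentExtraction(BaseModel):
    """Hypothetical extraction schema; field names are illustrative only."""

    court_name: str
    judgment_date: Optional[str] = None  # date string, if the model found one
    legal_bases: List[str] = []


def parse_model_output(raw: str) -> Optional[JudgmentExtraction]:
    """Validate raw LLM output against the schema.

    Returns None when the text is not valid JSON, is not a JSON object,
    or does not conform to the schema.
    """
    try:
        return JudgmentExtraction(**json.loads(raw))
    except (json.JSONDecodeError, ValidationError, TypeError):
        return None


if __name__ == "__main__":
    raw = '{"court_name": "Example Court", "legal_bases": ["example provision"]}'
    print(parse_model_output(raw))
```

Returning None instead of raising lets a batch prediction loop count and skip malformed generations rather than abort; whether the project handles invalid output this way is an assumption.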

Quick start

Clone the repository and install the package into a fresh virtual environment.

git clone https://github.com/pwr-ai/JuDDGES.git
cd JuDDGES
uv venv .venv && source .venv/bin/activate
uv pip install -e .

Run the full code-quality and test sweep with make all.

Documentation map

Section        | Use this when
Tutorials      | learning JuDDGES from scratch
How-to guides  | accomplishing a specific task (ingesting a dataset, running fine-tuning, exporting annotations)
API reference  | looking up a specific function, class, or configuration field
Explanation    | understanding the architecture, data-flow, or research motivation
Open Science   | citation, licensing, reproducibility, FAIR4RS compliance

Project structure (at a glance)

  • juddges/ — library code
  • scripts/ — CLI entry points (data ingestion, training, evaluation)
  • configs/ — Hydra configurations (datasets, models, pipelines)
  • dvc.yaml — pipeline definition
  • label_studio_toolkit/ — HITL annotation toolkit (Pydantic schemas + Label Studio integration)
  • tests/ — pytest suite
  • docs/ — this documentation site

Citation

If you use JuDDGES in academic work, please cite the BAIL 2026 paper (Bridging AI and Law: A Scalable Multi-Agent Platform for Quantitative Legal Analytics Across Millions of Documents). The full BibTeX, CITATION.cff, and codemeta.json metadata are documented in Open Science → Recognition.

Contributing

Contributions are welcome — see CONTRIBUTING.md for the development workflow. The project follows the Contributor Covenant v2.1. For vulnerability disclosure, see SECURITY.md.

License

Code: Apache 2.0 — Documentation: CC BY 4.0.