
JuDDGES — Open-Science & FAIR4RS Report

Software-sustainability and reproducibility report for the JuDDGES codebase, structured as direct answers to standard open-science evaluation criteria (FAIR4RS / EOSC software-sustainability).

Repository: https://github.com/pwr-ai/JuDDGES
Last verified against the repository: 2026-04-30.

Looking for a blank version of this form to use on a different project? See the reusable template.md in this directory.


1. Project context

Q: Briefly describe the context and purpose.

A: JuDDGES is an end-to-end research codebase for working with Polish (and to a lesser extent English) legal-judgment data. It covers six tightly integrated capabilities:

  1. Data acquisition from the Polish Common Courts API and the National Administrative Court (NSA), implemented under scripts/nsa/ and the data loaders in juddges/data/.
  2. Storage and semantic retrieval of judgments in a Weaviate vector database (legal_documents, document_chunks collections), with an ingestion pipeline (scripts/embed/ingest_to_weaviate.py) and a Docker-Compose deployment in weaviate/.
  3. Automated dataset analysis producing descriptive statistics and quality reports under scripts/analytics/ and notebooks in nbs/, so every published dataset ships with a transparent statistical profile.
  4. Schema-driven information extraction running LLM inference over the corpus against a user-defined Pydantic schema (juddges/models/, scripts/extraction/), supporting both local vLLM and OpenAI-compatible API back-ends.
  5. Fine-tuning and evaluation of Llama 3.1/3.2, Mistral, Bielik (Polish), and Phi-4 via PEFT/LoRA on Unsloth (scripts/sft/), with both n-gram metrics and an LLM-as-judge protocol (juddges/evaluation/).
  6. A Human-in-the-Loop annotation toolkit on Label Studio in label_studio_toolkit/ — annotation tasks declared as Pydantic schemas + XML form templates; the toolkit automates LLM preannotation, task and prediction upload, human review and correction, and export of the corrected annotations as a structured dataset (dataset.json + schema.yaml). Three reference tasks ship in the repo: Swiss Franc loan cases (schemas/swiss_frank.py + form_templates/swiss_frank.xml), personal rights (schemas/personal_rights.py + form_templates/personal_rights.xml), and English appeal-court (schemas/en_appealcourt.py + form_templates/en_appealcourt.xml, driven by configs/annotate_data_en_appealcourt.yaml).
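Item 4 can be made concrete with a minimal sketch of schema-driven extraction: the LLM is asked for JSON, and the answer is checked against a declared schema before it is stored. The schema class `LoanJudgmentFields`, its fields, and the helper `parse_extraction` are hypothetical; they are not the repository's schemas/swiss_frank.py. The sketch also swaps Pydantic for stdlib dataclasses so it runs without extra dependencies. As a result it checks field names and required fields but, unlike Pydantic, does not validate or coerce value types.

```python
import json
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class LoanJudgmentFields:
    """Hypothetical extraction schema (stand-in for a Pydantic model)."""

    court_name: str
    decision_date: str  # ISO 8601 date string (assumed output format)
    loan_currency: Optional[str] = None  # optional field: may be absent


def parse_extraction(raw: str) -> LoanJudgmentFields:
    """Parse an LLM's JSON answer and check it against the schema's fields."""
    payload = json.loads(raw)
    if not isinstance(payload, dict):
        raise ValueError("expected a JSON object")
    allowed = {f.name for f in fields(LoanJudgmentFields)}
    unknown = set(payload) - allowed
    if unknown:
        raise ValueError(f"unexpected fields: {sorted(unknown)}")
    # A missing required field (court_name, decision_date) makes the
    # dataclass constructor raise TypeError.
    return LoanJudgmentFields(**payload)
```

For example, an answer containing only court_name and decision_date yields an instance with loan_currency=None, while an answer with an undeclared key raises ValueError.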

The codebase exists so the JuDDGES project can publish high-quality Polish legal datasets and reproducible model artefacts on the JuDDGES Hugging Face organisation. It can be reused by anyone reproducing the data-collection workflow, refreshing datasets with newly published judgments, extracting structured information under a custom schema, or fine-tuning an LLM for a domain-specific task.


2. Evaluation criteria

DOCUMENTATION — License & accessibility

Q: How can the repository be accessed by third parties?

A: The repository is public on GitHub at https://github.com/pwr-ai/JuDDGES and requires no registration to access. It follows a four-way licensing model:

  • Code under Apache 2.0 (also declared in the license field of pyproject.toml).
  • Documentation under CC BY 4.0.
  • Datasets on Hugging Face under CC BY 4.0 (declared in each dataset card's YAML metadata).
  • Fine-tuned models under OpenRAIL-M (declared in each model card's YAML metadata).

Q: What type of documentation is available, provided with the project and delivered under the same conditions?

A: Multi-layer documentation under the same open licenses:

Q: Does the documentation describe how to use/build/deploy/install the project?

A: No web application is shipped — JuDDGES is a research codebase composed of CLI scripts, DVC pipelines, and a Label Studio integration, not a hosted web service. The accompanying documentation therefore focuses on what is actually delivered:


TESTING — Sample data & parameters

Q: Are sample data and/or parameters that can be used to test the project available with the source code?

A: Yes, in four categories.


INTEROPERABILITY — Standard I/O formats

Q: Do you use existing and standard input/output formats?

A: Yes.

  • Datasets are stored as Parquet and distributed via the Hugging Face datasets library; CSV and JSON are used for tabular and metadata exports (samples under data/sample_data/).
  • Configuration is written in YAML, structured for Hydra.
  • Dependencies are declared in requirements.txt, pyproject.toml, and a fully resolved uv.lock.
  • Vector data is persisted in Weaviate with predefined collections (legal_documents, document_chunks) and deterministic UUIDs for deduplication. Embeddings are generated with the publicly available sdadas/mmlw-roberta-large model, so they can be regenerated.
  • The annotation toolkit consumes Parquet/HF datasets; calls any OpenAI-compatible REST API for LLM preannotation (OpenAI, vLLM, Ollama, LiteLLM, or self-hosted endpoints; see label_studio_toolkit/api/client.py); uses Label Studio's standard JSON task format for upload and review (see scripts/label_studio/upload_with_preannotation.py); and exports human-corrected annotations as a portable pair, dataset.json + schema.yaml (scripts/label_studio/export_annotated_dataset.py).
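The deterministic-UUID deduplication can be illustrated with name-based UUIDs (uuid5). Hashing a stable key always yields the same UUID, so re-ingesting a chunk overwrites the existing object instead of duplicating it. This is a sketch under assumptions: the key format `document_id#chunk_index` and the helpers `chunk_uuid` and `dedupe` are hypothetical, and an in-memory dict stands in for Weaviate. The weaviate-client package ships a comparable helper, `weaviate.util.generate_uuid5`; whether scripts/embed/ingest_to_weaviate.py uses it is not stated in this report.

```python
import uuid


def chunk_uuid(document_id: str, chunk_index: int) -> str:
    """Derive a stable UUID for a chunk from its document id and position.

    uuid5 is a SHA-1-based, name-derived UUID: the same input always yields
    the same UUID, so re-ingestion targets the same object.
    """
    name = f"{document_id}#{chunk_index}"  # hypothetical key format
    return str(uuid.uuid5(uuid.NAMESPACE_DNS, name))


def dedupe(chunks: list[tuple[str, int, str]]) -> dict[str, str]:
    """Map UUID -> text; a repeated key overwrites the earlier entry (upsert)."""
    store: dict[str, str] = {}
    for doc_id, idx, text in chunks:
        store[chunk_uuid(doc_id, idx)] = text
    return store
```

Ingesting ("doc-1", 0) twice leaves a single entry holding the latest text, which is the behaviour deterministic IDs are meant to guarantee.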


VERSIONING — Source-code version control

Q: Do you use a version control system?

A: Yes. Git + GitHub at https://github.com/pwr-ai/JuDDGES. Standard branch + PR review workflow with pre-commit hooks (.pre-commit-config.yaml, invoked via make fix / make check) enforcing formatting, linting, type-checking, markdown lint, and a custom spell-check before merge. Continuous integration under .github/workflows/: a Python test/quality pipeline (python.yaml), a docs build-and-deploy pipeline (docs-build-deploy.yaml), a docs PR-preview pipeline (docs-pr-preview.yaml), and a docs quality-checks pipeline (docs-quality-checks.yaml).


REPRODUCIBILITY — Releases

Q: Do you provide releases of your software?

A: Yes — three layers.

  • (a) GitHub Releases backed by a Zenodo persistent DOI: archived on Zenodo with concept DOI 10.5281/zenodo.19911970 (always-latest) and v0.1.0 version DOI 10.5281/zenodo.19911971; plus a research-reproducibility Git tag neurips_v0.1 capturing the SFT experiments on pl-swiss-franc-loans.
  • (b) DVC pipeline tracking via dvc.yaml (preprocessing, embedding, instruct-dataset construction, SFT, raw and fine-tuned prediction, n-gram and LLM-as-judge evaluation), with matrix expansion over (model × dataset × seed). The lockfile dvc.lock records exact inputs/parameters/hashes/outputs so any reported artefact can be reproduced with dvc repro <stage>.
  • (c) Sample data version-tracked through DVC .dvc pointer files under data/sample_data/.
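Matrix expansion multiplies quickly, so it helps to see the cardinality: every combination of model, dataset, and seed becomes its own stage instance with its own entry in dvc.lock. DVC performs this expansion itself from the matrix block in dvc.yaml; the sketch below only mirrors the arithmetic. The model names, the second dataset, the seeds, and the helper `expand_matrix` are hypothetical placeholders (only pl-swiss-franc-loans comes from this report).

```python
from itertools import product

# Hypothetical axis values; the real ones are declared in dvc.yaml.
models = ["llama-3.1-8b", "bielik-7b"]
datasets = ["pl-swiss-franc-loans", "en-appealcourt"]
seeds = [42, 7, 123]


def expand_matrix(**axes: list) -> list[dict]:
    """Return one parameter dict per combination, like a matrix stage."""
    names = list(axes)
    return [dict(zip(names, combo)) for combo in product(*axes.values())]


runs = expand_matrix(model=models, dataset=datasets, seed=seeds)
```

With two models, two datasets, and three seeds this yields 2 × 2 × 3 = 12 runs; `dvc stage list` shows the instances DVC actually generated from dvc.yaml.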

Q: How do you define language-specific dependencies of your project and their version?

A: Three layers, all committed to the repository:

  • A requirements.txt for runtime dependencies.
  • A pyproject.toml describing the package and its optional install groups.
  • A fully resolved uv.lock lockfile pinning every transitive dependency to an exact version for byte-identical environment reconstruction.

The recommended path uses uv (uv venv .venv && uv pip install -e .); make install is provided for pip-based workflows, and make install_unsloth provisions a dedicated conda environment for fine-tuning. The required CUDA version (12.4 by default) is documented alongside the install instructions. External services (Weaviate, Label Studio, optional Postgres/LLM stacks) are pinned via Docker Compose files: docker-compose.yml, docker-compose.extraction.yml, and docker-compose-llm-postgres-optimized.yml.

Q: Do you state how to report bugs and/or usability problems by the software user(s)?

A: Yes. Users are directed to the GitHub Issues tracker at https://github.com/pwr-ai/JuDDGES/issues. Three templated issue forms ship under .github/ISSUE_TEMPLATE/ (bug_report.yml, feature_request.yml, documentation.yml), together with a config.yml that disables blank issues and links to GitHub Discussions and Security Advisories. PRs are reviewed against .github/PULL_REQUEST_TEMPLATE.md. Contribution conventions live in CONTRIBUTING.md; a coordinated vulnerability-disclosure policy lives in SECURITY.md (private channel via GitHub Security Advisories, with email as a backup).

Q: Do you state how to report bugs and/or usability problems by the web app user(s)?

A: Not applicable — JuDDGES does not ship a hosted web application; it is a research codebase that runs locally or on a user-controlled compute environment. The Label Studio UI used by the annotation toolkit is a third-party component whose own bug-reporting channels apply to UI defects; issues specific to the JuDDGES integration with Label Studio are reported on the same GitHub Issues tracker.


RECOGNITION — Citation information

Q: Do you include citation information (i.e. how to cite your software in the form of citation.cff, codemeta.json or bibtex)?

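A: Yes, through four complementary artefacts, all pointing at the same reference publication. The first three carry the same software-author roster; the BibTeX entry cites the reference paper and therefore lists the paper's authors: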

  1. CITATION.cff — Citation File Format v1.2.0 at the repository root. Consumed by GitHub's "Cite this repository" widget, Zenodo, Zotero, Mendeley, OpenAIRE.
  2. codemeta.json — CodeMeta v3.0 JSON-LD (@context: https://w3id.org/codemeta/3.0, @type: SoftwareSourceCode). Consumed by HAL, OpenAIRE, Software Heritage, re3data.
  3. Zenodo persistent DOI — concept DOI 10.5281/zenodo.19911970 (always-latest) and version DOI 10.5281/zenodo.19911971 (v0.1.0); badge rendered in docs/index.md.
  4. Copy-pasteable BibTeX (below).

Software-author roster (eleven authors across five institutions, reflecting the expanded research collaboration behind the codebase): Łukasz Augustyniak, Jakub Binkowski, Albert Sawczyn, Tomasz Kajdanowicz (Wrocław University of Science and Technology); Michał Bernaczyk (University of Wrocław); Krzysztof Kamiński (Court of Appeal, Wrocław); Santosh Tirunagari, David Windridge, Mandeep K. Dhami (Middlesex University); Chérifa Boukacem-Zeghmouri, Candice Fillaud (Université Claude Bernard Lyon 1).
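Since this roster is meant to be identical across the machine-readable metadata files, it can be checked mechanically. A minimal sketch that reads the author names back out of codemeta.json, assuming the standard CodeMeta `author` property with `Person` entries carrying `givenName` and `familyName`; whether the repository's file uses exactly these keys is an assumption, and `codemeta_authors` is a hypothetical helper.

```python
import json


def codemeta_authors(codemeta_text: str) -> list[str]:
    """Return 'Given Family' for every Person in a codemeta.json author list."""
    doc = json.loads(codemeta_text)
    authors = doc.get("author", [])
    if isinstance(authors, dict):  # JSON-LD also allows a single object
        authors = [authors]
    names = []
    for person in authors:
        if person.get("@type") == "Person":  # skip Organization entries
            given = person.get("givenName", "")
            family = person.get("familyName", "")
            names.append(f"{given} {family}".strip())
    return names
```

Run against the repository's file, a check such as `len(codemeta_authors(open("codemeta.json", encoding="utf-8").read())) == 11` would confirm the roster size stated above.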

Reference paper: "Bridging AI and Law: A Scalable Multi-Agent Platform for Quantitative Legal Analytics Across Millions of Documents" (Augustyniak et al., 2026), Bridge between AI and Law workshop, pp. 207–214. https://openreview.net/forum?id=hWjsyTSWrY. Note that the paper's author list (eleven authors, shown in the BibTeX entry below) is intentionally distinct from the broader software-collaboration roster above, because software-author lists evolve after publication.

@inproceedings{augustyniak2026bridging,
  author    = {Lukasz Augustyniak and Jakub Binkowski and Albert Sawczyn and Kamil Tagowski and Denis Janiak and Mateusz Bystroński and Grzegorz Piotrowski and Michal Bernaczyk and Krzysztof Kamiński and Adrian Szymczak and Tomasz Jan Kajdanowicz},
  booktitle = {Bridge between Artificial Intelligence and Law},
  pages     = {207--214},
  title     = {Bridging {AI} and Law: A Scalable Multi-Agent Platform for Quantitative Legal Analytics Across Millions of Documents},
  url       = {https://openreview.net/forum?id=hWjsyTSWrY},
  year      = {2026}
}

3. Open-science checklist — current status

Verification against the repository as of 2026-04-30 identified twelve open-science items beyond the FAIR4RS / EOSC software-sustainability baseline. Ten of the twelve are now in place (including a Zenodo-minted persistent DOI and the Contributor Covenant Code of Conduct); the remaining two require off-repo actions on accounts the project owner controls (Software Heritage, OpenSSF Best Practices).

| # | Item | Principle | Status | Where it lives |
|---|------|-----------|--------|----------------|
| 1 | CITATION.cff (CFF v1.2.0) | FAIR4RS R1.2: machine-readable citation metadata | Done | CITATION.cff |
| 2 | codemeta.json (CodeMeta v3.0 JSON-LD) | FAIR4RS R1.2: cross-platform research-software metadata | Done | codemeta.json |
| 3 | GitHub Release + Zenodo DOI | FAIR4RS F1: globally unique persistent identifier | Done | Concept DOI 10.5281/zenodo.19911970; version DOI 10.5281/zenodo.19911971; badge in docs/index.md |
| 4 | CONTRIBUTING.md | EOSC sustainability: contributor onboarding | Done | CONTRIBUTING.md |
| 5 | CODE_OF_CONDUCT.md (Contributor Covenant v2.1) | EOSC community health | Done | CODE_OF_CONDUCT.md |
| 6 | SECURITY.md | EOSC sustainability: coordinated vulnerability disclosure | Done | SECURITY.md |
| 7 | .github/ISSUE_TEMPLATE/ (4 files) | EOSC sustainability: triage hygiene | Done | bug_report.yml, feature_request.yml, documentation.yml, config.yml |
| 8 | .github/PULL_REQUEST_TEMPLATE.md | EOSC sustainability: PR review hygiene | Done | PULL_REQUEST_TEMPLATE.md |
| 9 | Software Heritage SWHID | FAIR4RS F1: archival permanence beyond GitHub | Pending (external action; tracked in #65) | n/a |
| 10 | en_appealcourt.xml Label Studio form | Reproducibility: symmetry with the other reference annotation tasks | Done | label_studio_toolkit/form_templates/en_appealcourt.xml |
| 11 | docs/index.md landing page | Documentation accessibility: clears the mkdocs build --strict warning | Done | docs/index.md |
| 12 | OpenSSF Best Practices badge / Scorecard | Third-party software-sustainability indicator | Pending (external action; tracked in #66) | n/a |

Pending external actions

Two items can only be closed off-repo on accounts the project owner controls. Neither is a blocker; both materially improve archival permanence and third-party signalling. Each is tracked as a dedicated GitHub issue with full step-by-step instructions and acceptance criteria.

  1. Software Heritage SWHID (checklist item 9), tracked in issue #65. Steps: (a) open https://archive.softwareheritage.org/save/, (b) submit https://github.com/pwr-ai/JuDDGES, (c) wait for the archival job to complete (typically a few minutes), (d) copy the resulting swh:1:dir:… SWHID and add it to CITATION.cff under the existing identifiers: block (with type: swh) and to codemeta.json as an additional identifier. Effort: ≈ 10 minutes.
  2. OpenSSF Best Practices badge (checklist item 12), tracked in issue #66. Steps: (a) sign in at https://www.bestpractices.dev/ with GitHub OAuth, (b) register pwr-ai/JuDDGES, (c) work through the criteria checklist (most items already pass), (d) embed the resulting passing / silver / gold badge in docs/index.md. Optionally, enable the OpenSSF Scorecard GitHub Action for an automated weekly score. Effort: ≈ 30–45 minutes.
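Step (d) of the Software Heritage item involves copying an identifier by hand, so a quick syntax check before committing it to CITATION.cff can catch truncated copies. A minimal sketch based on the core SWHID syntax (`swh:1:<type>:<40 hex digits>`, with types cnt, dir, rev, rel, snp, optionally followed by `;`-separated qualifiers); `parse_swhid` is a hypothetical helper and checks syntax only, not whether the object exists in the archive.

```python
import re

# Core SWHID: "swh", scheme version "1", object type, 40-hex SHA-1 digest.
_SWHID_CORE = re.compile(r"swh:1:(cnt|dir|rev|rel|snp):([0-9a-f]{40})")


def parse_swhid(swhid: str) -> tuple[str, str]:
    """Return (object_type, hex_digest); raise ValueError if malformed.

    Qualifiers such as ";origin=..." are allowed after the core part and
    are ignored here.
    """
    core = swhid.split(";", 1)[0]
    match = _SWHID_CORE.fullmatch(core)
    if match is None:
        raise ValueError(f"not a valid core SWHID: {swhid!r}")
    return match.group(1), match.group(2)
```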

Recently completed

Total remaining effort once external accounts are available: ≈ 40–55 minutes (≈ 10 for Software Heritage plus ≈ 30–45 for OpenSSF Best Practices).