JuDDGES: Open-Science & FAIR4RS Report
Software-sustainability and reproducibility report for the JuDDGES codebase, structured as direct answers to standard open-science evaluation criteria (FAIR4RS / EOSC software-sustainability).
Repository: https://github.com/pwr-ai/JuDDGES
Last verified against the repository: 2026-04-30.
Looking for a blank version of this form to use on a different project? See the reusable template.md in this directory.
1. Project context
Q: Briefly describe the context and purpose.
A: JuDDGES is an end-to-end research codebase for working with Polish (and to a lesser extent English) legal-judgment data. It covers six tightly integrated capabilities:
- Data acquisition from the Polish Common Courts API and the National Administrative Court (NSA), implemented under scripts/nsa/ and the data loaders in juddges/data/.
- Storage and semantic retrieval of judgments in a Weaviate vector database (legal_documents and document_chunks collections), with an ingestion pipeline (scripts/embed/ingest_to_weaviate.py) and a Docker Compose deployment in weaviate/.
- Automated dataset analysis producing descriptive statistics and quality reports under scripts/analytics/ and notebooks in nbs/, so every published dataset ships with a transparent statistical profile.
- Schema-driven information extraction running LLM inference over the corpus against a user-defined Pydantic schema (juddges/models/, scripts/extraction/), supporting both local vLLM and OpenAI-compatible API back-ends.
- Fine-tuning and evaluation of Llama 3.1/3.2, Mistral, Bielik (Polish), and Phi-4 via PEFT/LoRA on Unsloth (scripts/sft/), with both n-gram metrics and an LLM-as-judge protocol (juddges/evaluation/).
- A Human-in-the-Loop annotation toolkit on Label Studio in label_studio_toolkit/. Annotation tasks are declared as Pydantic schemas plus XML form templates; the toolkit automates LLM preannotation, task and prediction upload, human review and correction, and export of the corrected annotations as a structured dataset (dataset.json + schema.yaml). Three reference tasks ship in the repo: Swiss Franc loan cases (schemas/swiss_frank.py + form_templates/swiss_frank.xml), personal rights (schemas/personal_rights.py + form_templates/personal_rights.xml), and English appeal-court (schemas/en_appealcourt.py + form_templates/en_appealcourt.xml, driven by configs/annotate_data_en_appealcourt.yaml).
The codebase exists so the JuDDGES project can publish high-quality Polish legal datasets and reproducible model artefacts on the JuDDGES Hugging Face organisation. It can be reused by anyone reproducing the data-collection workflow, refreshing datasets with newly published judgments, extracting structured information under a custom schema, or fine-tuning an LLM for a domain-specific task.
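To make the preannotation and upload step of the annotation toolkit more concrete, here is a minimal sketch of a Label Studio import task that carries one LLM prediction. It follows Label Studio's documented task layout (a data object plus a predictions list), but it is not taken from the JuDDGES code: the tag names outcome and text, the label claim_upheld, and the model version string are hypothetical and would have to match the XML form template actually in use.

```python
import json


def build_task(text: str, label: str, model_version: str) -> dict:
    """Build one Label Studio import task carrying an LLM pre-annotation."""
    return {
        # "data" holds the fields the XML form template refers to.
        "data": {"text": text},
        "predictions": [
            {
                "model_version": model_version,
                "result": [
                    {
                        # Hypothetical tag names; they must match the form template.
                        "from_name": "outcome",
                        "to_name": "text",
                        "type": "choices",
                        "value": {"choices": [label]},
                    }
                ],
            }
        ],
    }


task = build_task("Sample judgment text.", "claim_upheld", "hypothetical-llm-v1")
# Label Studio imports a JSON list of such tasks.
payload = json.dumps([task], ensure_ascii=False, indent=2)
print(payload)
```

A human reviewer then accepts or corrects the prefilled choice in the Label Studio UI, which is the review step described in the list above.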
2. Evaluation criteria
DOCUMENTATION: License & accessibility
Q: How can the repository be accessed by third parties?
A: Public, registration-free GitHub at https://github.com/pwr-ai/JuDDGES. Four-way licensing model:
- Code under Apache 2.0 (also declared in the pyproject.toml license field).
- Documentation under CC BY 4.0.
- Datasets on Hugging Face under CC BY 4.0 (declared in each dataset card's YAML metadata).
- Fine-tuned models under OpenRAIL-M (declared in each model card's YAML metadata).
Q: What type of documentation is available, provided with the project and delivered under the same conditions?
A: Multi-layer documentation under the same open licenses:
- (a) Top-level project overview, install steps (automated via setup.sh/setup.bat and manual via uv + pyproject.toml), plus quick-start commands.
- (b) docs/ directory organised under the Diátaxis framework (tutorials/, how-to/, reference/, explanation/), built and deployed as a Material-for-MkDocs site by .github/workflows/docs-build-deploy.yaml (with separate PR-preview and quality-checks workflows).
- (c) This open-science / FAIR4RS report at docs/open-science/index.md, rendered at the clean URL /open-science/.
- (d) Executable Jupyter notebooks in nbs/ and dev_notebooks/.
- (e) Sub-toolkit docs in label_studio_toolkit/docs/: setup, workflows, preannotation, upload-and-annotate, export, add-new-task.
- (f) Hydra YAML configurations in configs/ act as a living reference for every published experiment.
Q: Does the documentation describe how to use/build/deploy/install the project?
A: No web application is shipped — JuDDGES is a research codebase composed of CLI scripts, DVC pipelines, and a Label Studio integration, not a hosted web service. The accompanying documentation therefore focuses on what is actually delivered:
- Installation is documented two ways: automated via setup.sh/setup.bat, and manual via uv venv .venv && uv pip install -e . against pyproject.toml + requirements.txt.
- Build / dev workflows through the project Makefile: make install, make install_unsloth, make fix, make check, make check-types, make test, make all.
- Pipeline execution as DVC stages in dvc.yaml (e.g. dvc repro predict_raw_vllm, dvc repro predict_swiss_franc_loans_on_fine_tuned_vllm, dvc repro evaluate, dvc repro evaluate_llm_as_judge), with matrix expansion over (model × dataset × seed).
- External services via Docker Compose: Weaviate (weaviate/), an extraction stack (docker-compose.extraction.yml), and an LLM+Postgres optimised stack (docker-compose-llm-postgres-optimized.yml).
- Label Studio annotation UI deployment is documented in label_studio_toolkit/docs/setup.md.
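The matrix expansion over (model × dataset × seed) mentioned in the list above is a Cartesian product: one pipeline stage instance per combination. The sketch below only illustrates that idea in plain Python. The axis values are hypothetical placeholders, since the real ones are declared in dvc.yaml, and the helper expand_matrix is not part of DVC or of the repository.

```python
from itertools import product

# Hypothetical axis values; the real ones are declared in dvc.yaml.
MATRIX = {
    "model": ["model_a", "model_b"],
    "dataset": ["dataset_x"],
    "seed": [7, 42],
}


def expand_matrix(matrix: dict[str, list]) -> list[dict]:
    """Return one dict per combination, iterating axes in declaration order."""
    keys = list(matrix)
    return [dict(zip(keys, combo)) for combo in product(*(matrix[k] for k in keys))]


combos = expand_matrix(MATRIX)
for combo in combos:
    print(combo)
```

With two models, one dataset, and two seeds this yields four combinations, which is why adding a single model or seed multiplies the number of stage instances to reproduce.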
TESTING: Sample data & parameters
Q: Are sample data and/or parameters that can be used to test the project available with the source code?
A: Yes, in four categories.
- (a) Automated tests under tests/ (pytest + coverage via make test, full sweep via make all): preprocessing tests (tests/preprocessing/), extraction tests (tests/extraction/), evaluation tests (tests/evals/), an LLM-as-judge suite (tests/llm_as_judge/), and a Weaviate embedding/ingestion integration suite (tests/embeddings/ with README.md, conftest.py, run_tests.py).
- (b) Sample data committed under data/sample_data/: 100-row and 10-row CSV samples (judgements-100-sample.csv, judgements-100-sample-with-retrieved-informations.csv, judgements-konfiskata-100-sample.csv, judgements-10-konfiskata-sample-with-retrieved-informations.csv), sufficient to exercise embedding, extraction, and evaluation end-to-end. Each sample is also DVC-tracked via .dvc pointer files.
- (c) Use-case example scripts under scripts/ and examples/, notably the instruct-dataset builders in scripts/dataset/ and the Weaviate ingestion script scripts/embed/ingest_to_weaviate.py.
- (d) Reference annotation tasks in label_studio_toolkit/: all three tasks ship with both a Pydantic schema and a Label Studio XML form, wired through configs/preannotate_label_studio.yaml, configs/upload_with_preannotation.yaml, and configs/annotate_data_en_appealcourt.yaml.
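As a quick sanity check before running a pipeline on the sample data, one could inspect a sample CSV with the standard library alone. This is a minimal sketch, not a script from the repository: the helper summarise_csv is hypothetical, it makes no assumption about column names, and it assumes the file is UTF-8 encoded and present locally (fetching it through DVC may be needed first).

```python
import csv
from pathlib import Path


def summarise_csv(path: Path) -> tuple[list[str], int]:
    """Return the header columns and the number of data rows in a CSV file."""
    with path.open(newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        row_count = sum(1 for _ in reader)
        # fieldnames is populated once the reader has consumed the header.
        columns = list(reader.fieldnames or [])
    return columns, row_count


sample = Path("data/sample_data/judgements-100-sample.csv")
if sample.exists():
    columns, rows = summarise_csv(sample)
    print(f"{rows} rows, columns: {columns}")
```

The existence guard keeps the snippet harmless when run outside a checkout of the repository.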
INTEROPERABILITY: Standard I/O formats
Q: Do you use existing and standard input/output formats?
A: Yes. Datasets stored as Parquet and distributed via the Hugging Face datasets library; CSV and JSON for tabular and metadata exports (samples under data/sample_data/). Configuration in YAML (Hydra-structured). Dependencies declared in requirements.txt, pyproject.toml, and a fully resolved uv.lock. Vector data persisted in Weaviate with predefined collections (legal_documents, document_chunks) and deterministic UUIDs for deduplication. Embeddings generated with the publicly available sdadas/mmlw-roberta-large model so they are regenerable. The annotation toolkit consumes Parquet/HF datasets, calls any OpenAI-compatible REST API for LLM preannotation (works with OpenAI, vLLM, Ollama, LiteLLM, or self-hosted endpoints — see label_studio_toolkit/api/client.py), uses Label Studio's standard JSON task format for upload/review (see scripts/label_studio/upload_with_preannotation.py), and exports human-corrected annotations as a portable pair: dataset.json + schema.yaml (scripts/label_studio/export_annotated_dataset.py).
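The deterministic UUIDs mentioned above can be illustrated with name-based (version 5) UUIDs from the Python standard library: the same identifying fields always hash to the same UUID, so re-ingesting a document overwrites it rather than duplicating it. This is a sketch of the general technique, not the repository's ingestion code. The namespace URL, the choice of identifying fields, and the helper document_uuid are all assumptions.

```python
import uuid

# Hypothetical namespace; any fixed UUID works as long as it never changes.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.org/judgments")


def document_uuid(source: str, judgment_id: str) -> uuid.UUID:
    """Derive a stable UUID from the identifying fields of a judgment."""
    return uuid.uuid5(NAMESPACE, f"{source}:{judgment_id}")


first = document_uuid("nsa", "hypothetical-id-001")
again = document_uuid("nsa", "hypothetical-id-001")
other = document_uuid("nsa", "hypothetical-id-002")
print(first == again, first == other)  # prints: True False
```

Because the identifier depends only on the input fields, two ingestion runs on different machines agree on object IDs without any shared state.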
VERSIONING: Source-code version control
Q: Do you use a version control system?
A: Yes. Git + GitHub at https://github.com/pwr-ai/JuDDGES. Standard branch + PR review workflow with pre-commit hooks (.pre-commit-config.yaml, invoked via make fix / make check) enforcing formatting, linting, type-checking, markdown lint, and a custom spell-check before merge. Continuous integration under .github/workflows/: a Python test/quality pipeline (python.yaml), a docs build-and-deploy pipeline (docs-build-deploy.yaml), a docs PR-preview pipeline (docs-pr-preview.yaml), and a docs quality-checks pipeline (docs-quality-checks.yaml).
REPRODUCIBILITY: Releases
Q: Do you provide releases of your software?
A: Yes — three layers.
- (a) GitHub Releases backed by a Zenodo persistent DOI: archived on Zenodo with concept DOI 10.5281/zenodo.19911970 (always-latest) and v0.1.0 version DOI 10.5281/zenodo.19911971; plus a research-reproducibility Git tag neurips_v0.1 capturing the SFT experiments on pl-swiss-franc-loans.
- (b) DVC pipeline tracking via dvc.yaml (preprocessing, embedding, instruct-dataset construction, SFT, raw and fine-tuned prediction, n-gram and LLM-as-judge evaluation), with matrix expansion over (model × dataset × seed). The lockfile dvc.lock records exact inputs/parameters/hashes/outputs so any reported artefact can be reproduced with dvc repro <stage>.
- (c) Sample data version-tracked through DVC .dvc pointer files under data/sample_data/.
Q: How do you define language-specific dependencies of your project and their version?
A: Three layers, all committed to the repository:
- A requirements.txt for runtime dependencies.
- A pyproject.toml describing the package and its optional install groups.
- A fully resolved uv.lock lockfile pinning every transitive dependency to an exact version for byte-identical environment reconstruction.
Recommended path uses uv (uv venv .venv && uv pip install -e .); make install is provided for pip-based workflows; make install_unsloth provisions a dedicated conda environment for fine-tuning. Required CUDA version (12.4 by default) is documented alongside install instructions. External services (Weaviate, Label Studio, optional Postgres/LLM stacks) are pinned via docker compose files: docker-compose.yml, docker-compose.extraction.yml, docker-compose-llm-postgres-optimized.yml.
Q: Do you state how to report bugs and/or usability problems by the software user(s)?
A: Yes. Users are directed to the GitHub Issues tracker at https://github.com/pwr-ai/JuDDGES/issues. Four files ship under .github/ISSUE_TEMPLATE/: three templated issue forms (bug_report.yml, feature_request.yml, documentation.yml) plus a config.yml that disables blank issues and links to GitHub Discussions and Security Advisories. PRs are reviewed against .github/PULL_REQUEST_TEMPLATE.md. Contribution conventions live in CONTRIBUTING.md; a coordinated vulnerability-disclosure policy lives in SECURITY.md (private channel via GitHub Security Advisories, with email as a backup).
Q: Do you state how to report bugs and/or usability problems by the web app user(s)?
A: Not applicable — JuDDGES does not ship a hosted web application; it is a research codebase that runs locally or on a user-controlled compute environment. The Label Studio UI used by the annotation toolkit is a third-party component whose own bug-reporting channels apply to UI defects; issues specific to the JuDDGES integration with Label Studio are reported on the same GitHub Issues tracker.
RECOGNITION: Citation information
Q: Do you include citation information (i.e. how to cite your software in the form of citation.cff, codemeta.json or bibtex)?
A: Yes — all four canonical formats, all carrying the same software-author roster and pointing at the same reference publication:
- CITATION.cff: Citation File Format v1.2.0 at the repository root. Consumed by GitHub's "Cite this repository" widget, Zenodo, Zotero, Mendeley, OpenAIRE.
- codemeta.json: CodeMeta v3.0 JSON-LD (@context: https://w3id.org/codemeta/3.0, @type: SoftwareSourceCode). Consumed by HAL, OpenAIRE, Software Heritage, re3data.
- Zenodo persistent DOI: concept DOI 10.5281/zenodo.19911970 (always-latest) and version DOI 10.5281/zenodo.19911971 (v0.1.0); badge rendered in docs/index.md.
- Copy-pasteable BibTeX (below).
Software-author roster (eleven authors across five institutions, reflecting the expanded research collaboration behind the codebase): Łukasz Augustyniak, Jakub Binkowski, Albert Sawczyn, Tomasz Kajdanowicz (Wrocław University of Science and Technology); Michał Bernaczyk (University of Wrocław); Krzysztof Kamiński (Court of Appeal, Wrocław); Santosh Tirunagari, David Windridge, Mandeep K. Dhami (Middlesex University); Chérifa Boukacem-Zeghmouri, Candice Fillaud (Université Claude Bernard Lyon 1).
Reference paper: "Bridging AI and Law: A Scalable Multi-Agent Platform for Quantitative Legal Analytics Across Millions of Documents" (Augustyniak et al., 2026), Bridge between AI and Law workshop, pp. 207–214. https://openreview.net/forum?id=hWjsyTSWrY. Note that the paper's author list (eleven authors, given in the BibTeX entry below) is intentionally distinct from the broader software-collaboration roster above, because software-author lists evolve after publication.
@inproceedings{augustyniak2026bridging,
author = {Lukasz Augustyniak and Jakub Binkowski and Albert Sawczyn and Kamil Tagowski and Denis Janiak and Mateusz Bystroński and Grzegorz Piotrowski and Michal Bernaczyk and Krzysztof Kamiński and Adrian Szymczak and Tomasz Jan Kajdanowicz},
booktitle = {Bridge between Artificial Intelligence and Law},
pages = {207--214},
title = {Bridging {AI} and Law: A Scalable Multi-Agent Platform for Quantitative Legal Analytics Across Millions of Documents},
url = {https://openreview.net/forum?id=hWjsyTSWrY},
year = {2026}
}
3. Open-science checklist: current status
Verification against the repository on 2026-04-30 identified twelve open-science items beyond the FAIR4RS / EOSC software-sustainability baseline. Ten of the twelve are now in place (including a Zenodo-minted persistent DOI and the Contributor Covenant Code of Conduct); the remaining two require off-repo actions on accounts the project owner controls (Software Heritage, OpenSSF Best Practices).
| # | Item | Principle | Status | Where it lives |
|---|---|---|---|---|
| 1 | CITATION.cff (CFF v1.2.0) | FAIR4RS R1.2: machine-readable citation metadata | ✅ Done | CITATION.cff |
| 2 | codemeta.json (CodeMeta v3.0 JSON-LD) | FAIR4RS R1.2: cross-platform research-software metadata | ✅ Done | codemeta.json |
| 3 | GitHub Release + Zenodo DOI | FAIR4RS F1: globally unique persistent identifier | ✅ Done | Concept DOI 10.5281/zenodo.19911970; version DOI 10.5281/zenodo.19911971; badge in docs/index.md |
| 4 | CONTRIBUTING.md | EOSC sustainability: contributor onboarding | ✅ Done | CONTRIBUTING.md |
| 5 | CODE_OF_CONDUCT.md (Contributor Covenant v2.1) | EOSC community-health | ✅ Done | CODE_OF_CONDUCT.md |
| 6 | SECURITY.md | EOSC sustainability: coordinated vulnerability disclosure | ✅ Done | SECURITY.md |
| 7 | .github/ISSUE_TEMPLATE/ (4 files) | EOSC sustainability: triage hygiene | ✅ Done | bug_report.yml, feature_request.yml, documentation.yml, config.yml |
| 8 | .github/PULL_REQUEST_TEMPLATE.md | EOSC sustainability: PR review hygiene | ✅ Done | PULL_REQUEST_TEMPLATE.md |
| 9 | Software Heritage SWHID | FAIR4RS F1: archival permanence beyond GitHub | ⚠ Pending (external action; tracked in #65) | n/a |
| 10 | en_appealcourt.xml Label Studio form | Reproducibility: symmetry with the other reference annotation tasks | ✅ Done | label_studio_toolkit/form_templates/en_appealcourt.xml |
| 11 | docs/index.md landing page | Documentation accessibility: clears the mkdocs build --strict warning | ✅ Done | docs/index.md |
| 12 | OpenSSF Best Practices badge / Scorecard | Third-party software-sustainability indicator | ⚠ Pending (external action; tracked in #66) | n/a |
Pending external actions
Two items can only be closed off-repo on accounts the project owner controls. Neither is a blocker; both materially improve archival permanence and third-party signalling. Each is tracked as a dedicated GitHub issue with full step-by-step instructions and acceptance criteria.
- Software Heritage SWHID (checklist item 9), tracked in issue #65. Steps: (a) open https://archive.softwareheritage.org/save/, (b) submit https://github.com/pwr-ai/JuDDGES, (c) wait for the archival job to complete (typically minutes), (d) copy the resulting swh:1:dir:… SWHID and add it to CITATION.cff under the existing identifiers: block (with type: swh), and to codemeta.json as an additional identifier. Effort: ≈ 10 minutes.
- OpenSSF Best Practices badge (checklist item 12), tracked in issue #66. Steps: (a) sign in at https://www.bestpractices.dev/ with GitHub OAuth, (b) register pwr-ai/JuDDGES, (c) work through the criteria checklist (most items already pass), (d) embed the resulting passing / silver / gold badge in docs/index.md. Optionally enable the OpenSSF Scorecard GitHub Action for an automated weekly score. Effort: ≈ 30–45 minutes.
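Step (d) of the Software Heritage item, adding the SWHID to codemeta.json as an additional identifier, could be scripted roughly as follows. This is a sketch under stated assumptions: the helper add_identifier is hypothetical, identifiers are assumed to be stored as plain strings (the actual layout of the repository's codemeta.json may differ), and the SWHID shown is a placeholder that must be replaced with the value Software Heritage returns.

```python
import json


def add_identifier(codemeta: dict, identifier: str) -> dict:
    """Return a copy of the metadata with `identifier` present exactly once."""
    existing = codemeta.get("identifier", [])
    # The field may hold a single value or a list; normalise to a list.
    if not isinstance(existing, list):
        existing = [existing]
    if identifier not in existing:
        existing = [*existing, identifier]
    return {**codemeta, "identifier": existing}


# Assumed shape: identifiers stored as plain strings.
doi = "https://doi.org/10.5281/zenodo.19911970"
meta = {"@type": "SoftwareSourceCode", "identifier": [doi]}
swhid = "swh:1:dir:<paste-the-real-hash-here>"  # placeholder, not a real SWHID

# Applying the helper twice shows that it is idempotent.
updated = add_identifier(add_identifier(meta, swhid), swhid)
print(json.dumps(updated, indent=2))
```

Returning a new dict instead of mutating the input keeps the original metadata available for a diff before the file is written back.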
Recently completed
- Zenodo DOI (checklist item 3), completed 2026-04-30. The GitHub Release + Zenodo integration was wired up; the deposition exposes a concept DOI 10.5281/zenodo.19911970 (always-latest) and a version DOI 10.5281/zenodo.19911971 for the v0.1.0 snapshot. Both are recorded in CITATION.cff (top-level doi: field + identifiers: block) and in codemeta.json (identifier array); the Zenodo DOI badge is rendered in docs/index.md.
Total remaining effort once external accounts are available: ≈ 40–55 minutes (≈ 10 minutes for Software Heritage plus ≈ 30–45 minutes for OpenSSF).