Contributing to Documentation¶
This guide explains how to contribute to the JuDDGES documentation, run quality checks locally, and understand the CI/CD pipeline.
Table of Contents¶
- Overview
- Documentation Structure
- Getting Started
- Writing Documentation
- Local Development
- Quality Checks
- CI/CD Pipeline
- Best Practices
- Troubleshooting
Overview¶
The JuDDGES documentation follows the Diátaxis framework, organizing content into four distinct types:
- Tutorials - Learning-oriented guides that teach through practical examples
- How-To Guides - Task-oriented instructions for solving specific problems
- Reference - Information-oriented API documentation and technical specifications
- Explanation - Understanding-oriented discussions of concepts and architecture
Documentation Structure¶
docs/
├── index.md                 # Homepage
├── getting-started/         # Installation and quickstart
├── tutorials/               # Step-by-step learning guides
├── how-to/                  # Problem-solving guides
│   ├── documentation/       # Documentation contribution guides
│   ├── data_acquisition.md
│   ├── embeddings.md
│   └── extraction.md
├── reference/               # API reference and technical specs
│   └── api/                 # Auto-generated API docs
└── explanation/             # Conceptual explanations
    ├── architecture.md
    └── research.md
Getting Started¶
Prerequisites¶
- Python 3.11+
- Git
- Node.js 20+ (for spell checking)
- Text editor with Markdown support
Initial Setup¶
- Clone the repository and set up your environment:
- Install Python dependencies:
# Using UV (recommended)
uv venv .venv
source .venv/bin/activate # On Linux/macOS
# .venv\Scripts\activate # On Windows
uv pip install -e .
# Using Make (legacy)
make install_cpu
- Install MkDocs and documentation tools:
- Install pre-commit hooks:
- Install Node.js tools for spell checking:
Writing Documentation¶
Markdown Guidelines¶
- Use ATX-style headings (#, ##, ###)
- Keep lines under 120 characters where possible
- Use fenced code blocks with language specifiers
- Add blank lines before and after headings, lists, and code blocks
Code Examples¶
Always include language identifiers for syntax highlighting:
from juddges.data import BaseWeaviateDatabase

# Initialize database connection
db = BaseWeaviateDatabase(
    url="http://localhost:8080",
    collection_name="legal_documents",
)
Admonitions¶
Use admonitions for notes, warnings, and tips:
!!! note "Important Information"
    This is a note with a custom title.

!!! warning
    This is a warning without a custom title.

!!! tip
    This is a helpful tip.
Mermaid Diagrams¶
Include architecture and flow diagrams using Mermaid:
```mermaid
graph TD
A[Raw Documents] --> B[Preprocessing]
B --> C[Embeddings]
C --> D[Weaviate Storage]
```
Cross-References¶
Link to other documentation pages:
For more information, see [Data Acquisition](../data_acquisition.md).
See the [API Reference](/reference/api/data/loaders/) for detailed documentation.
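Relative links like the first example above can be checked quickly before running the full link checker. The following is a minimal sketch and not part of the JuDDGES tooling: the function name `broken_relative_links` and the demo file names are hypothetical. It inspects only inline Markdown links and skips external URLs, absolute site paths, and same-page anchors.

```python
import re
import tempfile
from pathlib import Path

# Matches inline Markdown links such as [Data Acquisition](../data_acquisition.md).
LINK_PATTERN = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def broken_relative_links(markdown_file: Path) -> list[str]:
    """Return relative link targets in markdown_file that do not exist on disk."""
    broken = []
    text = markdown_file.read_text(encoding="utf-8")
    for target in LINK_PATTERN.findall(text):
        # Skip external URLs, absolute site paths, and same-page anchors.
        if target.startswith(("http://", "https://", "mailto:", "/", "#")):
            continue
        path_part = target.split("#", 1)[0]  # drop any #section suffix
        if not (markdown_file.parent / path_part).exists():
            broken.append(target)
    return broken


# Demo on a throwaway docs tree (hypothetical file names).
with tempfile.TemporaryDirectory() as tmp:
    how_to = Path(tmp) / "docs" / "how-to"
    how_to.mkdir(parents=True)
    (how_to / "embeddings.md").write_text("# Embeddings\n", encoding="utf-8")
    page = how_to / "extraction.md"
    page.write_text(
        "See [Embeddings](embeddings.md#usage) and [Missing](missing.md).\n",
        encoding="utf-8",
    )
    demo_result = broken_relative_links(page)
    print(demo_result)
```

The sketch resolves each target against the directory of the page that contains it, which is why relative links are easier to validate than absolute ones.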
API Documentation¶
API reference documentation is auto-generated from docstrings. Ensure your code has comprehensive Google-style docstrings:
def load_documents(path: str, limit: int | None = None) -> list[Document]:
    """Load documents from a parquet file.

    Args:
        path: Path to the parquet file.
        limit: Maximum number of documents to load. If None, load all.

    Returns:
        List of Document objects.

    Raises:
        FileNotFoundError: If the specified path does not exist.
        ValueError: If the file format is invalid.

    Example:
        ```python
        docs = load_documents("data/documents.parquet", limit=100)
        ```
    """
    pass
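One way to spot incomplete docstrings before the API reference is generated is a small check on the docstring text. This is a rough sketch under stated assumptions, not a JuDDGES utility: `missing_docstring_sections` and `load_ids` are hypothetical names, and the check only looks for the literal section headers "Args:" and "Returns:".

```python
import inspect


def missing_docstring_sections(func) -> list[str]:
    """List Google-style sections that func's docstring appears to lack."""
    doc = inspect.getdoc(func) or ""
    signature = inspect.signature(func)
    missing = []
    # Expect "Args:" whenever the function takes parameters.
    if signature.parameters and "Args:" not in doc:
        missing.append("Args")
    # Expect "Returns:" whenever a non-None return type is annotated.
    annotated = signature.return_annotation
    if annotated not in (inspect.Signature.empty, None) and "Returns:" not in doc:
        missing.append("Returns")
    return missing


def load_ids(path: str) -> list[str]:
    """Load identifiers from a file.

    Args:
        path: Path to the input file.
    """
    return []


# The docstring above documents its argument but not its return value.
demo_missing = missing_docstring_sections(load_ids)
print(demo_missing)
```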
Local Development¶
Preview Documentation Locally¶
Start a local development server with auto-reload by running mkdocs serve.
Then open http://localhost:8000 in your browser. The documentation will automatically reload when you save changes.
Build Documentation¶
Build the static site without serving by running mkdocs build.
The built site will be in the site/ directory.
Strict Mode¶
Build in strict mode, which treats warnings as errors, by running mkdocs build --strict.
This is the same mode used in CI/CD.
Quality Checks¶
Run All Checks Locally¶
Before submitting a PR, run all quality checks:
# Format and lint code (includes pre-commit hooks)
make fix
# Run markdown linting
markdownlint-cli2 "docs/**/*.md" --config .markdownlint.json
# Run spell checker
cspell --config cspell.json "docs/**/*.md" "README.md"
# Build documentation in strict mode
mkdocs build --strict
# Run tests
make test
Individual Checks¶
Markdown Linting¶
# Check specific files
markdownlint-cli2 "docs/how-to/data_acquisition.md"
# Check all documentation
markdownlint-cli2 "docs/**/*.md"
# Auto-fix issues where possible
markdownlint-cli2 "docs/**/*.md" --fix
Spell Checking¶
# Check specific files
cspell docs/how-to/data_acquisition.md
# Check all documentation
cspell "docs/**/*.md"
# Add words to custom dictionary
# Edit cspell.json and add to the "words" array
Link Validation¶
# Build documentation first
mkdocs build
# Check links using lychee
lychee "./site/**/*.html" --exclude 'linkedin.com' --exclude 'twitter.com'
Python Code Examples¶
Python code blocks in documentation are automatically validated for syntax errors in CI. Test them locally:
# Run comprehensive code example tests
pytest tests/docs/test_documentation_examples.py -v
# Run standalone validator
python scripts/docs/test_code_examples.py --verbose
# Test specific file
python scripts/docs/test_code_examples.py docs/reference/api/README.md
Writing Testable Examples:
Follow these guidelines to ensure your code examples pass tests:
# Good: Complete, executable example
from juddges.preprocessing.text_chunker import TextChunker

chunker = TextChunker(
    id_col="judgment_id",
    text_col="full_text",
    chunk_size=512,
    chunk_overlap=50,
)

dataset = {
    "judgment_id": ["doc1"],
    "full_text": ["Sample text..."],
}

result = chunker(dataset)
Skipping Examples:
Use annotations to skip examples that can't be tested:
# doctest: +SKIP
# This requires live Weaviate instance
import weaviate
client = weaviate.Client("http://production:8080")
results = client.query.get(...)
Available annotations:
- # doctest: +SKIP - Skip this example
- # requires: weaviate - Requires Weaviate (will use mock)
- # requires: gemini - Requires Gemini API (will use mock)
- # demonstration-only - Demonstration code, not executable
Each example is collected and validated by scripts/docs/test_code_examples.py.
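To make the idea concrete, here is a minimal sketch of how Python blocks could be pulled out of a Markdown page and syntax-checked. It is an illustration only and does not reproduce scripts/docs/test_code_examples.py: the names `check_python_blocks` and `SKIP_MARKERS` are hypothetical, only two of the annotations are handled, and the check stops at syntax (no imports or execution).

```python
import ast
import re

FENCE = "`" * 3  # built programmatically so this example nests cleanly in Markdown

# Matches fenced blocks opened with the python language tag and captures the body.
PYTHON_BLOCK = re.compile(rf"^{FENCE}python[^\n]*\n(.*?)^{FENCE}", re.DOTALL | re.MULTILINE)
SKIP_MARKERS = ("# doctest: +SKIP", "# demonstration-only")


def check_python_blocks(markdown: str) -> list[tuple[int, str]]:
    """Return (block_index, status) pairs: "ok", "skipped", or a syntax error message."""
    results = []
    for index, match in enumerate(PYTHON_BLOCK.finditer(markdown), start=1):
        code = match.group(1)
        if any(marker in code for marker in SKIP_MARKERS):
            results.append((index, "skipped"))
            continue
        try:
            ast.parse(code)
            results.append((index, "ok"))
        except SyntaxError as error:
            results.append((index, f"syntax error: {error.msg}"))
    return results


# A small page with one valid block, one skipped block, and one broken block.
sample_page = "\n".join([
    "Intro text.",
    f"{FENCE}python",
    "x = 1 + 1",
    FENCE,
    f"{FENCE}python",
    "# doctest: +SKIP",
    "client = connect(",
    FENCE,
    f"{FENCE}python",
    "def broken(:",
    FENCE,
])

demo_results = check_python_blocks(sample_page)
for block_index, status in demo_results:
    print(block_index, status)
```

Note that the skipped block is never parsed, so an annotation hides syntax errors as well as runtime requirements. Use the annotations sparingly.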
Pre-commit Hooks¶
Pre-commit hooks run automatically on git commit. To run them manually:
# Run hooks on all files
pre-commit run --all-files
# Run specific hook
pre-commit run markdownlint-cli2 --all-files
pre-commit run cspell --all-files
CI/CD Pipeline¶
Workflows Overview¶
The documentation CI/CD consists of three main workflows:
1. Documentation Build & Deploy (docs-build-deploy.yaml)¶
- Trigger: Push to main/master branch
- Actions:
  - Builds MkDocs documentation
  - Deploys to GitHub Pages
- Caching: Pip dependencies and MkDocs build cache
2. Documentation Quality Checks (docs-quality-checks.yaml)¶
- Trigger: Pull requests and pushes to main/master
- Jobs:
  - Markdown Linting: Validates markdown formatting
  - Link Checking: Validates internal and external links
  - Spell Checking: Checks spelling with custom dictionary
  - Code Examples: Validates Python code blocks (syntax, imports, execution)
  - Build Test: Tests documentation build in strict mode
- Parallel Execution: All jobs run in parallel for speed
- Artifacts: Generates code-examples-report.json with test results
3. Documentation PR Preview (docs-pr-preview.yaml)¶
- Trigger: Pull requests
- Actions:
  - Builds documentation preview
  - Generates change summary
  - Posts comment on PR with statistics
Workflow Behavior¶
On Pull Request¶
- Quality checks run automatically
- Build preview generates change summary
- PR comment shows:
  - Number of files changed
  - New/modified/deleted files
  - Build status
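The kind of statistics listed above can be derived from the output of `git diff --name-status`. The following sketch shows one way to do it, assuming that approach. How docs-pr-preview.yaml actually computes its summary is not shown in this guide, and `summarize_changes` and the sample paths are hypothetical.

```python
from collections import Counter

# First letter of each git status code mapped to a readable label.
STATUS_LABELS = {"A": "new", "M": "modified", "D": "deleted", "R": "renamed"}


def summarize_changes(name_status: str, prefix: str = "docs/") -> dict[str, int]:
    """Count changed files under prefix from `git diff --name-status` output."""
    counts = Counter()
    for line in name_status.splitlines():
        if not line.strip():
            continue
        status, *paths = line.split("\t")
        # For renames (for example R100) the last field is the new path.
        if not paths or not paths[-1].startswith(prefix):
            continue
        counts[STATUS_LABELS.get(status[0], "other")] += 1
    summary = dict(counts)
    summary["total"] = sum(counts.values())
    return summary


# Hypothetical diff output: four documentation changes and one code change.
sample_output = (
    "A\tdocs/tutorials/new.md\n"
    "M\tdocs/index.md\n"
    "M\tjuddges/data/loaders.py\n"
    "D\tdocs/how-to/old.md\n"
    "R100\tdocs/a.md\tdocs/b.md\n"
)
demo_summary = summarize_changes(sample_output)
print(demo_summary)
```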
On Merge to Main¶
- Quality checks run (final validation)
- Documentation builds and deploys to GitHub Pages
- Live site updates automatically
Viewing Workflow Status¶
Check workflow status in:
- PR checks section
- GitHub Actions tab: https://github.com/laugustyniak/JuDDGES/actions
Workflow Caching¶
Workflows use caching to speed up builds:
- Pip dependencies: Cached based on requirements.txt and pyproject.toml
- MkDocs build: Cached based on commit SHA
- npm dependencies: Cached for spell checking
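A cache "based on" dependency files usually means the cache key is derived from a hash of their contents, so the cache is reused until one of those files changes. The sketch below illustrates that idea only: the workflows' real cache configuration is not shown in this guide, and `cache_key` and the `pip` prefix are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def cache_key(prefix: str, files: list[Path]) -> str:
    """Build a cache key that changes whenever any listed file's content changes."""
    digest = hashlib.sha256()
    for path in sorted(files):
        digest.update(path.name.encode("utf-8"))
        digest.update(path.read_bytes())
    return f"{prefix}-{digest.hexdigest()[:16]}"


# Demo: the key is stable for unchanged files and differs after an edit.
with tempfile.TemporaryDirectory() as tmp:
    requirements = Path(tmp) / "requirements.txt"
    pyproject = Path(tmp) / "pyproject.toml"
    requirements.write_text("mkdocs-material\n", encoding="utf-8")
    pyproject.write_text("[project]\nname = 'demo'\n", encoding="utf-8")

    key_before = cache_key("pip", [requirements, pyproject])
    key_same = cache_key("pip", [pyproject, requirements])  # order does not matter
    requirements.write_text("mkdocs-material\npymdown-extensions\n", encoding="utf-8")
    key_after = cache_key("pip", [requirements, pyproject])

print(key_before == key_same, key_before == key_after)
```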
Best Practices¶
Documentation Writing¶
- Be Clear and Concise: Use simple language and short sentences
- Use Active Voice: "Install dependencies" not "Dependencies should be installed"
- Provide Examples: Include working code examples for all concepts
- Link Strategically: Link to related topics but avoid excessive linking
- Update Regularly: Update docs when code changes
Diátaxis Guidelines¶
Tutorials¶
- Start with prerequisites and objectives
- Use step-by-step numbered instructions
- Include expected output at each step
- End with what was learned and next steps
How-To Guides¶
- Start with the problem/goal
- Assume basic knowledge
- Focus on the solution, not explanation
- Include troubleshooting tips
Reference¶
- Be comprehensive and accurate
- Use consistent structure
- Include all parameters and return types
- Provide code examples
Explanation¶
- Discuss concepts and design decisions
- Explain the "why" not the "how"
- Compare alternatives
- Provide architectural context
Version Control¶
- One Topic Per PR: Keep documentation PRs focused
- Descriptive Commits: Use clear commit messages
- Review Changes: Preview locally before pushing
- Update Navigation: Add new pages to mkdocs.yml
Accessibility¶
- Alt Text: Add descriptions for images
- Link Text: Use meaningful link text (not "click here")
- Headings: Use proper heading hierarchy
- Code Blocks: Always specify language for syntax highlighting
Troubleshooting¶
Common Issues¶
MkDocs Build Fails¶
Issue: mkdocs build fails with missing module errors
Solution:
# Reinstall dependencies
pip install -e .
pip install mkdocs-material "mkdocstrings[python]" pymdown-extensions
Link Checker Fails¶
Issue: Lychee reports broken internal links
Solution:
- Ensure linked files exist in the docs/ directory
- Check file paths are relative and correct
- Verify navigation structure in mkdocs.yml
Spell Checker False Positives¶
Issue: cspell reports valid technical terms as misspelled
Solution: Add the terms to the "words" array in cspell.json.
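If many terms need adding, the edit can be scripted. The sketch below is a hypothetical helper, not part of the repository. It assumes cspell.json is plain JSON without comments, treats words that differ only by case as duplicates, and keeps the array sorted.

```python
import json
import tempfile
from pathlib import Path


def add_cspell_words(config_path: Path, new_words: list[str]) -> list[str]:
    """Add new_words to the "words" array of a cspell JSON config.

    Returns the words that were actually added.
    """
    config = json.loads(config_path.read_text(encoding="utf-8"))
    existing = config.get("words", [])
    known = {word.lower() for word in existing}
    added = []
    for word in new_words:
        if word.lower() not in known:
            known.add(word.lower())
            added.append(word)
    config["words"] = sorted(existing + added, key=str.lower)
    config_path.write_text(
        json.dumps(config, indent=2, ensure_ascii=False) + "\n", encoding="utf-8"
    )
    return added


# Demo on a temporary config file with made-up contents.
with tempfile.TemporaryDirectory() as tmp:
    config_file = Path(tmp) / "cspell.json"
    config_file.write_text('{"version": "0.2", "words": ["Weaviate"]}', encoding="utf-8")
    demo_added = add_cspell_words(config_file, ["JuDDGES", "weaviate", "Diátaxis"])
    demo_words = json.loads(config_file.read_text(encoding="utf-8"))["words"]

print(demo_added, demo_words)
```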
Markdown Lint Errors¶
Issue: markdownlint reports formatting errors
Solution:
# Auto-fix issues
markdownlint-cli2 "docs/**/*.md" --fix
# Or adjust rules in .markdownlint.json if needed
Pre-commit Hooks Fail¶
Issue: Pre-commit hooks fail on commit
Solution:
# Update hooks
pre-commit autoupdate
# Run manually to see detailed errors
pre-commit run --all-files
GitHub Pages Not Updating¶
Issue: Documentation doesn't update after merge
Solution:
- Check GitHub Actions for build errors
- Verify GitHub Pages is enabled in repository settings
- Ensure workflow has correct permissions
Getting Help¶
- Documentation Issues: Open an issue on GitHub
- Technical Questions: Ask in project discussions
- CI/CD Problems: Check GitHub Actions logs
Additional Resources¶
- MkDocs Documentation
- Material for MkDocs
- Diátaxis Framework
- Google Developer Documentation Style Guide
- Markdown Guide
- Mermaid Documentation
Summary¶
Contributing to documentation:
- Set up your local environment
- Write clear, structured documentation following Diátaxis
- Run quality checks locally
- Submit a PR and review automated feedback
- Address any issues and merge
Thank you for contributing to JuDDGES documentation!