Git LFS Setup for Large Model Files

Files to Track

  • models/umap/umap_model_LegalDocuments.pkl (477MB)
  • umap_coords/LegalDocuments_coords.parquet (13MB)

Setup (Run on HOST machine, not in Docker)

1. Install Git LFS

# Ubuntu/Debian
sudo apt-get install git-lfs

# macOS
brew install git-lfs
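
After installing, it can help to confirm that the binary is visible to the shell before continuing. The following is a minimal sketch; the helper names have_cmd and report_git_lfs are hypothetical and not part of Git LFS.

```shell
# Hypothetical helper: succeed only if the named command is on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Report whether the git-lfs binary is visible to this shell.
report_git_lfs() {
    if have_cmd git-lfs; then
        echo "git-lfs found"
    else
        echo "git-lfs not found"
    fi
}
```

Calling report_git_lfs prints one of the two messages. When the binary is found, git lfs version prints the installed version.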

2. Initialize and Add Files

# Initialize Git LFS
git lfs install

# Verify .gitattributes has LFS rules
cat .gitattributes
# Should show:
# *.pkl filter=lfs diff=lfs merge=lfs -text
# *.parquet filter=lfs diff=lfs merge=lfs -text

# Add the attributes file and the data files
git add .gitattributes
git add models/umap/umap_model_LegalDocuments.pkl
git add umap_coords/LegalDocuments_coords.parquet

# Verify LFS tracking
git lfs ls-files

# Commit
git commit -m "Add UMAP model and coordinates to LFS"
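
The .gitattributes check above can also be scripted instead of read by eye. This is a sketch only: the function names are hypothetical, and it assumes each rule is written exactly in the form shown above (the form that git lfs track produces), so rules with different spacing would be reported as missing.

```shell
# Hypothetical helper: succeed if FILE contains the exact LFS rule for PATTERN.
# Usage: has_lfs_rule PATTERN [FILE]
has_lfs_rule() {
    rule_pattern=$1
    rule_file=${2:-.gitattributes}
    # -F fixed string, -x whole-line match, -q no output
    grep -Fxq -- "$rule_pattern filter=lfs diff=lfs merge=lfs -text" "$rule_file" 2>/dev/null
}

# Print the status of both patterns used in this guide.
# Returns non-zero if any rule is missing.
check_lfs_rules() {
    attr_file=${1:-.gitattributes}
    any_missing=0
    for p in '*.pkl' '*.parquet'; do
        if has_lfs_rule "$p" "$attr_file"; then
            echo "ok: $p"
        else
            echo "missing: $p"
            any_missing=1
        fi
    done
    return "$any_missing"
}
```

Run check_lfs_rules from the repository root. If a pattern is reported missing, git lfs track "*.pkl" (or "*.parquet") writes the rule to .gitattributes, which then needs to be added and committed.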

Troubleshooting

Files not tracking with LFS

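# Confirm a matching rule exists in .gitattributes, then re-track:
# unstage the file (the working copy is kept) and add it again so the LFS filter runs
git rm --cached models/umap/umap_model_LegalDocuments.pkl
git add models/umap/umap_model_LegalDocuments.pkl
git lfs ls-files  # The .pkl should now be listed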
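
Another way to check the result is to look at what Git actually staged. A file handled by LFS is stored in Git as a small text pointer whose first line is "version https://git-lfs.github.com/spec/v1". The sketch below tests for that header; is_lfs_pointer is a hypothetical helper name, and it only inspects the first 42 bytes of its input.

```shell
# Hypothetical helper: succeed if stdin begins with the Git LFS pointer header.
is_lfs_pointer() {
    # The header line is 42 bytes long. NUL bytes are stripped so that binary
    # input (for example a raw pickle) does not trigger shell warnings.
    header=$(head -c 42 | tr -d '\000')
    [ "$header" = "version https://git-lfs.github.com/spec/v1" ]
}

# Usage (inside the repository, after `git add`):
#   git cat-file -p :models/umap/umap_model_LegalDocuments.pkl | is_lfs_pointer \
#       && echo "staged as LFS pointer" \
#       || echo "staged as a regular blob"
```

If the file is reported as a regular blob, the LFS filter did not run when it was added, which usually points to a missing or non-matching rule in .gitattributes.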

Storage Limits

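  • GitHub: 1GB free storage + 1GB/month bandwidth (quotas depend on the plan and have changed over time; check the current billing docs)
  • GitLab: 10GB per repository
  • Consider DVC (Data Version Control) for larger models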
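
To see how quickly these limits are reached, note that LFS keeps every committed version of a tracked file, so retraining and recommitting the model adds its full size again. The sketch below does that arithmetic with the sizes from the Files to Track list. The helper names are hypothetical, and the quota is passed in as a parameter because the actual limit depends on the host and plan.

```shell
# Sizes in MB, taken from the "Files to Track" list.
MODEL_MB=477
COORDS_MB=13

# Hypothetical helper: estimated LFS storage in MB after the given number
# of committed versions of the model and of the coordinates file.
lfs_usage_mb() {
    model_versions=$1
    coords_versions=$2
    echo $(( model_versions * MODEL_MB + coords_versions * COORDS_MB ))
}

# Hypothetical helper: succeed if USAGE_MB is within QUOTA_MB.
fits_quota() {
    [ "$1" -le "$2" ]
}
```

For example, lfs_usage_mb 1 1 gives 490, which fits a 1024 MB quota, while lfs_usage_mb 3 1 gives 1444, which does not. Under these assumptions a 1GB allowance holds only two versions of the model.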