Populate Search Topics in Production Meilisearch¶
This runbook generates and publishes the topics Meilisearch index that powers
the search-bar autocomplete chips. The pipeline is offline, idempotent, and
performs a zero-downtime atomic swap, so it is safe to re-run.
As of 2026-05-13, generated topics are first persisted to the Supabase
search_topics table. Meilisearch is then rebuilt from that Supabase snapshot,
so Supabase is the durable ground truth for topic reindexing.
When to run: initial prod bootstrap; periodic refresh as the corpus grows (e.g. quarterly, or after a large ingestion batch); when autocomplete stops returning chips and the cause is traced to an empty
topicsindex.
TL;DR¶
# On the prod host, from the deploy directory:
docker compose run --rm \
-e MEILISEARCH_INTERNAL_URL=http://meilisearch:7700 \
backend python -m scripts.generate_search_topics
Confirm y at the swap prompt. Total wall-clock: ~10–20 min (most of it
BERTopic + embeddings).
1. Prerequisites¶
Access¶
- SSH access to the prod host with permission to run
docker compose. - The prod
.env(or compose env-file) must already provide:
| Variable | Used for |
|---|---|
SUPABASE_URL |
Pull criminal judgments |
SUPABASE_SERVICE_ROLE_KEY |
Pull criminal judgments (read-all) |
OPENAI_API_KEY |
GPT-4o-mini labelling + cross-lingual alignment |
MEILISEARCH_INTERNAL_URL or MEILISEARCH_URL |
Reach the meilisearch container |
MEILISEARCH_ADMIN_KEY or MEILI_MASTER_KEY |
Create/swap indexes |
The script falls back from ADMIN_KEY → MEILI_MASTER_KEY and from
INTERNAL_URL → URL automatically
(scripts/generate_search_topics.py:789-808).
Cost / resource budget¶
- OpenAI: ~150 GPT-4o-mini calls (75 cluster-label calls per jurisdiction
- 1 alignment call). Single-digit USD.
- CPU/RAM: BERTopic + Sentence-Transformers run inside the backend container. Plan for ~4 GB peak RAM and a few minutes of sustained CPU. Do not run during peak user traffic on a small host.
- Meilisearch: writes ≤ 500 documents to a staging index, then swaps. No
effect on the live
judgmentsindex.
Pre-flight check¶
Confirm the current state before touching anything:
# Are services up?
docker compose ps meilisearch backend
# Current topics index document count (expect 0 on first run)
docker exec juddges-meilisearch curl -s \
-H "Authorization: Bearer $MEILI_MASTER_KEY" \
http://localhost:7700/indexes/topics/stats
2. Run the pipeline¶
From the prod-host deploy directory (the one containing the active
docker-compose.yml):
docker compose run --rm \
-e MEILISEARCH_INTERNAL_URL=http://meilisearch:7700 \
backend python -m scripts.generate_search_topics
Flags worth knowing (all optional):
| Flag | Default | When to change |
|---|---|---|
--jurisdictions |
pl,uk |
Single-jurisdiction debug runs |
--case-type |
criminal |
Future expansion to civil etc. |
--sample-per-jurisdiction |
6000 |
Smaller value for a cheap smoke test |
--max-concepts |
500 |
Hard cap; matches Meili maxTotalHits |
--dry-run |
off | Generate JSON only, skip the Meili push |
Use --dry-run first if you want to inspect
frontend/lib/stats/search-topics.json before publishing.
3. What you will see¶
The script is interactive and Rich-formatted. Expect these phases:
- Pull from Supabase — ~6 000 PL + ~6 000 UK criminal judgments.
- Cluster per jurisdiction — BERTopic targets 75 clusters each.
- Label clusters — 75 + 75 GPT-4o-mini calls (structured output).
- Cross-lingual alignment — one GPT-4o-mini call → up to 500 unified concepts.
- Write JSON to
frontend/lib/stats/search-topics.json(inside the container — this file is also committed to git from local dev runs). - Persist to Supabase in
search_topicsas a snapshot run. - Atomic Meili swap from Supabase:
- Creates staging index
topics_newwith full settings. - Loads the persisted topic run from Supabase.
- Upserts the 500 concepts.
- Diffs staging vs live
topics(added / removed / shifteddoc_count). - Prompts:
Swap topics_new → topics? [y/N]. - On
y:swapIndexes→ delete leftover staging → done. - On
n: leavestopics_newin place; cleanup hint is printed.
Implementation: push_to_meilisearch in
scripts/generate_search_topics.py:773.
4. Verify¶
After the swap completes:
# Document count (expect ≤ 500)
docker exec juddges-meilisearch curl -s \
-H "Authorization: Bearer $MEILI_MASTER_KEY" \
http://localhost:7700/indexes/topics/stats
# Sample a real query as the autocomplete endpoint would
docker exec juddges-meilisearch curl -s -X POST \
-H "Authorization: Bearer $MEILI_MASTER_KEY" \
-H "Content-Type: application/json" \
-d '{"q":"narko","limit":5,"attributesToRetrieve":["id","label_pl","label_en","category","doc_count"]}' \
http://localhost:7700/indexes/topics/search
# End-to-end via the backend API (replace HOST):
curl -s "https://<prod-host>/api/v1/search/autocomplete?q=fraud&limit=5" \
-H "Authorization: Bearer $BACKEND_API_KEY"
Finally, open the prod site, click the search bar, and confirm that topic chips appear under the input.
If you only need to rebuild Meilisearch from the last persisted Supabase snapshot, without regenerating topics, run:
docker compose run --rm \
-e MEILISEARCH_INTERNAL_URL=http://meilisearch:7700 \
backend python scripts/sync_meilisearch_topics.py
5. Failure modes & recovery¶
| Symptom | Likely cause | Fix |
|---|---|---|
Meilisearch admin key not configured — cannot push topics. |
Neither MEILISEARCH_ADMIN_KEY nor MEILI_MASTER_KEY reaches the container |
Pass it via -e or fix the env-file, then re-run |
| Pipeline aborted before swap | Staging index topics_new left behind |
Re-run; the script deletes leftover staging on startup (generate_search_topics.py:820) |
task ... did not succeed (status=failed) during upsert |
Schema drift between code and live topics settings |
See Meili settings atomic-apply caveat — apply settings out-of-band, then re-run |
User clicked n at the prompt |
topics_new retained, live topics untouched |
Re-run when ready; or delete staging: curl -X DELETE -H "Authorization: Bearer $MEILI_MASTER_KEY" http://localhost:7700/indexes/topics_new |
| Autocomplete still empty after swap | Backend cached the topics service before the swap | The service is process-cached (search.py: _topics_service). Restart backend: docker compose restart backend |
| GPT-4o-mini rate-limit / timeout | OpenAI transient error | Re-run; cluster labelling is per-cluster and the alignment call is a single request — both safe to retry from scratch |
Rolling back after a successful swap is intentionally not automated: just run the pipeline again with the previous corpus, or restore Meili from its volume snapshot if a deeper issue surfaces.
6. Related¶
scripts/generate_search_topics.py— pipeline source.scripts/sync_meilisearch_topics.py— re-publish the latest persistedsearch_topicssnapshot to Meilisearch without regenerating topics.backend/app/services/meilisearch_config.py:269—topicsindex settings (searchable/filterable/sortable attributes,maxTotalHits: 500).backend/app/services/search_topics_store.py— Supabase persistence helpers forsearch_topics.backend/app/services/search.py(MeiliSearchService.topics_from_env,autocomplete) — the consumer side.backend/app/api/search.py— HTTP route that surfaces topic chips.docs/how-to/deployment.md— broader prod deploy context.- Known caveat: a separate startup PATCH on the
judgmentsindex settings fails atomically due to thebge-m3embedders block; this does not affect thetopicsindex, whose settings call succeeds independently.