HNSW Vector Index on unified_rag_content.embedding
Decision
Add an HNSW ANN (Approximate Nearest Neighbor) index to public.unified_rag_content(embedding) using vector_cosine_ops.
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_unified_rag_content_embedding
ON public.unified_rag_content
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
Context
The unified_rag_content table holds ~327K rows and is the backing store for both the SermonWise AI RAG pipeline and the ChurchWiseAI chatbot RAG retrieval path. The DBA-audit subagent (2026-05-05) confirmed that no ANN vector index existed — every cosine similarity search without a pericope ILIKE pre-filter was executing an O(N) exact KNN scan across all 327K embedding vectors.
At this scale, exact KNN scans take approximately 600–700ms p90. This is a silent performance drain: queries "work" but are an order of magnitude slower than they need to be, and the cost compounds as the table grows.
Why HNSW over IVFFlat
| Criterion | HNSW | IVFFlat |
|---|---|---|
| Training step required | No | Yes (CREATE INDEX requires a pre-built centroid list via VACUUM ANALYZE) |
| Recall @ ef_search=40 | ~99% | ~95–97% |
| Sensitivity to bulk inserts | Low (inserts re-wire graph incrementally) | High (bulk inserts degrade recall until VACUUM+ANALYZE re-clusters) |
| Build time for ~327K rows | ~5–15 min | Similar |
| Memory at query time | Higher (ef_search parameter trades RAM for recall) | Lower |
HNSW is the better choice for this workload because:
- No training step — index can be created live with
CONCURRENTLY - Better recall at the same query latency budget
unified_rag_contentreceives continuous incremental inserts (new church KB entries, sermon illustrations); IVFFlat recall degrades between VACUUM cycles, HNSW does not- pgvector HNSW support is stable as of pgvector ≥ 0.5.0
Build Parameters
| Parameter | Value | Rationale |
|---|---|---|
m | 16 | Connections per node. Standard for 300K-1M vectors. Higher = better recall, more RAM |
ef_construction | 64 | Build-time search depth. Standard for ~300K vectors. Higher = better graph quality, slower build |
ef_search | (default 40) | Can be tuned per-query via SET hnsw.ef_search = N if recall vs. latency tradeoff needs adjustment |
Expected Impact
| Metric | Before | After |
|---|---|---|
| RAG query p90 (unfiltered cosine search) | ~650ms | <50ms |
| RAG query p90 (with pericope pre-filter) | Already using btree on pericope; unchanged | |
| Table write throughput | Unchanged (index maintained incrementally) | |
| Table read throughput during build | Unchanged (CONCURRENTLY guarantee) |
Application Notes
Critical: CREATE INDEX CONCURRENTLY cannot run inside a transaction. The Supabase SQL Editor wraps all statements in a transaction by default — do NOT use the SQL Editor unless you uncheck "Wrap in transaction."
Preferred application method:
psql "postgresql://postgres.wrwkszmobuhvcfjipasi:<DB_PASSWORD>@aws-0-<region>.pooler.supabase.com:5432/postgres" \
-f churchwiseai-web/migrations/2026-05-05_unified_rag_hnsw_index.sql
The DB password is available in the Supabase dashboard under Project Settings → Database → Connection string.
Alternative: Supabase dashboard → SQL Editor → uncheck "Wrap in transaction" checkbox → paste the CREATE INDEX CONCURRENTLY ... statement only.
Monitor build progress:
SELECT * FROM pg_stat_progress_create_index;
Post-Build Verification
After the index appears in pg_indexes, verify the query planner uses it:
-- Get a real probe vector first:
SELECT embedding FROM unified_rag_content WHERE embedding IS NOT NULL LIMIT 1;
-- Then run EXPLAIN ANALYZE with that vector:
EXPLAIN (ANALYZE, BUFFERS)
SELECT id, embedding <=> '<probe-vector>'::vector AS dist
FROM unified_rag_content
ORDER BY dist
LIMIT 6;
-- Expected: "Index Scan using idx_unified_rag_content_embedding"
-- Expected runtime: <50ms
Migration File
churchwiseai-web/migrations/2026-05-05_unified_rag_hnsw_index.sql