RAG Query Flow

RAG (Retrieval-Augmented Generation) is how both the chatbot and voice agent ground their responses in real church data instead of making things up. The system searches two knowledge bases in parallel, formats the results, and injects them into the LLM's system prompt.


Two knowledge bases

| Knowledge base | Table / RPC | What it contains | Scope |
| --- | --- | --- | --- |
| Church KB | search_church_knowledge RPC | FAQs (canned responses) + uploaded document chunks | Per-church |
| Unified RAG | search_unified_rag_content RPC | 327K+ theological illustrations, sermon content, devotionals | Shared across all churches, filtered by theological lens |

Both use vector similarity search (cosine distance) against pre-computed embeddings.


Embedding generation

Both voice and chatbot use the same embedding model:

FUNCTION generateEmbedding(text):
  1. Call OpenAI API: POST https://api.openai.com/v1/embeddings
       Model: text-embedding-3-small
       Dimensions: 1536 (must match the Supabase RPC vector columns)
  2. IF API key not set or call fails:
       RETURN null (RAG is skipped gracefully -- never crashes the conversation)
  3. Extract and return the embedding vector (array of 1536 floats)
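
The steps above can be sketched in Python as follows. This is a minimal sketch, not the project's actual code: the function name `generate_embedding`, the `OPENAI_API_KEY` environment variable, the 10-second timeout, and the final length check are assumptions made for illustration. Only the endpoint, model name, and dimension count come from the description above.

```python
import json
import os
import urllib.request

EMBEDDING_URL = "https://api.openai.com/v1/embeddings"
EMBEDDING_MODEL = "text-embedding-3-small"
EMBEDDING_DIMENSIONS = 1536  # must match the Supabase RPC vector columns


def generate_embedding(text, api_key=None, timeout=10.0):
    """Return a list of 1536 floats, or None if the embedding is unavailable."""
    # Assumption: the key lives in the OPENAI_API_KEY environment variable.
    api_key = api_key or os.environ.get("OPENAI_API_KEY")
    if not api_key:
        return None  # RAG is skipped; the conversation continues

    payload = json.dumps({
        "model": EMBEDDING_MODEL,
        "input": text,
        "dimensions": EMBEDDING_DIMENSIONS,
    }).encode("utf-8")
    request = urllib.request.Request(
        EMBEDDING_URL,
        data=payload,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            body = json.load(response)
        vector = body["data"][0]["embedding"]
    except Exception:
        # Any network, HTTP, or parsing failure degrades to "no RAG".
        return None
    # Assumption: a vector of the wrong size is treated like a failed call.
    return vector if len(vector) == EMBEDDING_DIMENSIONS else None
```

Callers would treat a `None` result as "skip RAG for this request" rather than as an error.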

Theological lens selection

Every RAG query is filtered by a theological lens so a Baptist church gets Baptist-appropriate content and a Catholic church gets Catholic content.

FUNCTION get_lens_id(denomination):
  1. Look up denomination string in DENOMINATION_TO_LENS mapping
       Examples:
         "Southern Baptist" -> lens 14 (Baptist)
         "Roman Catholic" -> lens 7 (Catholic)
         "Presbyterian" -> lens 4 (Reformed)
         "Assembly of God" -> lens 9 (Pentecostal)
  2. IF denomination is null or not found in mapping:
       RETURN 10 (Christocentric -- the default/universal lens)
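
A minimal Python sketch of this lookup is shown here. The dictionary holds only the four example entries listed above; the real mapping is larger, and the constant name `DEFAULT_LENS_ID` is an assumption.

```python
from typing import Optional

DEFAULT_LENS_ID = 10  # Christocentric

# Illustrative subset: only the four examples from the pseudocode above.
DENOMINATION_TO_LENS = {
    "Southern Baptist": 14,  # Baptist
    "Roman Catholic": 7,     # Catholic
    "Presbyterian": 4,       # Reformed
    "Assembly of God": 9,    # Pentecostal
}


def get_lens_id(denomination: Optional[str]) -> int:
    """Map a denomination string to a lens ID, falling back to the default."""
    if denomination is None:
        return DEFAULT_LENS_ID
    return DENOMINATION_TO_LENS.get(denomination, DEFAULT_LENS_ID)
```

This sketch does an exact string match; whether the real lookup normalizes case or whitespace is not stated here.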

The mapping exists in three places that must stay in sync:

  • voice-agent-livekit/core/rag.py (Python, for voice agent)
  • src/lib/rag.ts (TypeScript, for chatbot)
  • Database: sai_theological_lenses table (18 lenses: IDs 1-17 + universal ID 0)

Chatbot RAG flow

The chatbot runs RAG on every message that is not answered by an exact FAQ match, in parallel with other data loading.

WHEN a user sends a chat message:

  -- Phase 0: FAQ short-circuit (before RAG)
  1. Call matchFAQ(message, churchId, agentType)
  2. IF exact match found (exact_response = true):
       RETURN the canned answer immediately (zero LLM cost)
       Track as "canned" response source
  3. IF fuzzy match found:
       Store as faqPreferredContext (will be injected into prompt later)

  -- Phase 1: Resolve theological lens
  4. Priority order for lens selection:
       a. Client-side override (demo mode -- explicit lens ID in request)
       b. Church DB setting (church_theological_lenses table)
       c. Denomination auto-detect (DENOMINATION_TO_LENS mapping)
       d. Default: Christocentric (lens 10)

  -- Phase 2: Load doctrinal rules
  5. Query theological_contradictions table for this lens
       Example for Baptist (lens 14):
         - Baptism: "Believer's baptism only"
         - MUST include: "personal decision", "profession of faith"
         - MUST NEVER mention: "infant baptism", "christening"
  6. Check for church-specific doctrinal overrides in organization_settings

  -- Phase 3: Fetch lens vocabulary
  7. Query preferred and avoided vocabulary for this theological tradition
       Builds a block like: "USE these terms: ..., AVOID these terms: ..."

  -- Phase 4: Parallel RAG search
  8. Generate embedding for the user's message (one call, reused for both searches)
  9. IF embedding generation succeeds:
       Run BOTH searches in parallel:
         a. searchRAG(embedding, lensIds=[lensId], matchCount=8)
              -> queries unified_rag_content via RPC
              -> returns up to 8 results filtered by theological lens
         b. searchChurchKnowledge(churchId, embedding)
              -> queries church_knowledge_base via RPC
              -> returns church-specific FAQ and document matches
  10. Format results:
       - formatRAGContext(ragResults)
           Produces: "--- Curated Theological Content --- [1] "Title" (content_type) Snippet... --- End Content ---"
       - formatChurchKnowledgeContext(churchResults)
           Produces: "--- Church Knowledge Base --- [1] "Title" (FAQ) Snippet... --- End Church Knowledge Base ---"
       - Each snippet is truncated to 600 characters

  -- Phase 5: Load product knowledge
  11. Query product_knowledge table (all active rows, ordered by priority DESC)
       Format as Q&A pairs for billing/platform questions

  -- Phase 6: Build system prompt
  12. Assemble the full system prompt from:
       - Church facts (name, address, denomination, hours, staff, ministries, what-to-expect)
       - Doctrinal rules block
       - Lens vocabulary block
       - Church knowledge base context (PRIORITIZED -- "use this over general knowledge")
       - Theological RAG context
       - FAQ preferred context (fuzzy match, if any)
       - Product knowledge block
       - Agent personality and instructions

  -- Phase 7: Call LLM
  13. Send conversation history + system prompt to the LLM
       Primary: Anthropic Claude Haiku 4.5
       Fallback: OpenAI gpt-4o-mini
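
The lens priority order in Phase 1 can be sketched as a simple fall-through. The chatbot itself is TypeScript; Python is used here only for illustration. The function name `resolve_lens_id` and its three parameters are hypothetical, and the mapping is a two-entry subset.

```python
from typing import Optional

DEFAULT_LENS_ID = 10  # Christocentric

# Illustrative subset of the denomination mapping.
DENOMINATION_TO_LENS = {"Southern Baptist": 14, "Roman Catholic": 7}


def resolve_lens_id(
    client_override: Optional[int],
    church_setting: Optional[int],
    denomination: Optional[str],
) -> int:
    """Pick the lens using the priority order a -> b -> c -> d."""
    if client_override is not None:   # a. demo-mode override in the request
        return client_override
    if church_setting is not None:    # b. value stored for the church
        return church_setting
    if denomination in DENOMINATION_TO_LENS:  # c. denomination auto-detect
        return DENOMINATION_TO_LENS[denomination]
    return DEFAULT_LENS_ID            # d. default lens
```

Comparing against `None` rather than relying on truthiness keeps a stored lens ID of 0 from being skipped by accident.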

Voice agent RAG flow

The voice agent uses RAG in two phases: a broad "session init" search when a call starts, and focused "per-turn" searches during the conversation.

Session-init RAG (runs once when a call connects)

FUNCTION fetch_session_rag(supabase, church_id, denomination, church_name):

  1. Build a broad seed query:
       "Tell me about {church_name}, services, events, programs, ministries"
  2. Generate embedding for the seed query
  3. IF embedding fails: RETURN empty string (no RAG for this session)

  4. Get lens_id from denomination

  5. Run BOTH searches in parallel (asyncio.gather):
       a. search_church_knowledge(church_id, embedding, match_count=8, threshold=0.35)
       b. search_unified_rag(embedding, lens_ids=[lens_id], match_count=5, threshold=0.35)

  6. Format both result sets:
       - format_church_knowledge_context(church_results)
       - format_rag_context(rag_results)
       Each snippet truncated to 600 chars

  7. RETURN combined context string (both blocks concatenated)
       This gets injected into the voice agent's system prompt at call start

The session-init RAG gives the voice agent a broad base of knowledge about the church so it can handle common questions without per-turn latency.
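
The seed query and the parallel search step of the session-init flow might look like the sketch below. The embedding and search functions are passed in as hypothetical coroutines (`embed`, `search_church`, `search_unified`) so the example runs without Supabase or OpenAI. Formatting is left out, so this sketch returns the two raw result lists rather than the combined context string.

```python
import asyncio


async def session_init_searches(church_name, lens_id, embed, search_church, search_unified):
    """Embed a broad seed query, then run both searches concurrently."""
    seed_query = f"Tell me about {church_name}, services, events, programs, ministries"
    embedding = await embed(seed_query)
    if embedding is None:
        # No embedding means no RAG for this session.
        return [], []

    # asyncio.gather starts both coroutines and waits for both to finish,
    # so total latency is roughly the slower of the two searches.
    church_results, rag_results = await asyncio.gather(
        search_church(embedding, match_count=8, threshold=0.35),
        search_unified(embedding, lens_ids=[lens_id], match_count=5, threshold=0.35),
    )
    return church_results, rag_results
```

Because one embedding is reused for both searches, the session start costs a single embedding call.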

Per-turn RAG (runs on each caller utterance)

FUNCTION fetch_turn_rag(supabase, church_id, caller_message):

  1. IF caller_message is shorter than 10 characters: SKIP (not enough signal)

  2. Start a 500ms hard timeout (asyncio.wait_for)

  3. Generate embedding for the caller's actual words
  4. IF embedding fails: RETURN empty string

  5. Search church KB ONLY (not theological -- keeps latency low):
       search_church_knowledge(church_id, embedding, match_count=5, threshold=0.40)
       Note: higher threshold (0.40 vs 0.35) for stricter relevance

  6. Format results and RETURN

  7. IF timeout (500ms exceeded):
       LOG warning
       RETURN empty string (the call continues without RAG for this turn)

  8. IF any other error:
       LOG error
       RETURN empty string (never crash the call)

Key design decisions for per-turn RAG:

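  • Church KB only (no theological search) -- one search instead of two keeps the added latency low enough that the caller should not notice a pause
  • 500ms hard timeout -- if embedding generation or Supabase is slow, RAG is skipped for that turn (a warning is logged; the call carries on without it)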
  • Higher match threshold (0.40 vs 0.35) -- prioritizes precision over recall since there's no time to process low-quality results
  • 10-character minimum -- avoids wasting embeddings on "okay", "yes", "thanks"
  • Never raises exceptions -- the voice call must continue no matter what
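
The guard rails listed above (minimum length, hard timeout, never raising) can be sketched with `asyncio.wait_for`. Here `lookup` is a hypothetical coroutine standing in for the embed, search, and format steps, and the logger name is an assumption.

```python
import asyncio
import logging

logger = logging.getLogger("turn_rag")

MIN_MESSAGE_CHARS = 10
TURN_TIMEOUT_SECONDS = 0.5


async def fetch_turn_rag(caller_message, lookup, timeout=TURN_TIMEOUT_SECONDS):
    """Return formatted context for this turn, or "" if RAG is skipped."""
    if len(caller_message) < MIN_MESSAGE_CHARS:
        return ""  # "okay", "yes", "thanks": not enough signal to embed

    try:
        # wait_for cancels the lookup if it runs past the deadline.
        return await asyncio.wait_for(lookup(caller_message), timeout=timeout)
    except asyncio.TimeoutError:
        logger.warning("Per-turn RAG timed out after %.0f ms", timeout * 1000)
        return ""
    except Exception:
        logger.exception("Per-turn RAG failed")
        return ""  # the voice call must continue no matter what
```

Wrapping the whole lookup in one `wait_for` means the 500ms budget covers embedding generation as well as the database search.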

Supabase RPC functions

search_church_knowledge

RPC: search_church_knowledge(p_church_id, p_query_embedding, p_match_threshold, p_match_count)

Searches the church_knowledge_base (FAQs + uploaded document chunks) for a specific church.
Returns rows with: title, content, source ("faq" or "document"), similarity score
Filtered by: church_id match AND cosine similarity >= threshold
Ordered by: similarity DESC
Limited to: match_count rows
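
To make the filter, order, and limit semantics concrete, here is an in-memory Python equivalent. It is only an illustration of the behaviour described above, not the SQL behind the RPC; the row keys `church_id` and `embedding` are assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search_rows(rows, church_id, query_embedding, match_threshold, match_count):
    """Filter by church and threshold, sort by similarity DESC, then limit."""
    scored = []
    for row in rows:
        if row["church_id"] != church_id:
            continue
        similarity = cosine_similarity(row["embedding"], query_embedding)
        if similarity >= match_threshold:
            scored.append({**row, "similarity": similarity})
    scored.sort(key=lambda r: r["similarity"], reverse=True)
    return scored[:match_count]
```

Raising the threshold from 0.35 to 0.40 shrinks the candidate set before the limit is applied, so fewer but closer matches come back.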

search_unified_rag_content

RPC: search_unified_rag_content(query_embedding, p_content_categories, p_content_types,
                                p_theological_lens_ids, p_canonical_pericope_id,
                                p_pericope_text, p_organization_id,
                                p_match_threshold, p_match_count)

Searches the unified_rag_content table (327K+ rows of theological content).
Returns rows with: title, content, content_type, content_category, similarity score
Filtered by: theological lens AND cosine similarity >= threshold
Most filter params are nullable (pass null to skip that filter)
Ordered by: similarity DESC
Limited to: match_count rows
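
A caller that filters only by theological lens could build the parameter payload as sketched below. The parameter names come from the signature above; the helper name `build_unified_rag_params` is hypothetical, and which filters accept null is an assumption based on the note that most filter params are nullable.

```python
def build_unified_rag_params(query_embedding, lens_ids, match_threshold, match_count):
    """Build the RPC arguments, passing None for every filter that is skipped."""
    return {
        "query_embedding": query_embedding,
        "p_content_categories": None,      # assumed nullable: no category filter
        "p_content_types": None,           # assumed nullable: no type filter
        "p_theological_lens_ids": lens_ids,
        "p_canonical_pericope_id": None,   # assumed nullable
        "p_pericope_text": None,           # assumed nullable
        "p_organization_id": None,         # assumed nullable
        "p_match_threshold": match_threshold,
        "p_match_count": match_count,
    }
```

With the Supabase Python client, a dictionary like this would typically be passed as the second argument to the client's `rpc` method along with the function name.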

Context formatting

Both voice and chatbot format RAG results the same way:

Church KB format:
--- Church Knowledge Base ---
[1] "What time are Sunday services?" (FAQ)
Sunday services are at 9:00 AM and 11:00 AM. Our contemporary...

[2] "Building Access Policy" (Document)
The church building is open Monday through Friday from 8:00 AM...
--- End Church Knowledge Base ---

Theological RAG format:
--- Curated Theological Content ---
[1] "The Shepherd's Heart" (sermon_illustration)
In a small village in Scotland, a pastor named...

[2] "Understanding Grace" (theological_reflection)
The Reformed tradition teaches that grace is not merely...
--- End Content ---

Each snippet is hard-capped at 600 characters to keep prompt size manageable.
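
A formatter that produces the Church KB layout shown above could look like this sketch. The row keys (`title`, `content`, `source`) follow the RPC return columns; returning an empty string for an empty result set and truncating without an ellipsis are assumptions.

```python
MAX_SNIPPET_CHARS = 600


def format_church_knowledge_context(results):
    """Render church KB rows as a delimited block for the system prompt."""
    if not results:
        return ""
    lines = ["--- Church Knowledge Base ---"]
    for index, row in enumerate(results, start=1):
        label = "FAQ" if row["source"] == "faq" else "Document"
        lines.append(f'[{index}] "{row["title"]}" ({label})')
        lines.append(row["content"][:MAX_SNIPPET_CHARS])  # hard cap per snippet
        lines.append("")  # blank line between entries
    # Replace the trailing blank line with the closing delimiter.
    lines[-1] = "--- End Church Knowledge Base ---"
    return "\n".join(lines)
```

With up to 8 church results per search, the 600-character cap bounds this block at roughly 5,000 characters of snippet text.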


System prompt assembly order (chatbot)

The final system prompt is assembled from these blocks in order:

  1. Agent personality and base instructions
  2. Church facts block (name, address, hours, staff, ministries, what-to-expect)
  3. Doctrinal rules block ("MUST follow Baptist positions on baptism...")
  4. Lens vocabulary block ("USE: covenant, AVOID: karma")
  5. Church knowledge base context (PRIORITIZED over general content)
  6. Theological RAG context
  7. FAQ preferred context (fuzzy match from canned responses)
  8. Product knowledge block (billing, platform features)

The church's own KB content is explicitly labeled as higher priority than general theological content, so the LLM prefers church-specific answers when both are available.
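
The assembly order can be sketched as an ordered join over named blocks. The chatbot itself is TypeScript; this Python sketch uses hypothetical block keys, and skipping empty blocks (for example when RAG was skipped) is an assumption.

```python
# Hypothetical block names, listed in the assembly order given above.
PROMPT_BLOCK_ORDER = (
    "personality",
    "church_facts",
    "doctrinal_rules",
    "lens_vocabulary",
    "church_kb_context",
    "theological_rag_context",
    "faq_preferred_context",
    "product_knowledge",
)


def assemble_system_prompt(blocks):
    """Join the non-empty blocks in the fixed order, separated by blank lines."""
    parts = []
    for name in PROMPT_BLOCK_ORDER:
        text = (blocks.get(name) or "").strip()
        if text:
            parts.append(text)
    return "\n\n".join(parts)
```

Keeping the order in one tuple makes it harder for a new block to be slipped in ahead of the church KB context by accident.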