Voice Agent Call Lifecycle

This document describes what happens from the moment a phone rings to the moment the call ends. It is written as pseudocode that a non-developer can follow.

Phase 1: Call Arrives

1. Phone rings
   Someone calls a church phone number (or the toll-free sales line).

2. Twilio receives the call
   Twilio is the telephony provider. It knows which number was called
   and who is calling (caller ID).

3. Twilio forwards the call via SIP trunk to LiveKit Cloud
   The Twilio number is configured with a SIP trunk that forwards all
   calls to the LiveKit Cloud SIP gateway (project: cwa-voice-9x077mph).
   LiveKit Cloud dispatches a job to the Railway agent worker and manages
   the audio room.

4. The LiveKit agent worker receives the job via JobContext
   The entry point in main.py receives a JobContext object containing:
     - room.sip.trunkPhoneNumber = the number that was called
     - room.sip.from_            = the caller's phone number
     - room.name                 = a unique LiveKit room identifier for this call

Phase 2: Route the Call

5. resolve_route(to_number) determines the agent type
   The system checks the called number against three lists, in order,
   and falls through to a database lookup if none of them match:

   a. TOLL_FREE_NUMBER (+18886030316)
      Result: ("sales", None) -- route to the Sales Agent

   b. DEMO_NUMBERS (set of demo line numbers)
      Result: ("demo_router", None) -- route to the Demo Router
      (lets callers choose which demo church to experience)

   c. PHONE_REGISTRY (static map of Twilio numbers to church IDs)
      Result: ("church", church_id) -- route to a specific church

   d. Not found in any list
      Result: ("church", None) -- needs a database lookup
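The routing rules above can be sketched in Python as follows. This is a minimal illustration, not the production function: the toll-free number comes from the text, but the demo number, the registry entry, and the church ID are hypothetical placeholders.

```python
from typing import Optional, Tuple

TOLL_FREE_NUMBER = "+18886030316"                 # from the text above
DEMO_NUMBERS = {"+15550000001"}                   # hypothetical demo line
PHONE_REGISTRY = {"+15550000002": "church-abc"}   # hypothetical number -> church ID


def resolve_route(to_number: str) -> Tuple[str, Optional[str]]:
    """Return (agent_type, church_id) for the number that was called."""
    if to_number == TOLL_FREE_NUMBER:
        return ("sales", None)
    if to_number in DEMO_NUMBERS:
        return ("demo_router", None)
    if to_number in PHONE_REGISTRY:
        return ("church", PHONE_REGISTRY[to_number])
    # Unknown number: treat it as a church call and let Phase 3
    # resolve the church ID from the database.
    return ("church", None)
```

A church_id of None on a "church" route is the signal that step 6 must run the database lookup.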

Phase 3: Load Church Data (Church Calls Only)

6. If church_id is unknown, look it up in the database
   Query church_voice_agents where twilio_phone_number matches.
   The result is cached for 5 minutes. If no match is found, fall back
   to the Sales Agent.

7. Load church data from Supabase (load_church_data)
   Three parallel queries:
     a. church_voice_agents joined with churches table
        (name, address, denomination, pastor name, voice config,
         feature toggles, giving settings, integration keys)
     b. organization_settings (chatbot agent config)
     c. premium_churches (plan tier, call limits)

   These are assembled into a single "church" dict with ~50 fields.

8. Check call limit
   Compare calls_this_month against calls_limit.
   If the church has used all their calls this month, reject the call
   (return None, which causes a fallback to the Sales Agent).
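As a rough sketch, the call-limit gate could look like the function below. The field names calls_this_month and calls_limit come from the text; treating a missing limit as "unlimited" is an assumption made for this example only.

```python
from typing import Optional


def check_call_limit(church: dict) -> Optional[dict]:
    """Return the church dict if the call may proceed, otherwise None."""
    used = church.get("calls_this_month", 0)
    limit = church.get("calls_limit")  # assumption: None means no limit
    if limit is not None and used >= limit:
        # Returning None is what triggers the fallback to the Sales Agent.
        return None
    return church
```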

9. Insert call log -- initial database row
   Write to voice_call_logs with:
     - call_id (from LiveKit SIP attributes)
     - church_id
     - from_number (caller phone)
     - to_number (church phone)
     - status: "in_progress"
   This creates the record that will be updated throughout the call.

10. Increment call count
    Bump calls_this_month on church_voice_agents by 1.
    Non-fatal: if this fails, the church gets a free call rather than
    a dropped call.
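The non-fatal behaviour in step 10 amounts to catching every error around the write and carrying on. The sketch below assumes a hypothetical store object with an increment(church_id) method; the real code talks to Supabase, whose client calls are not shown here.

```python
import logging

logger = logging.getLogger("voice_agent")


def increment_call_count(store, church_id: str) -> bool:
    """Bump the monthly counter. Never raises; returns False on failure."""
    try:
        store.increment(church_id)
        return True
    except Exception:
        # Log and continue: the church gets a free call instead of a dropped one.
        logger.exception("call count increment failed for %s", church_id)
        return False


class WorkingStore:
    """Hypothetical stand-in for a healthy database."""
    def __init__(self):
        self.count = 0

    def increment(self, church_id: str) -> None:
        self.count += 1


class BrokenStore:
    """Hypothetical stand-in for a database that is down."""
    def increment(self, church_id: str) -> None:
        raise RuntimeError("database unavailable")
```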

Phase 4: Load Context (Parallel)

11. Load contextual data in parallel (all run at the same time):

    a. Session RAG (fetch_session_rag)
       Generate an embedding for a broad seed query:
         "Tell me about [Church Name], services, events, programs, ministries"
       Search two sources in parallel:
         - Church knowledge base (FAQs + uploaded documents) -- up to 8 results
         - Unified theological content (filtered by denomination lens) -- up to 5 results
       Format results into prompt blocks.

    b. Product knowledge (load_product_knowledge)
       Load all active rows from product_knowledge table, ordered by priority.
       Formatted as Q&A pairs. Cached for 15 minutes.

    c. Inline FAQs (load_inline_faqs)
       Load church-specific FAQ pairs from church_knowledge_base table.
       These are SEPARATE from RAG vector search -- injected verbatim.
       Cached for 5 minutes.

    d. Repeat caller history (load_repeat_caller_history)
       Query the last 5 calls within 90 days from the same phone number
       to the same church. Extract summaries.
       Result is privacy-gated: agent is told "Do NOT mention these
       unless the caller brings them up first."

    e. Datetime context (build_datetime_context)
       Current date, time, timezone for the church.
       Includes relative references: "This Sunday means March 30, 2026."
       Helps the agent correctly interpret "this Sunday" or "next week."

12. Combine all context blocks
    Join RAG, product knowledge, FAQs, repeat history, and datetime
    into one large context string, separated by double newlines.
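Steps 11 and 12 can be illustrated with asyncio.gather. In this sketch the loaders are hypothetical stubs, not the real fetch_session_rag and friends, and skipping a loader that fails or returns nothing is an assumption in keeping with the non-fatal principle described later.

```python
import asyncio
from typing import Awaitable, Callable, Sequence


async def build_context(loaders: Sequence[Callable[[], Awaitable[str]]]) -> str:
    """Run all loaders concurrently and join their blocks with blank lines."""
    results = await asyncio.gather(
        *(loader() for loader in loaders),
        return_exceptions=True,  # one failing loader must not sink the rest
    )
    # Keep only non-empty text blocks; exceptions and empty strings are skipped.
    blocks = [r for r in results if isinstance(r, str) and r.strip()]
    return "\n\n".join(blocks)


# Hypothetical stub loaders used only to demonstrate the behaviour.
async def stub_rag() -> str:
    return "RAG block"


async def stub_faqs() -> str:
    raise RuntimeError("simulated database error")


async def stub_datetime() -> str:
    return "Datetime block"
```

Running asyncio.run(build_context([stub_rag, stub_faqs, stub_datetime])) yields the two surviving blocks separated by a double newline.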

Phase 5: Build the Agent

13. Build Coordinator Agent
    The Coordinator is the front-door agent for all church calls.
    It receives:
      - The church dict (~50 config fields)
      - The combined RAG context
    It is configured with:
      - LLM: Gemini 2.5 Flash (COORDINATOR_MODEL)
      - Tools: send_sms_link, end_call, plus feature-gated tools
        (capture_visitor_contact, send_directions_link, register_for_event,
         check_availability, book_appointment, PCO tools, send_giving_link)
      - transfer_to_care: handoff tool that routes to the Care Agent
        (Care Agent uses Claude Haiku 4.5 for better empathy)
      - Introduction greeting:
        "Thank you for calling [Church Name]. I'm an AI assistant
         and this call may be recorded. How can I help you today?"

14. Attach call context and start session
    Call metadata (call_id, caller_phone, church_data, supabase) is attached
    to the agent as _call_context so tools can access it.
    Per-turn moderation, noise filtering, RAG injection, and farewell detection
    run via the on_user_turn_completed() callback on each Agent class.
    The agent is now ready. See voice-turn-processing.md for the pipeline.

Phase 6: Conversation Loop (Per Turn)

15. For each turn of the conversation:

    a. Caller speaks
       Audio goes to Deepgram STT (speech-to-text via livekit-plugins-deepgram).
       Result: a text string of what the caller said.

    b. Cancel any pending farewell timer
       If the caller speaks again after a goodbye, the auto-hangup
       is cancelled.

    c. "Are you there?" reassurance check
       If the agent is currently processing (LLM or tool call in-flight)
       and the caller says "are you there?" or "hello?":
         -> Reply immediately: "Yes, I'm here! Just one more moment."
         -> Do NOT cancel the pending work. Do NOT forward to LLM.

    d. Moderation checks (BEFORE noise filtering -- safety first)

       THREAT CHECK:
         If the caller makes violent threats against others
         (e.g., "I'm going to shoot up the church"):
           -> Hardcoded response (bypasses LLM entirely):
              "I need to stop you right there. This call is being
               recorded and logged. I'm ending this call now.
               If you or someone else is in danger, please call
               nine one one."
           -> Log the violation to moderation_violations table
           -> Send alert emails and SMS to church + support
           -> Wait 4 seconds for the message to play, then hang up
           -> RETURN (call is over)

       CRISIS CHECK:
         If the caller expresses suicidal ideation, self-harm, or
         coded crisis language (e.g., "I just can't do this anymore",
         "I'm tired of living", "ready to meet my maker"):
           -> Log the violation
           -> Send crisis alert emails and SMS
           -> Inject context directive into the LLM:
              "CRITICAL: Caller may be in crisis. Provide the 988
               Suicide and Crisis Lifeline immediately."
           -> Set crisis_detected = true (disables auto-hangup)
           -> Continue to LLM (agent handles with injected directive)

       ABUSE CHECK:
         If the caller uses profanity or hostile language:
           First offense: inject "The caller used inappropriate language.
                          Respond calmly and redirect." into LLM context
           Second offense: hardcoded response:
              "I'm going to end this call now. Have a good day."
              -> Wait 2 seconds, then hang up
              -> RETURN (call is over)
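The two-strike abuse handling above can be sketched as a small state machine. The two strings are taken from the text; the ModerationState class and the ("inject" / "hangup") action names are hypothetical, and how abusive language is detected in the first place is deliberately left out.

```python
from dataclasses import dataclass
from typing import Tuple

ABUSE_DIRECTIVE = (
    "The caller used inappropriate language. Respond calmly and redirect."
)
ABUSE_HANGUP_LINE = "I'm going to end this call now. Have a good day."


@dataclass
class ModerationState:
    """Per-call moderation counters (hypothetical structure)."""
    abuse_strikes: int = 0


def handle_abuse(state: ModerationState) -> Tuple[str, str]:
    """Return (action, text) for one detected abusive utterance."""
    state.abuse_strikes += 1
    if state.abuse_strikes == 1:
        # First offense: the LLM still answers, with an extra directive.
        return ("inject", ABUSE_DIRECTIVE)
    # Second offense: hardcoded line, then hang up after the grace period.
    return ("hangup", ABUSE_HANGUP_LINE)
```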

    e. Noise filtering (AFTER moderation -- only if moderation did not fire)
       Check if the utterance is pure noise ("um", "uh huh", "mmm"):
         -> Drop silently. No LLM call.
       Check if it's context-dependent ("okay", "yeah", "sure"):
         -> If agent asked a question: pass through (it's a valid answer)
         -> If agent did NOT ask a question: drop silently
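A minimal sketch of the noise filter, assuming exact matching on a normalized utterance. The word lists contain only the examples given above; the real lists and matching rules may be broader.

```python
PURE_NOISE = {"um", "uh huh", "mmm"}
CONTEXT_DEPENDENT = {"okay", "yeah", "sure"}


def should_drop(utterance: str, agent_asked_question: bool) -> bool:
    """Return True if the utterance should be dropped without an LLM call."""
    text = utterance.strip().lower().rstrip(".!?,")
    if text in PURE_NOISE:
        return True
    if text in CONTEXT_DEPENDENT:
        # "okay" is a real answer only if the agent just asked something.
        return not agent_asked_question
    return False
```

The agent_asked_question flag is the value recorded in step j on the previous turn.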

    f. Per-turn RAG (500ms hard timeout)
       If the caller said something meaningful (10+ characters) and
       moderation did not fire:
         -> Generate embedding for the caller's message
         -> Search church knowledge base (stricter threshold: 0.4)
         -> If Supabase is slow (>500ms), skip RAG for this turn
            (the call continues without extra context)
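The hard timeout can be expressed with asyncio.wait_for. This sketch assumes a hypothetical async search(query) callable standing in for the embedding and knowledge-base lookup; the 0.4 similarity threshold is not modeled, and the caller is expected to skip this step when moderation has fired.

```python
import asyncio
from typing import Awaitable, Callable, Optional

RAG_TIMEOUT_SECONDS = 0.5   # 500ms hard timeout from the text
MIN_MESSAGE_CHARS = 10      # "meaningful" message length from the text


async def per_turn_rag(
    message: str,
    search: Callable[[str], Awaitable[str]],
    timeout: float = RAG_TIMEOUT_SECONDS,
) -> Optional[str]:
    """Return extra context for this turn, or None if skipped or too slow."""
    if len(message.strip()) < MIN_MESSAGE_CHARS:
        return None
    try:
        return await asyncio.wait_for(search(message), timeout=timeout)
    except asyncio.TimeoutError:
        # Slow backend: carry on without extra context for this turn.
        return None


# Hypothetical search stubs used only for demonstration.
async def fast_search(query: str) -> str:
    return f"KB hit for: {query}"


async def slow_search(query: str) -> str:
    await asyncio.sleep(0.2)
    return "too late"
```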

    g. Combine contexts
       Merge any moderation directive + per-turn RAG into a combined
       context string.

    h. Process through LLM
       Send the caller's message + combined context to the LlmAgent.
       The LLM generates a response, potentially calling tools.

       If a tool is called (e.g., submit_prayer_request):
         -> Send a filler phrase first: "One moment." or
            "Let me check on that." (chosen at random so the filler
            does not sound repetitive)
         -> Execute the tool
         -> LLM processes the tool result and generates a response
         -> Tools marked as "background" (prayer, callback) survive
            barge-in and yield intermediate messages.

    i. Response plays as audio
       The LLM's text response goes to Cartesia Sonic TTS
       (text-to-speech via livekit-plugins-cartesia) and plays as
       natural-sounding audio to the caller.

    j. Track the agent's response
       Record whether the agent asked a question (contains "?")
       for noise filtering on the next turn.

    k. Mutual farewell detection (disabled during crisis)
       After the LLM responds, check if BOTH the agent and the
       caller said goodbye:
         Agent farewell phrases: "take care", "have a blessed",
           "god bless", "goodbye", etc.
         Caller farewell phrases: "bye", "good night", "that's all",
           "thank you" (short), etc.
       If mutual farewell detected:
         -> Set farewell_pending = true
         -> Wait 4 seconds (grace period for farewell audio to finish)
         -> If caller hasn't spoken again: auto-hangup
         -> If caller spoke: cancel the hangup (they had more to say)
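The cancellable grace period in steps b and k can be modeled as an asyncio task. FarewellTimer and the hangup callback are hypothetical names for this sketch; the demo uses a shortened grace period so it finishes quickly.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional


class FarewellTimer:
    """Hang up after a grace period unless cancelled first."""

    def __init__(self, hangup: Callable[[], Awaitable[None]],
                 grace_seconds: float = 4.0):
        self._hangup = hangup
        self._grace = grace_seconds
        self._task: Optional[asyncio.Task] = None

    def start(self) -> None:
        """Call when a mutual farewell is detected (inside a running loop)."""
        self.cancel()
        self._task = asyncio.create_task(self._run())

    def cancel(self) -> None:
        """Call whenever the caller speaks again."""
        if self._task is not None and not self._task.done():
            self._task.cancel()
        self._task = None

    async def _run(self) -> None:
        await asyncio.sleep(self._grace)
        await self._hangup()


async def demo() -> List[str]:
    events: List[str] = []

    async def hangup() -> None:
        events.append("hangup")

    timer = FarewellTimer(hangup, grace_seconds=0.05)
    timer.start()
    await asyncio.sleep(0.01)
    timer.cancel()                 # caller spoke again within the grace period
    await asyncio.sleep(0.1)
    events.append("cancelled" if not events else "unexpected")
    timer.start()                  # second farewell, nobody speaks
    await asyncio.sleep(0.1)
    return events
```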

Phase 7: Call Ends

16. Call ends (CallEnded event received)

    a. Extract transcript from agent history
       Convert all events (UserTextSent, AgentTextSent, tool calls)
       into a JSON array.

    b. Update call log (update_call_log_end)
       Write to voice_call_logs:
         - status: "completed"
         - duration_seconds: elapsed time since call start
         - transcript: the full JSON transcript

    c. Async classification via Gemini 2.5 Flash
       Send the transcript text to Gemini Flash with a structured
       prompt requesting exactly 7 fields:

       SUMMARY:   1-2 sentence factual summary of the call
       SENTIMENT: -1.0 (very distressed) to 1.0 (very positive)
       TOPICS:    comma-separated list (prayer, visitor, giving, etc.)
       CATEGORY:  single primary category (prayer_request, visitor, etc.)
       URGENCY:   low | normal | urgent | pastoral_emergency
       FOLLOW_UP: true | false (should a staff member review?)
       ASSIGNEE:  pastor | office_admin | prayer_team | care_team |
                  volunteer_coordinator | finance_team | none

    d. Update call log with classification
       Write the parsed classification fields back to voice_call_logs.
       This powers the admin dashboard's call list with AI-generated
       summaries, urgency flags, and suggested assignees.
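Parsing the seven fields might look like the sketch below. It assumes the model replies with one "KEY: value" pair per line, which the text implies but does not state; the sample reply and its values are invented for illustration.

```python
EXPECTED_KEYS = (
    "SUMMARY", "SENTIMENT", "TOPICS", "CATEGORY",
    "URGENCY", "FOLLOW_UP", "ASSIGNEE",
)


def parse_classification(text: str) -> dict:
    """Turn a 'KEY: value' reply into a dict with typed fields."""
    raw = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip().upper()
        if sep and key in EXPECTED_KEYS:
            raw[key] = value.strip()

    result = {k.lower(): raw.get(k) for k in EXPECTED_KEYS}
    try:
        # Clamp to the documented -1.0 .. 1.0 range.
        result["sentiment"] = max(-1.0, min(1.0, float(raw["SENTIMENT"])))
    except (KeyError, ValueError):
        result["sentiment"] = None
    result["topics"] = [
        t.strip() for t in raw.get("TOPICS", "").split(",") if t.strip()
    ]
    result["follow_up"] = raw.get("FOLLOW_UP", "").lower() == "true"
    return result


# Hypothetical model reply, for demonstration only.
SAMPLE_REPLY = """SUMMARY: Caller asked about service times.
SENTIMENT: 0.4
TOPICS: visitor, giving
CATEGORY: visitor
URGENCY: normal
FOLLOW_UP: false
ASSIGNEE: office_admin"""
```

Missing or malformed fields fall back to None, an empty list, or False, so a bad reply does not block the call log update.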

Key Design Principles

Non-fatal everything: Every Supabase call, every notification, every classification is wrapped in try/except. A database hiccup never drops a live call.
Parallel loading: Session init loads RAG, product knowledge, FAQs, repeat history, and datetime all at the same time. This keeps call setup under 2 seconds.
Cache-first with stale fallback: Church data is cached for 5 minutes. If Supabase errors during a cache refresh, the stale cached data is served rather than failing the call.
Safety before convenience: Moderation checks run before noise filtering. A crisis utterance is never silently dropped, even if it matches noise patterns.
Grace periods: Farewell auto-hangup waits 4 seconds. Threat response waits 4 seconds. Abuse response waits 2 seconds. These let the audio finish playing before the line drops.
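The cache-first with stale fallback principle can be sketched as a small TTL cache. StaleFallbackCache and its loader argument are hypothetical; the 300-second default mirrors the 5-minute figure above, and the injectable clock exists only to make the behaviour easy to test.

```python
import time
from typing import Any, Callable, Dict, Tuple


class StaleFallbackCache:
    """TTL cache that serves stale data when a refresh fails."""

    def __init__(self, ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str, loader: Callable[[], Any]) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]            # fresh hit: loader is not called
        try:
            value = loader()
        except Exception:
            if entry is not None:
                return entry[1]        # refresh failed: serve stale data
            raise                      # nothing cached: caller must handle it
        self._entries[key] = (now, value)
        return value
```

With nothing cached yet, the error still propagates, so the caller needs its own fallback for a cold cache.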
