Voice Agent Fallback Strategy

This document covers resilience mechanisms for the voice agent: what happens when a provider fails, how routing falls back, and what's planned for future hardening.


Currently Implemented Fallbacks

1. Routing Fallback — Unknown Church

If a call comes in on a number that isn't in PHONE_REGISTRY (static map) AND isn't found in church_voice_agents (DB lookup), the agent falls back to the Sales Agent rather than dropping the call.

Unknown number
→ DB lookup: church_voice_agents WHERE twilio_phone_number = dialed
→ Not found → Fall back to Sales Agent (never drop the call)

Similarly, if a church's data fails to load (DB error or call limit exceeded):

load_church_data() returns None
→ Fall back to Sales Agent

Where it lives: main.py, _build_church_path()
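The routing decision above can be sketched roughly as follows. This is an illustration only, not the code in main.py: resolve_agent, the lookup_db and load_church_data callables, and the sample registry entry are hypothetical stand-ins. Treating a DB exception the same as "not found" is also an assumption.

```python
from typing import Callable, Optional

SALES_AGENT = "sales"

# Hypothetical static map; the real PHONE_REGISTRY contents are not shown here.
PHONE_REGISTRY: dict[str, str] = {"+15550000001": "church-a"}


def resolve_agent(
    dialed: str,
    lookup_db: Callable[[str], Optional[str]],
    load_church_data: Callable[[str], Optional[dict]],
) -> tuple[str, Optional[dict]]:
    """Return (agent_kind, church_data). Never raises, so the call is never dropped."""
    # 1. Static map first, then the DB lookup.
    church_id = PHONE_REGISTRY.get(dialed)
    if church_id is None:
        try:
            church_id = lookup_db(dialed)
        except Exception:
            church_id = None  # assumption: a DB error is treated like "not found"
    if church_id is None:
        return SALES_AGENT, None

    # 2. Known church, but its data may still fail to load.
    try:
        data = load_church_data(church_id)
    except Exception:
        data = None
    if data is None:
        return SALES_AGENT, None
    return "church", data
```

Every failure path returns the Sales Agent, so the caller always reaches some agent.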


2. Cache Stale Fallback — Supabase Errors

Church data and product knowledge are cached in-memory with TTLs. If Supabase throws an error during a cache refresh, the stale cached value is served rather than failing the call.

| Data | TTL | Stale behavior |
| --- | --- | --- |
| Church data | 5 min | Serve stale on error |
| Product knowledge | 15 min | Serve stale on error |
| Inline FAQs | 5 min | Serve stale on error |
| Church DB lookup | 5 min | Serve stale on error |

Where it lives: session.py, cache_get() / cache_set()
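A minimal sketch of the serve-stale-on-error pattern, assuming a simple in-memory dict keyed by string. The helper name get_with_stale_fallback and its signature are hypothetical; the real cache_get() / cache_set() in session.py may be structured differently.

```python
import time
from typing import Any, Callable

# key -> (stored_at, value); entries are kept past their TTL so they can be served stale.
_cache: dict[str, tuple[float, Any]] = {}


def get_with_stale_fallback(
    key: str,
    ttl_seconds: float,
    fetch: Callable[[], Any],
    now: Callable[[], float] = time.monotonic,
) -> Any:
    entry = _cache.get(key)
    if entry is not None and now() - entry[0] < ttl_seconds:
        return entry[1]  # fresh hit, no refresh needed
    try:
        value = fetch()  # e.g. a Supabase query
    except Exception:
        if entry is not None:
            return entry[1]  # refresh failed: serve the stale value
        raise  # nothing cached yet, so there is nothing to fall back to
    _cache[key] = (now(), value)
    return value
```

Note the one case this pattern cannot cover: a failure on the very first load, before anything has been cached.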


3. Per-Turn RAG Timeout

If the Supabase vector search for per-turn RAG takes longer than 500ms, it is skipped entirely for that turn. The call continues without extra context rather than blocking the caller.

fetch_turn_rag()
→ asyncio.wait_for(..., timeout=0.5)
→ TimeoutError → return "" (empty RAG context)
→ Call continues normally

Where it lives: core/rag.py, fetch_turn_rag()
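The timeout behaviour can be illustrated with plain asyncio. This is a sketch under assumptions: fetch_turn_rag_sketch and the injected search coroutine are hypothetical, standing in for the real fetch_turn_rag() and its Supabase vector search.

```python
import asyncio
from typing import Awaitable, Callable

RAG_TIMEOUT_SECONDS = 0.5  # the 500 ms budget described above


async def fetch_turn_rag_sketch(
    search: Callable[[str], Awaitable[str]],
    query: str,
) -> str:
    """Return RAG context, or an empty string if the search exceeds the budget."""
    try:
        return await asyncio.wait_for(search(query), timeout=RAG_TIMEOUT_SECONDS)
    except asyncio.TimeoutError:
        # wait_for cancels the pending search; the turn proceeds without extra context.
        return ""
```

Returning an empty string rather than raising keeps the caller's code path identical whether or not context arrived.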


4. Non-Fatal Everything

All Supabase writes (prayer requests, callback requests, visitor contacts, call log updates, moderation violations) are wrapped in try/except. A DB write failure never drops a live call. The log entry is lost in the worst case, but the caller's conversation continues uninterrupted.
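One way to express this pattern is a small decorator, sketched below. It assumes the write helpers are async functions; non_fatal and save_prayer_request are hypothetical names, and the codebase may simply use inline try/except blocks instead.

```python
import asyncio
import functools
import logging

logger = logging.getLogger("voice_agent.db")


def non_fatal(func):
    """Decorator: log and swallow any exception raised by an async DB write."""
    @functools.wraps(func)
    async def wrapper(*args, **kwargs):
        try:
            return await func(*args, **kwargs)
        except Exception:
            # The record is lost, but the live call is unaffected.
            logger.exception("non-fatal write failed: %s", func.__name__)
            return None
    return wrapper


@non_fatal
async def save_prayer_request(text: str) -> str:
    # Hypothetical write that simulates a Supabase failure.
    raise ConnectionError("supabase unavailable")
```

Awaiting save_prayer_request(...) here returns None instead of raising, which is the behaviour described above.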


Provider Configuration (Current)

| Component | Primary Provider | Secondary | Notes |
| --- | --- | --- | --- |
| LLM: Coordinator | google/gemini-2.5-flash | None configured | Both Google and Anthropic plugins installed |
| LLM: Care Agent | anthropic/claude-haiku-4-5-20251001 | None configured | Haiku chosen for empathy, lower temperature |
| LLM: Sales Agent | google/gemini-2.5-flash | None configured | Same as Coordinator |
| STT | deepgram/nova-3 | None configured | livekit-plugins-deepgram~=1.5 |
| TTS | cartesia/sonic-3:{voice_id} | None configured | livekit-plugins-cartesia~=1.5 |
| VAD | Silero (pre-warmed) | None | Pre-loaded once per worker process |
| SIP Provider | Twilio | Telnyx (planned) | See Telnyx Migration section |

Note: The LiveKit Agents SDK (livekit-agents~=1.5) ships fallback wrappers for STT, TTS, and LLM (stt.FallbackAdapter, tts.FallbackAdapter, llm.FallbackAdapter), but none of these are configured in the current codebase.


Planned Fallbacks (Not Yet Implemented)

LLM Fallback

The intention (reflected in existing docs) is:

  • Coordinator: Gemini 2.5 Flash primary → Claude Haiku 4.5 fallback
  • Care Agent: Claude Haiku 4.5 primary → Gemini 2.5 Flash fallback

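This would be implemented using the LiveKit Agents llm.FallbackAdapter, which tries providers in list order. For the Coordinator:

from livekit.agents import llm
from livekit.plugins import anthropic, google

coordinator_llm = llm.FallbackAdapter([
    google.LLM(model="gemini-2.5-flash"),              # primary
    anthropic.LLM(model="claude-haiku-4-5-20251001"),  # used if Gemini fails
])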

Priority: Implement before scaling beyond ~20 concurrent churches. A Gemini outage currently takes down all Coordinator and Sales agents simultaneously, since both use Gemini 2.5 Flash with no secondary.

STT Fallback

  • Primary: Deepgram Nova-3 (most reliable for phone audio)
  • Planned fallback: A second Deepgram model or alternative provider

from livekit.agents import stt
from livekit.plugins import deepgram

fallback_stt = stt.FallbackAdapter([
    deepgram.STT(model="nova-3"),  # primary
    deepgram.STT(model="nova-2"),  # older but battle-tested
])

TTS Fallback

  • Primary: Cartesia Sonic-3 (highest quality voice)
  • Planned fallback: Cartesia Sonic-2 or ElevenLabs

Cartesia had a 5-day outage in early 2026 (see project memory: project_cartesia_vendor_risk.md) which would have taken down all TTS if the system had been live. A TTS fallback is especially important because TTS failure is immediately audible to the caller.

from livekit.agents import tts
from livekit.plugins import cartesia

fallback_tts = tts.FallbackAdapter([
    cartesia.TTS(model="sonic-3", voice=voice_id),
    cartesia.TTS(model="sonic-2", voice=fallback_voice_id),
])

Telnyx Migration (Planned)

Why Telnyx

When churches start signing up at scale, Twilio will be replaced by Telnyx as the SIP trunk provider for church phone numbers. Telnyx offers:

  • Lower per-minute cost at volume
  • Elastic SIP trunking (no fixed trunk capacity)
  • Built-in number management API (enables auto-provisioning)
  • Same SIP INVITE format as Twilio — LiveKit sees no difference

What Changes

The voice agent itself requires no code changes. LiveKit Cloud receives SIP calls identically from Twilio or Telnyx. What changes:

  1. Number provisioning: Buy numbers via Telnyx API instead of Twilio dashboard
  2. SIP trunk config: Point Telnyx trunk to cwa-voice-9x077mph.livekit.cloud SIP endpoint
  3. session.py PHONE_REGISTRY: Telnyx numbers are added the same way as Twilio numbers
  4. church_voice_agents table: twilio_phone_number column stores the E.164 number (Twilio or Telnyx — the column name is a legacy label)

What Stays Twilio

Demo lines and sales numbers are currently Twilio numbers. These may or may not migrate:

  • Toll-free (+18886030316) — may stay Twilio (toll-free porting is complex)
  • Demo lines (+14696152221, +13658254095) — candidate for Telnyx

Migration Path (When Ready)

  1. Set up Telnyx account and elastic SIP trunk
  2. Configure trunk to forward calls to LiveKit Cloud SIP endpoint
  3. For each new church signup: provision number via Telnyx API, update church_voice_agents
  4. For existing Twilio numbers: port when contracts allow
  5. No voice agent code changes required

Current State

Telnyx is not yet set up. All numbers are Twilio. The main.py docstring already acknowledges Telnyx as a planned path:

# Phone call → Twilio/Telnyx SIP trunk → LiveKit Cloud SIP gateway

Outage Response

See runbooks/voice-ops/cartesia-outage.md for the runbook covering LiveKit Cloud, Cartesia TTS, and Railway worker outages.

Quick Reference

| Failure | Impact | Response |
| --- | --- | --- |
| LiveKit Cloud SIP down | All calls drop | Check LiveKit status; calls fail silently to caller |
| Deepgram STT down | STT fails; agents can't hear callers | No in-code fallback yet; monitor Deepgram status |
| Cartesia TTS down | No audio output | No in-code fallback yet; monitor Cartesia status |
| Google Gemini down | Coordinator/Sales agents fail | No in-code fallback yet; Care Agent (Haiku) still works |
| Anthropic Claude down | Care Agent fails | No in-code fallback yet; Coordinator still works |
| Railway worker down | New calls not answered | Redeploy Railway; existing calls may have already dropped |
| Supabase down | Church data load fails; calls fall back to Sales Agent | Stale cache serves up to 15 min; then graceful degradation |
| Twilio SIP down | No new inbound calls | No fallback until Telnyx migration complete |