Unify chatbot traffic through /api/chatbot/stream
Status
DECIDED
Context
Three separate chatbot endpoints existed across codebases:
- /api/chatbot/stream (churchwiseai-web): Vercel AI SDK 6, SSE, being actively developed
- /api/chatbot/chat (churchwiseai-web): legacy route with Sonnet escalation paths
- /api/chatbot/chat (pewsearch): PewSearch-specific chatbot route
The Sonnet escalation path in the legacy /chat route was identified as the
root cause of $10–30/day in surprise API bills during QA runs. Separately, a
cost audit found that the Vercel AI Gateway was adding markup on every LLM call —
the business was effectively paying twice for inference.
Additionally, the pro_website chatbot had separate limits (150 convos/mo for
Premium Listing, 200 convos/mo for Pro Website) that were not enforced at a
unified layer.
Decision
- Unify all chatbot traffic through /api/chatbot/stream with tier-gated toolsets. This is the single production chatbot endpoint.
- Remove the Vercel AI Gateway and switch to the direct @ai-sdk/anthropic client. The gateway added per-call markup with no offsetting benefit.
- Remove the Sonnet escalation path entirely. Claude Haiku 4.5 handles all turns, including pastoral care. The care library becomes unconditional (it was gated on escalate=true).
- Tier-gated tool allocation:
  - Premium Listing ($9.95): 5 basic tools, 150 convos/mo, 1 agent.
  - Pro Website ($19.95): 12 starter tools, 200 convos/mo, 2 agents.
  - CWA tiers: 12/35/39 tools per tier.
- PewSearch's own /api/chatbot/chat endpoint is deprecated; the Premium Listing chatbot embed switches to the CWA widget.
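The tier gating described above can be sketched as a lookup table plus two small helpers. This is only an illustration: the tier keys, field names, helper functions, and placeholder tool names are hypothetical and not taken from the codebase. Only the prices, tool counts, conversation caps, and agent counts come from this decision. The CWA tiers are omitted because their names are not stated here.

```typescript
// Hypothetical tier keys and field names; only the numbers come from the decision.
type TierKey = "premium_listing" | "pro_website";

interface TierConfig {
  priceUsd: number;
  toolCount: number;
  conversationsPerMonth: number;
  agents: number;
}

const TIERS: Record<TierKey, TierConfig> = {
  premium_listing: { priceUsd: 9.95, toolCount: 5, conversationsPerMonth: 150, agents: 1 },
  pro_website: { priceUsd: 19.95, toolCount: 12, conversationsPerMonth: 200, agents: 2 },
};

// Assumption: tools live in one ordered registry and a tier receives the first N.
function toolsForTier(tier: TierKey, registry: readonly string[]): string[] {
  return registry.slice(0, TIERS[tier].toolCount);
}

// Returns true while the tenant is still under its monthly conversation cap.
function canStartConversation(tier: TierKey, usedThisMonth: number): boolean {
  return usedThisMonth < TIERS[tier].conversationsPerMonth;
}

// Placeholder registry of 39 made-up tool names, for demonstration only.
const registry: string[] = Array.from({ length: 39 }, (_, i) => `tool_${i + 1}`);

console.log(toolsForTier("premium_listing", registry).length); // 5
console.log(canStartConversation("pro_website", 200)); // false
```

Keeping the limits and tool counts in one table means the unified endpoint has a single place to check both, which is the enforcement layer the Context section says was missing.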
Rationale
- Cost: Removing Sonnet escalation eliminates the $10–30/day API spike. Removing the gateway eliminates double-billing. Haiku 4.5 achieves 0.96–0.98 empathy scores — Sonnet provides no measurable quality lift for this use case.
- Simplicity: One endpoint, one system prompt architecture, one place to debug. The prior dual-endpoint state meant a bug fix in /chat never reached /stream and vice versa.
- Speed as empathy: Haiku's lower latency is itself a UX improvement: a faster first token means the pastor's visitor feels heard sooner.
Consequences
- Good: Single endpoint to maintain, test, and monitor. Cost normalized. Prompt caching on the ~9,700-token static system prompt achieves a 90% discount on cached tokens ($0.08/M vs $0.80/M).
- Bad: escalation.ts is now dead code (kept in place, imports removed). Any future need for tiered model routing must be re-architected from scratch.
- Reversible? Escalation path can be reintroduced by re-importing escalation.ts and re-adding the gating condition. Gateway can be re-added by swapping the provider import. Both are one-PR changes.
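To make the prompt-caching saving concrete, the following sketch works through the per-request arithmetic. It takes the ~9,700-token prompt size, the $0.80/M base price, and the 90% discount as given, and derives the cached rate from the discount. Cache-write surcharges and cache expiry are ignored, which is a simplifying assumption; the function and constant names are illustrative only.

```typescript
// Figures from the Consequences section; cache-write surcharges are ignored (assumption).
const SYSTEM_PROMPT_TOKENS = 9_700;
const BASE_USD_PER_MTOK = 0.80;
const CACHE_DISCOUNT = 0.90;

// Cost in USD of sending `tokens` input tokens at a given per-million-token price.
function promptCostUsd(tokens: number, usdPerMillion: number): number {
  return (tokens / 1_000_000) * usdPerMillion;
}

// A 90% discount on $0.80/M leaves roughly $0.08/M for cached reads.
const cachedUsdPerMtok = BASE_USD_PER_MTOK * (1 - CACHE_DISCOUNT);

const uncachedCost = promptCostUsd(SYSTEM_PROMPT_TOKENS, BASE_USD_PER_MTOK);
const cachedCost = promptCostUsd(SYSTEM_PROMPT_TOKENS, cachedUsdPerMtok);

console.log(`uncached system prompt: $${uncachedCost.toFixed(5)} per request`);
console.log(`cached system prompt:   $${cachedCost.toFixed(5)} per request`);
```

Under these assumptions the static system prompt costs a little under $0.008 per request uncached and a little under $0.0008 when served from cache.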
Alternatives considered
- Keep Sonnet escalation, add spend cap: rejected. Spend caps apply account-wide, so hitting the cap would also break the voice agent at the same time. The root cause is the escalation trigger logic firing too frequently during QA, not the model selection itself.
- Keep separate PewSearch endpoint — rejected; creates ongoing drift risk. Tier gating inside the unified endpoint handles the access control cleanly.
Links
- DECISION_LOG entry: ## 2026-04-07 (Dashboard onboarding UX + Chatbot unification)
- DECISION_LOG entry: ## 2026-04-06 (Cost audit + test lab expansion)
- Related: 2026-04-09-chatbot-chat-legacy-deleted
- Memory: ~/.claude/projects/C--dev/memory/project_chatbot_unification.md
- Memory: ~/.claude/projects/C--dev/memory/feedback_chatbot_stream_is_production.md