Unify chatbot traffic through /api/chatbot/stream

Status

DECIDED

Context

Three separate chatbot endpoints existed across codebases:

  • /api/chatbot/stream (churchwiseai-web) — Vercel AI SDK 6, SSE, being actively developed
  • /api/chatbot/chat (churchwiseai-web) — legacy route with Sonnet escalation paths
  • /api/chatbot/chat (pewsearch) — PewSearch-specific chatbot route

The Sonnet escalation path in the legacy /chat route was identified as the root cause of $10–30/day in surprise API bills during QA runs. Separately, a cost audit found that the Vercel AI Gateway was adding markup on every LLM call — the business was effectively paying twice for inference.

Additionally, the pro_website chatbot had separate limits (150 convos/mo for Premium Listing, 200 convos/mo for Pro Website) that were not enforced at a unified layer.

Decision

  1. Unify all chatbot traffic through /api/chatbot/stream with tier-gated toolsets. This is the single production chatbot endpoint.
  2. Remove the Vercel AI Gateway and switch to the direct @ai-sdk/anthropic client. The gateway added per-call markup with no offsetting benefit.
  3. Remove the Sonnet escalation path entirely. Claude Haiku 4.5 handles all turns, including pastoral care. The care library becomes unconditional (it was previously gated on escalate=true).
  4. Tier-gated tool allocation: Premium Listing ($9.95) → 5 basic tools, 150 convos/mo, 1 agent. Pro Website ($19.95) → 12 starter tools, 200 convos/mo, 2 agents. CWA tiers → 12/35/39 tools per tier.
  5. PewSearch's own /api/chatbot/chat endpoint deprecated; Premium Listing chatbot embed switches to the CWA widget.
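
The tier allocation in item 4 could be expressed as a single lookup table that the unified endpoint consults before serving a turn. The following is only a sketch under stated assumptions, not the project's actual code: `TierId`, `TIER_LIMITS`, `canStartConversation`, and `toolsForTier` are hypothetical names, the CWA tiers are left out because only their tool counts are given, and selecting tools by taking the first N from an ordered catalog is an assumed strategy. Only the numbers come from the decision above.

```typescript
// Hypothetical tier table. Identifiers are illustrative; the numbers are from item 4.
type TierId = "premium_listing" | "pro_website";

interface TierLimits {
  toolCount: number;             // tools exposed to the model for this tier
  conversationsPerMonth: number; // monthly conversation quota
  agents: number;                // agents included in the plan
}

const TIER_LIMITS: Record<TierId, TierLimits> = {
  premium_listing: { toolCount: 5, conversationsPerMonth: 150, agents: 1 },
  pro_website: { toolCount: 12, conversationsPerMonth: 200, agents: 2 },
};

// True while the tenant is still under its monthly conversation quota.
function canStartConversation(tier: TierId, usedThisMonth: number): boolean {
  return usedThisMonth < TIER_LIMITS[tier].conversationsPerMonth;
}

// Assumed strategy: the catalog is ordered from most basic to most advanced,
// and a tier receives the first `toolCount` entries.
function toolsForTier<T>(tier: TierId, catalog: readonly T[]): T[] {
  return catalog.slice(0, TIER_LIMITS[tier].toolCount);
}
```

Keeping quota and tool count in one table means the conversation limits are enforced in the same place that selects the toolset, which is the unified layer the Context section says was missing.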

Rationale

  • Cost: Removing Sonnet escalation eliminates the $10–30/day API spike. Removing the gateway eliminates double-billing. Haiku 4.5 achieves 0.96–0.98 empathy scores — Sonnet provides no measurable quality lift for this use case.
  • Simplicity: One endpoint, one system prompt architecture, one place to debug. The prior split between /chat and /stream meant a bug fix in one route never reached the other.
  • Speed as empathy: Haiku's lower latency is itself a UX improvement — faster first token means the pastor's visitor feels heard sooner.

Consequences

  • Good: Single endpoint to maintain, test, and monitor. Cost normalized. Prompt caching on the ~9,700-token static system prompt achieves a 90% discount on cached tokens ($0.08/M vs $0.80/M).
  • Bad: escalation.ts is now dead code (kept in place, imports removed). Any tiered model routing beyond restoring that file would have to be re-architected from scratch.
  • Reversible? Escalation path can be reintroduced by re-importing escalation.ts and re-adding the gating condition. Gateway can be re-added by swapping the provider import. Both are one-PR changes.
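
To make the caching figure concrete, the per-request cost of the static system prompt can be worked out from the numbers quoted above: roughly 9,700 tokens, a $0.80 per million base input rate, and a 90% discount on cached reads. This is a back-of-the-envelope sketch only. The constant and function names are hypothetical, and it ignores any cache-write surcharge and all non-cached tokens.

```typescript
// Figures from the text: ~9,700-token static prompt, $0.80 per million input
// tokens, 90% discount on cached reads. Cache-write surcharges are ignored.
const STATIC_PROMPT_TOKENS = 9700;
const BASE_USD_PER_MILLION = 0.8;
const CACHE_DISCOUNT = 0.9;

// Cost in USD of sending `tokens` input tokens at the given per-million rate.
function promptCostUsd(tokens: number, usdPerMillion: number): number {
  return (tokens / 1_000_000) * usdPerMillion;
}

const uncachedUsd = promptCostUsd(STATIC_PROMPT_TOKENS, BASE_USD_PER_MILLION);
const cachedUsd = promptCostUsd(
  STATIC_PROMPT_TOKENS,
  BASE_USD_PER_MILLION * (1 - CACHE_DISCOUNT),
);

console.log(uncachedUsd.toFixed(5)); // 0.00776
console.log(cachedUsd.toFixed(5));   // 0.00078
```

Under these assumptions the static prompt costs a little under a cent per uncached request and about a tenth of that on a cache hit.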

Alternatives considered

  • Keep Sonnet escalation, add spend cap: rejected. Spend caps apply account-wide, so hitting the cap would also cut off the voice agent. The root cause is the escalation trigger logic firing too frequently during QA, not the model selection itself.
  • Keep separate PewSearch endpoint: rejected. It creates ongoing drift risk, and tier gating inside the unified endpoint handles the access control cleanly.

References

  • DECISION_LOG entry: ## 2026-04-07 (Dashboard onboarding UX + Chatbot unification)
  • DECISION_LOG entry: ## 2026-04-06 (Cost audit + test lab expansion)
  • Related: 2026-04-09-chatbot-chat-legacy-deleted
  • Memory: ~/.claude/projects/C--dev/memory/project_chatbot_unification.md
  • Memory: ~/.claude/projects/C--dev/memory/feedback_chatbot_stream_is_production.md