Unify chatbot traffic through /api/chatbot/stream

Status

DECIDED

Context

Three separate chatbot endpoints existed across codebases:

/api/chatbot/stream (churchwiseai-web) — Vercel AI SDK 6, SSE, being actively developed
/api/chatbot/chat (churchwiseai-web) — legacy route with Sonnet escalation paths
/api/chatbot/chat (pewsearch) — PewSearch-specific chatbot route

The Sonnet escalation path in the legacy /chat route was identified as the root cause of $10–30/day in surprise API bills during QA runs. Separately, a cost audit found that the Vercel AI Gateway was adding markup on every LLM call — the business was effectively paying twice for inference.

Additionally, the Premium Listing and Pro Website (pro_website) chatbots had separate conversation limits (150 convos/mo and 200 convos/mo respectively) that were not enforced at a unified layer.

Decision

Unify all chatbot traffic through /api/chatbot/stream with tier-gated toolsets. This is the single production chatbot endpoint.
Remove the Vercel AI Gateway and switch to the direct @ai-sdk/anthropic client. The gateway added per-call markup with no offsetting benefit.
Remove the Sonnet escalation path entirely. Claude Haiku 4.5 handles all turns, including pastoral care. The care library becomes unconditional (it was previously gated on escalate=true).
Tier-gated tool allocation: Premium Listing ($9.95) → 5 basic tools, 150 convos/mo, 1 agent. Pro Website ($19.95) → 12 starter tools, 200 convos/mo, 2 agents. CWA tiers → 12/35/39 tools per tier.
PewSearch's own /api/chatbot/chat endpoint is deprecated; the Premium Listing chatbot embed switches to the CWA widget.
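
As an illustration of what tier gating inside a single endpoint could look like, here is a minimal TypeScript sketch. The tool counts, conversation limits, and agent counts are the ones stated above; everything else is an assumption for illustration: the tier key premium_listing, the placeholder tool names, the helper names, and the idea that the 5 basic tools are the first 5 entries of the 12 starter tools. The CWA tiers are left out because their names and limits are not given here.

```typescript
// Tier keys: "pro_website" appears in this record; "premium_listing" is a hypothetical key.
type Tier = "premium_listing" | "pro_website";

interface TierPolicy {
  toolCount: number;            // number of tools exposed to the model
  monthlyConversations: number; // conversation cap per calendar month
  agents: number;               // number of agents included in the tier
}

// Values taken from the decision text above.
const TIER_POLICY: Record<Tier, TierPolicy> = {
  premium_listing: { toolCount: 5, monthlyConversations: 150, agents: 1 },
  pro_website: { toolCount: 12, monthlyConversations: 200, agents: 2 },
};

// Placeholder registry (hypothetical names), assumed to be ordered from
// most basic to most advanced so that a tier receives the first N tools.
const TOOL_REGISTRY: string[] = Array.from({ length: 12 }, (_, i) => `tool_${i + 1}`);

// Returns the tool names a given tier is allowed to use.
function toolsForTier(tier: Tier): string[] {
  return TOOL_REGISTRY.slice(0, TIER_POLICY[tier].toolCount);
}

// Returns true while the tenant is still under its monthly conversation cap.
function canStartConversation(tier: Tier, conversationsThisMonth: number): boolean {
  return conversationsThisMonth < TIER_POLICY[tier].monthlyConversations;
}
```

In a route handler, a check like canStartConversation would run before any model call, and the result of toolsForTier would decide which tool definitions are passed to the model. Keeping both in one policy table is what puts the limits in a single layer.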

Rationale

Cost: Removing Sonnet escalation eliminates the $10–30/day API spike. Removing the gateway eliminates double-billing. Haiku 4.5 achieves 0.96–0.98 empathy scores — Sonnet provides no measurable quality lift for this use case.
Simplicity: One endpoint, one system prompt architecture, one place to debug. The prior split between /chat and /stream meant a bug fix in one route never reached the other.
Speed as empathy: Haiku's lower latency is itself a UX improvement — faster first token means the pastor's visitor feels heard sooner.

Consequences

Good: Single endpoint to maintain, test, and monitor. Cost normalized. Prompt caching on the ~9,700-token static system prompt achieves a 90% discount on cached tokens ($0.08/M vs $0.80/M).
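
To make the caching figure concrete, the sketch below works out the per-request cost of the static system prompt by applying the quoted 90% discount to the $0.80/M base input rate. The function and constant names are hypothetical. The calculation covers cache reads only and ignores the one-time cache-write charge, output tokens, and the per-conversation part of the prompt.

```typescript
// Figures quoted in this record.
const BASE_INPUT_USD_PER_MTOK = 0.8; // base input price per million tokens
const CACHE_READ_DISCOUNT = 0.9;     // 90% discount on cached tokens
const STATIC_PROMPT_TOKENS = 9700;   // approximate static system prompt size

// Cost in USD of sending `tokens` input tokens, either uncached or read from cache.
function promptCostUsd(tokens: number, cached: boolean): number {
  const rate = cached
    ? BASE_INPUT_USD_PER_MTOK * (1 - CACHE_READ_DISCOUNT)
    : BASE_INPUT_USD_PER_MTOK;
  return (tokens / 1_000_000) * rate;
}

const uncached = promptCostUsd(STATIC_PROMPT_TOKENS, false);
const cached = promptCostUsd(STATIC_PROMPT_TOKENS, true);

// Print both figures and the saving per request for the static prompt alone.
console.log(`uncached: $${uncached.toFixed(6)} per request`);
console.log(`cached:   $${cached.toFixed(6)} per request`);
console.log(`saved:    $${(uncached - cached).toFixed(6)} per request`);
```

Under these assumptions the static prompt costs roughly $0.0078 per request uncached and about a tenth of that when served from cache, so the saving scales linearly with request volume.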
Bad: escalation.ts is now dead code (kept in place, imports removed). Any future tiered model routing that goes beyond re-enabling this existing path must be re-architected from scratch.
Reversible? Escalation path can be reintroduced by re-importing escalation.ts and re-adding the gating condition. Gateway can be re-added by swapping the provider import. Both are one-PR changes.

Alternatives considered

Keep Sonnet escalation, add spend cap: rejected. Spend caps apply account-wide, so hitting one would also cut off the voice agent. The root cause is the escalation trigger logic firing too frequently during QA, not the model selection itself.
Keep separate PewSearch endpoint: rejected. It creates ongoing drift risk, and tier gating inside the unified endpoint handles the access control cleanly.
