Key Rotation Procedure
Status: CANONICAL — P0 MANDATORY
Owner: Founder (John Moelker)
Last updated: 2026-05-14
Triggered by: 16-hour voice agent outage on 2026-05-13 caused by missed LiveKit Cloud secret rotation. Vercel projects received new SUPABASE_SERVICE_ROLE_KEY; LiveKit Cloud secrets were not updated. All paying-customer call logs lost for the window. Founder verbatim: "We can't have this amateurish key rotation that breaks our products."
BEFORE ANY ROTATION BEGINS: Read this document in full. Then open knowledge/runbooks/key-surface-manifest.yaml and find every surface for the key being rotated. A rotation touches multiple systems; missing one is how the 2026-05-13 outage happened.
§1 — Trigger Conditions
Rotate a key immediately when any of the following occur:
| Trigger | Example | Urgency |
|---|---|---|
| Confirmed leak | Key appears in a git commit, log file, CI output, error message, screen share | Rotate within 1 hour |
| Suspected compromise | Unusual API usage, unexplained charges, unauthorized DB queries | Rotate within 4 hours |
| Vendor-forced rotation | Supabase legacy key deprecation (2026-05-13 incident), provider migration | Rotate by vendor deadline |
| Team-member departure | Any person with access to any secret leaves | Rotate within 24 hours |
| Planned annual rotation | Calendar reminder on each key's next_rotation_due | Rotate in a scheduled maintenance window |
| Security audit finding | Pen-test or audit flags a key as over-privileged or stale | Rotate within 72 hours |
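The deadlines in the table can be encoded so that a script or reminder tool reports the same numbers as this runbook. This is a minimal sketch: the function name and the trigger identifiers (confirmed_leak, etc.) are hypothetical labels chosen for illustration, not names used elsewhere in the tooling.

```bash
# Hypothetical helper: maps a trigger label to the rotation deadline in hours.
# The labels are illustrative; the hour values come from the table in §1.
rotation_deadline_hours() {
  case "$1" in
    confirmed_leak)               echo 1 ;;
    suspected_compromise)         echo 4 ;;
    team_departure)               echo 24 ;;
    audit_finding)                echo 72 ;;
    # These two have no fixed hour count: vendor deadline or maintenance window.
    vendor_forced|planned_annual) echo "scheduled" ;;
    *) echo "unknown trigger: $1" >&2; return 1 ;;
  esac
}
```

Example: `rotation_deadline_hours suspected_compromise` prints 4.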
§2 — Pre-Rotation Checklist
Do NOT generate a new key yet. Complete this checklist first.
2a. Identify the key
- Name the exact env var (e.g., SUPABASE_SERVICE_ROLE_KEY)
- Note its format prefix so you can verify the new key has the correct format (see key-surface-manifest.yaml → format_prefix)
- Note when it was last rotated (see key-surface-manifest.yaml → last_rotated)
- Confirm which vendor dashboard issues the new key (Supabase → Project Settings → API; Anthropic → console.anthropic.com → API Keys; etc.)
2b. Identify all surfaces
- Open knowledge/runbooks/key-surface-manifest.yaml
- Find the key's entry and list every surface under its surfaces: block
- For each surface, confirm you have access credentials:
  - Vercel: vercel env ls <project> (already authenticated)
  - LiveKit Cloud: C:\dev\lk.exe agent secrets --project cwa-voice --id CA_pX3Me4NK6qK8
  - GitHub Actions: gh secret list --repo <repo> (already authenticated)
  - Local file: direct filesystem access
- Write the surface list down before proceeding; do not rely on memory
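Listing the surfaces can be scripted rather than read by eye. The sketch below assumes a manifest layout that this runbook does not actually specify: a two-space-indented entry per key, a surfaces: field at four spaces, and list items at six. The surface labels in the comment are also made up. Treat it as a starting point and adjust the patterns to the real file.

```bash
# ASSUMED manifest layout (hypothetical; adjust the patterns to the real file):
#   keys:
#     SUPABASE_SERVICE_ROLE_KEY:
#       surfaces:
#         - vercel:churchwiseai-web
#         - livekit:cwa-voice
list_surfaces() {
  # $1 = key name, $2 = path to the manifest
  awk -v key="$1" '
    # Two-space indent + NAME: opens a key entry
    /^  [A-Za-z0-9_]+:[ \t]*$/ {
      name = $1; sub(/:$/, "", name)
      in_key = (name == key); in_surf = 0; next
    }
    # surfaces: inside the wanted entry starts the list
    in_key && /^    surfaces:[ \t]*$/ { in_surf = 1; next }
    # Six-space list items are the surfaces
    in_key && in_surf && /^      - / { sub(/^      - /, ""); print; next }
    # Any shallower line ends the list
    in_surf && !/^      / { in_surf = 0 }
  ' "$2"
}
```

Redirecting the output to a file (for example `list_surfaces SUPABASE_SERVICE_ROLE_KEY knowledge/runbooks/key-surface-manifest.yaml > surfaces.txt`) gives the written surface list this step asks for.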
2c. Confirm rollback path
- Identify where the OLD key is currently stored (vendor dashboard, knowledge/.env, Vercel)
- Confirm whether the old key can be re-activated if the new key fails
- For vendor keys (Anthropic, OpenAI, etc.): do NOT revoke the old key until all surfaces verify green with the new key. Keep the old key valid for at least 24 hours post-rotation. See §5.
- For Supabase service role keys: the legacy key remains valid until explicitly revoked in Supabase dashboard → Project Settings → API → "Revoke legacy key"
2d. Schedule and announce
- If the rotation affects live customer surfaces (voice agent, chatbot), schedule it during low-traffic hours (weekday 10 PM–6 AM ET)
- For voice agent rotations: the agent restarts automatically on lk agent update-secrets. Budget 3–5 minutes of downtime. No advance customer notice needed for planned rotation under 5 minutes.
- For Vercel deployments: a re-deploy is required to pick up new env vars. Budget 2–3 minutes per project.
§3 — Rotation Steps in Exact Order
Work through surfaces in the order listed below for every rotation. This order minimizes downtime: stateless Vercel functions first, then the voice agent (which requires a restart), then GitHub Actions secrets, and local files last.
Step 1 — Generate the new key (vendor dashboard)
Generate the replacement key at the vendor's dashboard. Write it to knowledge/.env immediately, before updating any surface: this is your rollback copy if something goes wrong mid-rotation.
# Example: confirm knowledge/.env has the new value (prints the key name and the first 10 characters only, never the full value)
grep -o "^SUPABASE_SERVICE_ROLE_KEY=.\{10\}" "C:/dev/knowledge/.env"
Expected: SUPABASE_SERVICE_ROLE_KEY= followed by the new key's format prefix (sb_secret_ for the new Supabase format, see §8).
Step 2 — Vercel: churchwiseai-web
cd "C:/dev/churchwiseai-web"
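# Update each environment that holds this key
# Note: vercel env add may refuse to overwrite a variable that already exists; if so, remove the old entry first with: vercel env rm <KEY_NAME> <environment>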
printf "<NEW_VALUE>" | vercel env add <KEY_NAME> production
printf "<NEW_VALUE>" | vercel env add <KEY_NAME> preview
printf "<NEW_VALUE>" | vercel env add <KEY_NAME> development
# Trigger a re-deploy to pick up the new value
vercel --prod
Verification:
# Wait for deploy to complete (~2-3 min), then hit the health endpoint
curl -s https://churchwiseai.com/api/health | grep '"status":"ok"'
# For SUPABASE_SERVICE_ROLE_KEY specifically: hit an admin endpoint that reads the DB and print only the HTTP status code
curl -s -o /dev/null -w "%{http_code}\n" "https://churchwiseai.com/api/admin/kb-proxy?token=DEMO_TOKEN"
Expected output: the health check prints a line containing "status":"ok", and the admin endpoint prints 200 (not 401 or 503).
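The two conditions in this check (HTTP 200 and a body reporting "status":"ok") can be tested together so that a failure is never mistaken for a pass. The following is a minimal sketch; health_gate and check_url are hypothetical helper names, and it assumes the health endpoint returns compact JSON with no space after the colon, as the grep pattern above implies.

```bash
health_gate() {
  # $1 = HTTP status code, $2 = response body
  if [ "$1" != "200" ]; then
    echo "FAIL: HTTP $1"
    return 1
  fi
  case "$2" in
    *'"status":"ok"'*) echo "PASS" ;;
    *) echo "FAIL: body does not report status ok"; return 1 ;;
  esac
}

check_url() {
  # $1 = URL. Fetches once, then hands status code and body to health_gate.
  local tmp code rc
  tmp="$(mktemp)"
  # -o writes the body to the temp file; -w prints only the status code.
  code="$(curl -s -o "$tmp" -w '%{http_code}' "$1")"
  health_gate "$code" "$(cat "$tmp")"
  rc=$?
  rm -f "$tmp"
  return "$rc"
}
```

Usage: `check_url https://churchwiseai.com/api/health` prints PASS or a FAIL line and sets the exit status accordingly. If curl cannot connect, it reports code 000, which is treated as a failure.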
Step 3 — Vercel: pewsearch
cd "C:/dev/pewsearch/web"
printf "<NEW_VALUE>" | vercel env add <KEY_NAME> production
printf "<NEW_VALUE>" | vercel env add <KEY_NAME> preview
printf "<NEW_VALUE>" | vercel env add <KEY_NAME> development
vercel --prod
Verification:
curl -s https://pewsearch.com/api/health | grep '"status":"ok"'
Expected output: the matching line containing "status":"ok" is printed. No output means the health check failed.
Step 4 — Vercel: sermon-illustrations (illustratetheword.com)
cd "C:/dev/sermon-illustrations"
printf "<NEW_VALUE>" | vercel env add <KEY_NAME> production
printf "<NEW_VALUE>" | vercel env add <KEY_NAME> preview
printf "<NEW_VALUE>" | vercel env add <KEY_NAME> development
vercel --prod
Verification:
curl -s https://illustratetheword.com/api/health | grep '"status":"ok"'
Expected output: the matching line containing "status":"ok" is printed. No output means the health check failed.
Step 5 — LiveKit Cloud (voice agent secrets)
CRITICAL: THIS IS THE SURFACE MISSED IN THE 2026-05-13 OUTAGE. Normal lk agent deploy does NOT re-upload secrets. You MUST use update-secrets explicitly.
# Option A: Update individual secret (preferred for single-key rotation)
"C:/dev/lk.exe" agent update-secrets --project cwa-voice --id CA_pX3Me4NK6qK8 <KEY_NAME>=<NEW_VALUE>
# Option B: Re-upload the full .env secrets file (use after bulk rotation)
# Ensure voice-agent-livekit/.env has ALL secrets updated first, then:
"C:/dev/lk.exe" agent update-secrets --project cwa-voice --id CA_pX3Me4NK6qK8 --secrets-file "C:/dev/churchwiseai-web/voice-agent-livekit/.env"
The agent will restart automatically when secrets are updated. This takes 90–120 seconds.
Verification (MANDATORY — this is the test that would have caught the 2026-05-13 outage):
# Step 1: Wait for container restart
# (90 seconds minimum — do not skip this wait)
# Step 2: Check deploy logs for "registered worker"
"C:/dev/lk.exe" agent logs --project cwa-voice --id CA_pX3Me4NK6qK8 --log-type deploy
# Step 3: Verify agent is writing to the DB (proves Supabase key is valid)
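# Check voice_call_logs for a row with created_at in the last 5 minutes
# (a row is written only when a call happens; if no call has come in since the restart, place the Step 4 test call first)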
# Use Supabase MCP: SELECT id, church_id, created_at FROM voice_call_logs ORDER BY created_at DESC LIMIT 3;
# Step 4: Dial the demo line and confirm the AI identifies correctly (not fallback "ChurchWiseAI")
# Demo line: check PHONE_REGISTRY in voice-agent-livekit/session.py for the Grace Community demo number
# Expected: "Thank you for calling Grace Community Church..."
# If you hear "ChurchWiseAI" fallback: the Supabase read is failing (401 — key still wrong)
IMPORTANT: lk agent list status showing "Available" is MISLEADING — it means the container exists, not that the agent is healthy. Only logs + an actual call tell the truth.
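The log check in Step 2 of this verification can be made pass/fail instead of relying on a visual scan. This is a minimal sketch: check_deploy_log is a hypothetical helper that reads log text on stdin, and the exact wording of the log line is taken from the "registered worker" string quoted above.

```bash
check_deploy_log() {
  # Reads log text on stdin; passes only if some line mentions "registered worker".
  if grep -q "registered worker"; then
    echo "PASS: worker registered"
  else
    echo "FAIL: no 'registered worker' line found"
    return 1
  fi
}
```

Usage: pipe the output of the lk agent logs command shown above into check_deploy_log. This assumes the logs command prints and exits; if it streams continuously, capture a bounded sample to a file first and run `check_deploy_log < sample.log`. A pass here does not replace the test call, which is still the only proof that the Supabase key works.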
Step 6 — GitHub Actions secrets
Check knowledge/runbooks/key-surface-manifest.yaml for which repos have this key as a GitHub secret.
# churchwiseai-web
gh secret set <KEY_NAME> --repo churchwiseai-web --body "<NEW_VALUE>"
gh secret list --repo churchwiseai-web | grep <KEY_NAME>
# pewsearch (if applicable)
gh secret set <KEY_NAME> --repo pewsearch --body "<NEW_VALUE>"
# sermon-illustrations (if applicable)
gh secret set <KEY_NAME> --repo sermon-illustrations --body "<NEW_VALUE>"
Verification:
# Trigger a CI run that exercises the key
gh workflow run test.yml --repo churchwiseai-web
# Watch for green: gh run watch --repo churchwiseai-web
Expected output: workflow passes without 401/403 errors in job logs.
Step 7 — Local .env files
Update each local .env* file that holds this key. Check the manifest for which files are relevant. Common locations:
- C:/dev/knowledge/.env: already updated in Step 1 (the rollback copy)
- C:/dev/churchwiseai-web/voice-agent-livekit/.env: used by --secrets-file on agent create/update-secrets
- C:/dev/churchwiseai-web/.env.local (if it exists; may not be present in dev)
- C:/dev/pewsearch/web/.env.local (if it exists)
- C:/dev/sermon-illustrations/.env.local (if it exists)
# Verify the file holds the new key: print only the key name and the first 10 characters of the value (enough to check the prefix)
grep -o "^<KEY_NAME>=.\{10\}" "C:/dev/churchwiseai-web/voice-agent-livekit/.env"
# Expected: <KEY_NAME>= followed by the new key's prefix (never print the full value in logs)
Verification:
# For voice agent .env: do a dry-run deploy check
cd "C:/dev/churchwiseai-web"
"C:/dev/lk.exe" agent deploy --project cwa-voice --silent --dry-run 2>&1 | tail -5
NEVER COMMIT .env FILES. Every .env* file is in .gitignore. Verify before any commit:
git -C "C:/dev/churchwiseai-web" status | grep ".env"
# Expected: no .env* files listed under "Changes to be committed"
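The same check can be made strict so that it fails loudly instead of depending on someone reading git status output. A minimal sketch follows; staged_env_files and assert_no_staged_env are hypothetical helper names, and the pattern treats any path whose file name starts with .env as a secret file.

```bash
staged_env_files() {
  # Reads file paths on stdin; prints those whose file name starts with ".env".
  grep -E '(^|/)\.env[^/]*$'
}

assert_no_staged_env() {
  # $1 = repo path. Fails if any .env* file is staged for commit.
  local hits
  hits="$(git -C "$1" diff --cached --name-only | staged_env_files || true)"
  if [ -n "$hits" ]; then
    echo "ABORT: staged secret files:"
    echo "$hits"
    return 1
  fi
  echo "OK: no .env* files staged"
}
```

Usage: `assert_no_staged_env "C:/dev/churchwiseai-web"` before committing. Matching on the file name rather than the substring avoids false alarms on files such as environment.ts.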
§4 — Per-Surface Verification Gates
A rotation is NOT complete until each surface that holds the key passes its smoke test.
| Surface | Smoke Test | Pass Criteria |
|---|---|---|
| Vercel: churchwiseai-web | curl https://churchwiseai.com/api/health | HTTP 200, "status":"ok" |
| Vercel: pewsearch | curl https://pewsearch.com/api/health | HTTP 200 |
| Vercel: sermon-illustrations | curl https://illustratetheword.com/api/health | HTTP 200 |
| LiveKit Cloud voice agent | Dial Grace Community demo line; confirm church identity in greeting | AI says church name (not "ChurchWiseAI" fallback); voice_call_logs row written within 60s |
| GitHub Actions | Trigger test.yml on churchwiseai-web | CI green with no 401/403 in key-dependent steps |
| Local .env files | Start local dev server + smoke-call affected endpoint | No 401 or missing-env errors in console |
You can also run node scripts/verify-key-rotation.mjs (see C:/dev/churchwiseai-web/scripts/verify-key-rotation.mjs) for an automated pass/fail across all surfaces.
§5 — Old Key Revocation Policy
Do NOT revoke the old key until ALL surfaces have verified green with the new key.
Rationale: if any surface fails verification after you have already revoked the old key, you cannot roll back. You are now dependent on emergency vendor support to restore access. This is a multiple-hour to multiple-day recovery depending on vendor SLA.
Recommended revocation timeline:
- All surfaces verify green with new key → wait 24 hours
- No unexpected errors in monitoring (voice_call_logs, Vercel logs, Supabase logs) → revoke old key
- Document revocation date in key-surface-manifest.yaml → update last_rotated and revoked_at fields
Exception: If a key is confirmed leaked and actively being exploited, revoke immediately. Accept the rollback risk — the threat is worse than the recovery cost.
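The revocation policy reduces to a short decision procedure, sketched below. The function name, argument order, and output strings are hypothetical; the 24-hour wait, the all-green requirement, the clean-monitoring requirement, and the active-exploitation exception come from this section.

```bash
revocation_decision() {
  # $1 = epoch seconds when the last surface verified green
  # $2 = current epoch seconds, e.g. "$(date +%s)"
  # $3 = "yes" if every surface verified green with the new key
  # $4 = "yes" if monitoring shows no unexpected errors
  # $5 = "yes" if the key is confirmed leaked and actively exploited
  if [ "$5" = "yes" ]; then echo "revoke-now"; return 0; fi
  if [ "$3" != "yes" ]; then echo "hold: surfaces not all green"; return 1; fi
  # 86400 seconds = the 24-hour wait after the last green verification
  if [ $(( $2 - $1 )) -lt 86400 ]; then echo "hold: 24h wait not elapsed"; return 1; fi
  if [ "$4" != "yes" ]; then echo "hold: unexpected errors in monitoring"; return 1; fi
  echo "revoke-ok"
}
```

The exploitation check comes first on purpose: it is the only path that skips every other condition.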
§6 — Failure Recovery
If a surface fails verification after updating to the new key:
- Stop. Do not proceed to the next surface.
- Revert that surface to the old key (still valid per §5 — you have not revoked it yet):
# Example: revert Vercel production
printf "<OLD_VALUE>" | vercel env add <KEY_NAME> production
vercel --prod
- Document the failure state in a note to the founder:
  - Which surface failed
  - What error was observed
  - Which surfaces have the new key vs. old key (list both)
  - Whether any customer-facing impact occurred during the window
- Alert the founder before proceeding. Do not retry without understanding the failure.
- Investigate root cause before re-attempting:
  - Did the new key have the wrong format? (Check format_prefix in manifest)
  - Was the new key applied to the wrong variable name?
  - Did the vendor issue the key on the wrong project/account?
- Once root cause is understood and fixed, restart the rotation from Step 1.
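A fixed template keeps the failure note complete under pressure. The sketch below emits the four required items in a stable order; failure_note and its argument order are hypothetical, and the field labels simply mirror the list in this section.

```bash
failure_note() {
  # $1 = failed surface, $2 = observed error,
  # $3 = surfaces already on the NEW key, $4 = surfaces still on the OLD key,
  # $5 = customer-facing impact during the window
  cat <<EOF
Key rotation failure
Failed surface: $1
Observed error: $2
Surfaces on new key: $3
Surfaces on old key: $4
Customer-facing impact: $5
EOF
}
```

Usage: `failure_note "LiveKit Cloud" "401 from Supabase" "Vercel: churchwiseai-web" "LiveKit Cloud, GitHub Actions" "none observed"`. Never put key values in any of the fields.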
§7 — Post-Rotation Audit
Within 24 hours of rotation completion:
- Run the full verification script one more time:
cd "C:/dev/churchwiseai-web"
node scripts/verify-key-rotation.mjs 2>&1 | tee "C:/dev/knowledge/audits/key-rotation-$(date +%Y-%m-%d).md"
- Update key-surface-manifest.yaml: set last_rotated and (if applicable) next_rotation_due for every surface that was updated.
- Check Supabase logs for any residual 401 errors that might indicate a surface was missed:
-- Via Supabase MCP
SELECT created_at, level, message FROM logs WHERE level = 'error' AND created_at > now() - interval '24 hours' ORDER BY created_at DESC LIMIT 20;
- Store the audit report at C:/dev/knowledge/audits/key-rotation-YYYY-MM-DD.md.
§8 — Key-Specific Notes
SUPABASE_SERVICE_ROLE_KEY
- Format: Post-2026-05-13 Supabase new format begins with sb_secret_. Legacy format was a JWT beginning with eyJ.
- Issue location: Supabase dashboard → Project Settings → API → "Service Role Key" (new key) or "Reveal legacy key" section.
- Revoking legacy key: Supabase dashboard → Project Settings → API → "Revoke legacy key" button. Do NOT click until all surfaces verify green.
- ALL three Vercel projects use this key. All three must be updated in one rotation window.
- LiveKit voice agent uses this key for every DB read/write during calls. Missing this surface = every call fails silently (agent uses cached/fallback data, call log never written).
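The two prefixes make it possible to tell which generation of key a surface holds without printing the value. A minimal sketch follows; classify_supabase_key is a hypothetical helper name, and the prefixes are the ones given in the Format note above.

```bash
classify_supabase_key() {
  # $1 = key value. Prints its generation only, never the value itself.
  case "$1" in
    sb_secret_*) echo "new" ;;
    eyJ*)        echo "legacy" ;;
    *)           echo "unknown"; return 1 ;;
  esac
}
```

Usage: with the key loaded into the shell environment, `classify_supabase_key "$SUPABASE_SERVICE_ROLE_KEY"` prints new once the rotation has landed. A result of legacy after rotation means that surface was missed.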
ANTHROPIC_API_KEY
- Issue location: console.anthropic.com → API Keys → Create new key.
- Affects: churchwiseai-web Vercel (chatbot stream, outreach provisioning), sermon-illustrations Vercel (SermonWise AI), LiveKit voice agent (Coordinator + Care Agent LLM calls).
OPENAI_API_KEY
- Issue location: platform.openai.com → API Keys.
- Affects: All three Vercel projects (unified chatbot, product support), LiveKit voice agent.
STRIPE_SECRET_KEY (live mode)
- Issue location: dashboard.stripe.com → Developers → API Keys → "Reveal live mode secret key".
- Caution: Live key is in knowledge/.env as STRIPE_LIVE_SECRET_KEY. This key charges real money. Confirm founder approval before rotating.
- Also rotate: STRIPE_WEBHOOK_SECRET if compromised (separate from STRIPE_SECRET_KEY).
DEEPGRAM_API_KEY
- Issue location: console.deepgram.com → Settings → API Keys.
- Affects: LiveKit voice agent (STT). Vercel chatbot does not use Deepgram.
CARTESIA_API_KEY
- Issue location: play.cartesia.ai → Settings → API Keys.
- Affects: LiveKit voice agent (TTS) + churchwiseai-web Vercel (Cartesia MCP calls).
TELNYX_API_KEY
- Issue location: portal.telnyx.com → Account → Keys & Credentials.
- Affects: churchwiseai-web Vercel (provisioning API routes), voice agent (Telnyx SIP auth is separate; see Telnyx FQDN connection credentials in knowledge/runbooks/voice-provisioning.md).
RESEND_API_KEY
- Issue location: resend.com → API Keys.
- Two separate keys: RESEND_API_KEY (sending scope) and RESEND_MANAGEMENT_API_KEY (full access for domain management). Rotate each separately.
- Affects: All three Vercel projects + LiveKit voice agent (notification emails).
Linked References
- Surface manifest: knowledge/runbooks/key-surface-manifest.yaml
- Verification script: churchwiseai-web/scripts/verify-key-rotation.mjs
- Audit reports directory: knowledge/audits/
- Voice deploy procedure: knowledge/runbooks/deployment/deploy-voice-agent.md
- Voice provisioning: knowledge/runbooks/voice-provisioning.md
- Memory rule: ~/.claude/projects/C--dev/memory/feedback_key_rotation_mandatory_procedure.md