
Key Rotation Procedure

Status: CANONICAL, P0 MANDATORY
Owner: Founder (John Moelker)
Last updated: 2026-05-14
Triggered by: 16-hour voice agent outage on 2026-05-13 caused by a missed LiveKit Cloud secret rotation. Vercel projects received the new SUPABASE_SERVICE_ROLE_KEY; LiveKit Cloud secrets were not updated. All paying-customer call logs were lost for the window. Founder verbatim: "We can't have this amateurish key rotation that breaks our products."

BEFORE ANY ROTATION BEGINS: Read this document in full. Then open knowledge/runbooks/key-surface-manifest.yaml and find every surface for the key being rotated. A rotation touches multiple systems — missing one is how the 2026-05-13 outage happened.


§1 — Trigger Conditions

Rotate a key immediately when any of the following occur:

Trigger | Example | Urgency
Confirmed leak | Key appears in a git commit, log file, CI output, error message, or screen share | Rotate within 1 hour
Suspected compromise | Unusual API usage, unexplained charges, unauthorized DB queries | Rotate within 4 hours
Vendor-forced rotation | Supabase legacy key deprecation (2026-05-13 incident), provider migration | Rotate by vendor deadline
Team-member departure | Any person with access to any secret leaves | Rotate within 24 hours
Planned annual rotation | Calendar reminder on each key's next_rotation_due | Rotate in a scheduled maintenance window
Security audit finding | Pen-test or audit flags a key as over-privileged or stale | Rotate within 72 hours
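The hour budgets in the table can be turned into a concrete deadline at detection time. This is a minimal bash sketch that assumes GNU date (Linux or Git Bash). The trigger labels (confirmed_leak, suspected_compromise, departure, audit_finding) and the rotation_deadline helper are both invented here for illustration. Vendor-forced and planned rotations have no fixed hour budget, so the helper refuses them.

```bash
#!/usr/bin/env bash
# Sketch: latest acceptable rotation time for a trigger, in UTC.
# Trigger labels are hypothetical; hour budgets come from the §1 table.
# Requires GNU date (-d).
rotation_deadline() {
  local trigger="$1" detected_at="$2" hours start_epoch
  case "$trigger" in
    confirmed_leak)       hours=1 ;;
    suspected_compromise) hours=4 ;;
    departure)            hours=24 ;;
    audit_finding)        hours=72 ;;
    *)
      echo "no fixed hour budget for '$trigger'; use the vendor deadline or maintenance window" >&2
      return 1
      ;;
  esac
  start_epoch="$(date -u -d "$detected_at" +%s)" || return 1
  # Add the budget in seconds, then format the result back as UTC.
  date -u -d "@$((start_epoch + hours * 3600))" '+%Y-%m-%d %H:%M UTC'
}
```

For example, rotation_deadline confirmed_leak "2026-05-13 10:00 UTC" prints 2026-05-13 11:00 UTC.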

§2 — Pre-Rotation Checklist

Do NOT generate a new key yet. Complete this checklist first.

2a. Identify the key

  • Name the exact env var (e.g., SUPABASE_SERVICE_ROLE_KEY)
  • Note its format prefix so you can verify the new key has the correct format (see the format_prefix field in key-surface-manifest.yaml)
  • Note when it was last rotated (see the last_rotated field in key-surface-manifest.yaml)
  • Confirm which vendor dashboard issues the new key (Supabase → Project Settings → API; Anthropic → console.anthropic.com → API Keys; etc.)
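You can check the format prefix without ever printing the secret. The following is a small bash sketch. has_prefix and describe_key are hypothetical helper names, and the only output is the expected prefix and the value's length.

```bash
#!/usr/bin/env bash
# Sketch: confirm a key's format prefix without printing the key.

# Succeeds if $1 (the key value) starts with $2 (the expected prefix).
has_prefix() {
  local value="$1" prefix="$2"
  # Quoting "$prefix" stops it being treated as a glob pattern.
  [ -n "$prefix" ] && [ "${value#"$prefix"}" != "$value" ]
}

# Prints only the matched prefix (or a mismatch notice) and the length.
describe_key() {
  local value="$1" prefix="$2"
  if has_prefix "$value" "$prefix"; then
    printf '%s... (%d chars)\n' "$prefix" "${#value}"
  else
    printf 'PREFIX MISMATCH (%d chars)\n' "${#value}"
  fi
}
```

Read the new key with read -rs NEW_KEY so it is neither echoed nor saved in history. Then run describe_key "$NEW_KEY" sb_secret_.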

2b. Identify all surfaces

  • Open knowledge/runbooks/key-surface-manifest.yaml
  • Find the key's entry and list every surface under its surfaces: block
  • For each surface, confirm you have access credentials:
    • Vercel: vercel env ls, run inside each linked project directory (already authenticated)
    • LiveKit Cloud: C:\dev\lk.exe agent secrets --project cwa-voice --id CA_pX3Me4NK6qK8
    • GitHub Actions: gh secret list --repo <owner>/<repo> (already authenticated)
    • Local file: direct filesystem access
  • Write the surface list down before proceeding — do not rely on memory
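If the manifest keeps a simple YAML list per key, you can write the surface list down mechanically rather than from memory. This bash/awk sketch assumes the layout shown in its comment, which may not match the real key-surface-manifest.yaml. list_surfaces is a hypothetical helper. A real YAML parser is safer if the file uses anything beyond plain string list items.

```bash
#!/usr/bin/env bash
# Sketch: print the surfaces listed for one key in a surface manifest.
# ASSUMED layout (hypothetical; adjust to the real file):
#
#   SUPABASE_SERVICE_ROLE_KEY:
#     format_prefix: sb_secret_
#     surfaces:
#       - vercel:churchwiseai-web
#       - livekit:cwa-voice
list_surfaces() {
  local manifest="$1" key="$2"
  awk -v key="$key" '
    # An unindented "NAME:" line starts a new key entry.
    /^[^ \t#][^:]*:/ { in_key = ($0 == (key ":")); in_surfaces = 0; next }
    in_key && /^[ \t]+surfaces:[ \t]*$/ { in_surfaces = 1; next }
    # Any other indented "field:" line ends the surfaces list.
    in_key && in_surfaces && /^[ \t]+[A-Za-z_]+:/ { in_surfaces = 0 }
    # Print list items with the leading "- " stripped.
    in_key && in_surfaces && /^[ \t]+-[ \t]/ { sub(/^[ \t]+-[ \t]+/, ""); print }
  ' "$manifest"
}
```

Running list_surfaces knowledge/runbooks/key-surface-manifest.yaml SUPABASE_SERVICE_ROLE_KEY | tee surfaces-todo.txt produces a checklist file to tick off. The file name is only an example.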

2c. Confirm rollback path

  • Identify where the OLD key is currently stored (vendor dashboard, knowledge/.env, Vercel)
  • Confirm whether the old key can be re-activated if the new key fails
  • For vendor keys (Anthropic, OpenAI, etc.): do NOT revoke the old key until all surfaces verify green with the new key. Keep the old key valid for at least 24 hours post-rotation. See §5.
  • For Supabase service role keys: the legacy key remains valid until explicitly revoked in Supabase dashboard → Project Settings → API → "Revoke legacy key"

2d. Schedule and announce

  • If the rotation affects live customer surfaces (voice agent, chatbot), schedule it during low-traffic hours (weekday 10 PM–6 AM ET)
  • For voice agent rotations: the agent restarts automatically on lk agent update-secrets. Budget 3–5 minutes of downtime. No advance customer notice is needed for a planned rotation under 5 minutes.
  • For Vercel deployments: a re-deploy is required to pick up new env vars. Budget 2–3 minutes per project.
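You can check whether "now" falls in the low-traffic window before starting. This bash sketch assumes the tz database is available (for TZ=America/New_York). It reads "weekday" as the Eastern calendar day being Monday through Friday, so Friday 11 PM counts and Saturday 2 AM does not. Adjust if the intended rule differs. in_maintenance_window is a hypothetical name.

```bash
#!/usr/bin/env bash
# Sketch: succeed if a moment is inside the low-traffic window
# (weekday, 10 PM to 6 AM Eastern).
# ASSUMPTION: "weekday" = the Eastern calendar day is Mon-Fri.
# Args (optional, for testing): day of week 1=Mon..7=Sun, hour 0-23.
in_maintenance_window() {
  local dow="${1:-$(TZ=America/New_York date +%u)}"
  local hour="${2:-$(TZ=America/New_York date +%H)}"
  hour=$((10#$hour))   # "08" would otherwise be read as invalid octal
  if [ "$dow" -lt 1 ] || [ "$dow" -gt 5 ]; then
    return 1
  fi
  [ "$hour" -ge 22 ] || [ "$hour" -lt 6 ]
}
```

With no arguments it uses the current Eastern time, so in_maintenance_window || echo "outside window" works as a quick pre-flight check.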

§3 — Rotation Steps in Exact Order

Work through the surfaces in the order listed below for every rotation. This order minimizes downtime: stateless Vercel functions first, then the voice agent (which requires a restart), then GitHub Actions secrets, and local files last.

Step 1 — Generate the new key (vendor dashboard)

Generate the replacement key at the vendor's dashboard. Write it to knowledge/.env immediately, before updating any surface. This is your rollback copy if something goes wrong mid-rotation.

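# Example: confirm knowledge/.env has the new value without echoing it
# (-o prints only the matched text: the key name plus the expected format prefix)
grep -o "^SUPABASE_SERVICE_ROLE_KEY=sb_secret_" "C:/dev/knowledge/.env"

Expected: SUPABASE_SERVICE_ROLE_KEY=sb_secret_ is printed. No output means the line is missing or still holds a value in the old (eyJ...) format.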
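When you write the new value into knowledge/.env (or any dotenv file), avoid pasting it on a command line. This bash sketch defines a hypothetical set_env_var helper. It reads the value from stdin, then replaces the existing KEY=VALUE line or appends one if none exists.

```bash
#!/usr/bin/env bash
# Sketch: set KEY=VALUE in a dotenv file, reading VALUE from stdin
# so it never appears on a command line.
set_env_var() {
  local file="$1" key="$2" value tmp
  # Accept input with or without a trailing newline.
  IFS= read -r value || [ -n "$value" ] || return 1
  [ -f "$file" ] || : > "$file"
  tmp="$(mktemp)" || return 1
  # ENVIRON (not awk -v) keeps backslashes in the value intact.
  KEY="$key" VALUE="$value" awk '
    index($0, ENVIRON["KEY"] "=") == 1 { print ENVIRON["KEY"] "=" ENVIRON["VALUE"]; replaced = 1; next }
    { print }
    END { if (!replaced) print ENVIRON["KEY"] "=" ENVIRON["VALUE"] }
  ' "$file" > "$tmp" && mv "$tmp" "$file"
}
```

Usage: read -rs NEW_KEY && printf '%s' "$NEW_KEY" | set_env_var "C:/dev/knowledge/.env" SUPABASE_SERVICE_ROLE_KEY. printf is a shell builtin, so the value does not show up in the process list. Because the file is replaced by the mktemp copy, it ends up with owner-only permissions.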

Step 2 — Vercel: churchwiseai-web

cd "C:/dev/churchwiseai-web"

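# Update each environment that holds this key.
# printf '%s' passes the value literally (a bare printf would interpret any % in the key).
# If the CLI reports that the variable already exists, remove it first: vercel env rm <KEY_NAME> <environment>
printf '%s' "<NEW_VALUE>" | vercel env add <KEY_NAME> production
printf '%s' "<NEW_VALUE>" | vercel env add <KEY_NAME> preview
printf '%s' "<NEW_VALUE>" | vercel env add <KEY_NAME> development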

# Trigger a re-deploy to pick up the new value
vercel --prod

Verification:

# Wait for deploy to complete (~2-3 min), then hit the health endpoint
curl -s https://churchwiseai.com/api/health | grep '"status":"ok"'

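# For SUPABASE_SERVICE_ROLE_KEY specifically: hit an admin endpoint that reads the DB.
# -o /dev/null discards the body; -w prints only the HTTP status code.
curl -s -o /dev/null -w "%{http_code}\n" "https://churchwiseai.com/api/admin/kb-proxy?token=DEMO_TOKEN"

Expected output: the health check prints its "status":"ok" line and the admin endpoint prints 200. A 401 or 503 from either is a failure.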
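The health checks in Steps 2 through 4 can share one helper. It fails on any non-200 status and, optionally, when an expected marker is missing from the response body. This is a bash sketch and check_endpoint is a hypothetical name. curl's -w '%{http_code}' prints 000 when the connection itself fails, so that case also counts as a failure.

```bash
#!/usr/bin/env bash
# Sketch: pass/fail on HTTP status and an optional body pattern.
check_endpoint() {
  local url="$1" pattern="${2:-}" body code
  body="$(mktemp)" || return 1
  # -o saves the body to a file; -w prints only the status code.
  code="$(curl -s -o "$body" -w '%{http_code}' --max-time 20 "$url")" || true
  if [ "$code" != "200" ]; then
    echo "FAIL $url (HTTP $code)"; rm -f "$body"; return 1
  fi
  if [ -n "$pattern" ] && ! grep -q -- "$pattern" "$body"; then
    echo "FAIL $url (HTTP 200, body missing $pattern)"; rm -f "$body"; return 1
  fi
  echo "PASS $url"; rm -f "$body"
}
```

For example, check_endpoint https://churchwiseai.com/api/health '"status":"ok"' prints PASS or FAIL with the reason. Its exit status can gate the next step.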

Step 3 — Vercel: pewsearch

cd "C:/dev/pewsearch/web"

printf '%s' "<NEW_VALUE>" | vercel env add <KEY_NAME> production
printf '%s' "<NEW_VALUE>" | vercel env add <KEY_NAME> preview
printf '%s' "<NEW_VALUE>" | vercel env add <KEY_NAME> development

vercel --prod

Verification:

curl -s https://pewsearch.com/api/health | grep '"status":"ok"'

Expected output: HTTP 200.

Step 4 — Vercel: sermon-illustrations (illustratetheword.com)

cd "C:/dev/sermon-illustrations"

printf '%s' "<NEW_VALUE>" | vercel env add <KEY_NAME> production
printf '%s' "<NEW_VALUE>" | vercel env add <KEY_NAME> preview
printf '%s' "<NEW_VALUE>" | vercel env add <KEY_NAME> development

vercel --prod

Verification:

curl -s https://illustratetheword.com/api/health | grep '"status":"ok"'

Expected output: HTTP 200.
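Steps 2 through 4 repeat the same commands for each project. This bash sketch defines a hypothetical rotate_vercel_env helper. It reads the value once from stdin, updates all three environments in each linked project directory, and redeploys production. It assumes each directory is already linked (vercel link) and that every environment holds the key. vercel env add may refuse a name that already exists, so the sketch removes it first with vercel env rm --yes. How the CLI prompts for preview branches can vary by version.

```bash
#!/usr/bin/env bash
# Sketch: push one value to production, preview and development in each
# linked Vercel project directory, then redeploy production.
# The value is read once from stdin and never passed as an argument.
rotate_vercel_env() {
  local key="$1"; shift
  local value dir target
  IFS= read -r value || [ -n "$value" ] || return 1
  for dir in "$@"; do
    for target in production preview development; do
      # Remove any existing value first; ignore "not found" errors.
      (cd "$dir" && vercel env rm "$key" "$target" --yes >/dev/null 2>&1) || true
      (cd "$dir" && printf '%s' "$value" | vercel env add "$key" "$target") ||
        { echo "FAILED: $dir ($target)" >&2; return 1; }
    done
    (cd "$dir" && vercel --prod) || { echo "FAILED deploy: $dir" >&2; return 1; }
  done
}
```

Usage: read -rs NEW_KEY && printf '%s' "$NEW_KEY" | rotate_vercel_env <KEY_NAME> C:/dev/churchwiseai-web C:/dev/pewsearch/web C:/dev/sermon-illustrations. Run the per-project health checks after it finishes, as in the steps above.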

Step 5 — LiveKit Cloud (voice agent secrets)

CRITICAL — THIS IS THE SURFACE MISSED IN THE 2026-05-13 OUTAGE. Normal lk agent deploy does NOT re-upload secrets. You MUST use update-secrets explicitly.

# Option A: Update individual secret (preferred for single-key rotation)
"C:/dev/lk.exe" agent update-secrets --project cwa-voice --id CA_pX3Me4NK6qK8 <KEY_NAME>=<NEW_VALUE>

# Option B: Re-upload the full .env secrets file (use after bulk rotation)
# Ensure voice-agent-livekit/.env has ALL secrets updated first, then:
"C:/dev/lk.exe" agent update-secrets --project cwa-voice --id CA_pX3Me4NK6qK8 --secrets-file "C:/dev/churchwiseai-web/voice-agent-livekit/.env"

The agent will restart automatically when secrets are updated. This takes 90–120 seconds.

Verification (MANDATORY — this is the test that would have caught the 2026-05-13 outage):

# Check 1: Wait for the container restart (90 seconds minimum; do not skip this wait)
sleep 90

# Check 2: Check deploy logs for "registered worker"
"C:/dev/lk.exe" agent logs --project cwa-voice --id CA_pX3Me4NK6qK8 --log-type deploy

# Check 3: Verify the agent is writing to the DB (proves the Supabase key is valid)
# Look for a voice_call_logs row with created_at in the last 5 minutes.
# Rows are written by calls, so run this again after the test call in Check 4.
# Use Supabase MCP: SELECT id, church_id, created_at FROM voice_call_logs ORDER BY created_at DESC LIMIT 3;

# Check 4: Dial the demo line and confirm the AI identifies correctly (not the "ChurchWiseAI" fallback)
# Demo line: check PHONE_REGISTRY in voice-agent-livekit/session.py for the Grace Community demo number
# Expected: "Thank you for calling Grace Community Church..."
# If you hear the "ChurchWiseAI" fallback: the Supabase read is failing (401, key still wrong)

IMPORTANT: lk agent list status showing "Available" is MISLEADING — it means the container exists, not that the agent is healthy. Only logs + an actual call tell the truth.
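You can wait for "registered worker" explicitly instead of eyeballing the logs. This bash sketch defines a hypothetical wait_for_log helper that re-runs a log command until its output matches or a timeout expires. It assumes the log command prints and exits rather than streaming. A match could also come from an older container's log lines, so this supplements the DB check and the test call rather than replacing them.

```bash
#!/usr/bin/env bash
# Sketch: re-run a command until its output contains a pattern.
# Returns 0 when found, 1 after timeout_s seconds without a match.
# ASSUMPTION: the command prints its output and exits.
wait_for_log() {
  local pattern="$1" timeout_s="$2" interval_s="$3"; shift 3
  local waited=0
  while :; do
    if "$@" 2>&1 | grep -q -- "$pattern"; then
      return 0
    fi
    [ "$waited" -ge "$timeout_s" ] && return 1
    sleep "$interval_s"
    waited=$((waited + interval_s))
  done
}
```

Usage: sleep 90; wait_for_log "registered worker" 120 15 "C:/dev/lk.exe" agent logs --project cwa-voice --id CA_pX3Me4NK6qK8 --log-type deploy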

Step 6 — GitHub Actions secrets

Check knowledge/runbooks/key-surface-manifest.yaml for which repos have this key as a GitHub secret.

# churchwiseai-web
gh secret set <KEY_NAME> --repo churchwiseai-web --body "<NEW_VALUE>"
gh secret list --repo churchwiseai-web | grep <KEY_NAME>

# pewsearch (if applicable)
gh secret set <KEY_NAME> --repo pewsearch --body "<NEW_VALUE>"

# sermon-illustrations (if applicable)
gh secret set <KEY_NAME> --repo sermon-illustrations --body "<NEW_VALUE>"

Verification:

# Trigger a CI run that exercises the key
gh workflow run test.yml --repo churchwiseai-web
# Watch for green: gh run watch --repo churchwiseai-web

Expected output: workflow passes without 401/403 errors in job logs.
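When the same secret goes to several repos, you can read it once and pipe it to each gh secret set call. This avoids retyping the value and keeps it out of --body. This is a bash sketch and set_gh_secret_all is a hypothetical name. Pass repos as OWNER/REPO; the owner is not recorded in this runbook.

```bash
#!/usr/bin/env bash
# Sketch: set one GitHub Actions secret in several repos.
# gh secret set reads the value from stdin when --body is omitted.
set_gh_secret_all() {
  local key="$1"; shift
  local value repo
  IFS= read -r value || [ -n "$value" ] || return 1
  for repo in "$@"; do
    printf '%s' "$value" | gh secret set "$key" --repo "$repo" ||
      { echo "FAILED: $repo" >&2; return 1; }
  done
}
```

Usage: read -rs NEW_KEY && printf '%s' "$NEW_KEY" | set_gh_secret_all <KEY_NAME> <OWNER>/churchwiseai-web <OWNER>/pewsearch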

Step 7 — Local .env files

Update each local .env* file that holds this key. Check the manifest for which files are relevant. Common locations:

  • C:/dev/knowledge/.env — already updated in Step 1 (the rollback copy)
  • C:/dev/churchwiseai-web/voice-agent-livekit/.env — used by --secrets-file on agent create/update-secrets
  • C:/dev/churchwiseai-web/.env.local (if it exists — may not be present in dev)
  • C:/dev/pewsearch/web/.env.local (if it exists)
  • C:/dev/sermon-illustrations/.env.local (if it exists)

# Verify the key line now carries the new format prefix (grep -o prints only the name and prefix, never the full value)
grep -o "^<KEY_NAME>=<FORMAT_PREFIX>" "C:/dev/churchwiseai-web/voice-agent-livekit/.env"
# Expected: <KEY_NAME>=<FORMAT_PREFIX>, where <FORMAT_PREFIX> is format_prefix from the manifest
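A prefix check cannot tell two keys of the same format apart. Comparing SHA-256 fingerprints of the stored values can confirm that every local .env file holds the same key without printing it. This bash sketch uses hypothetical helper names and assumes unquoted KEY=VALUE lines and sha256sum (GNU coreutils, included with Git Bash).

```bash
#!/usr/bin/env bash
# Sketch: compare one key across .env files by SHA-256 fingerprint,
# so no secret value is printed. Assumes unquoted KEY=VALUE lines.
env_fingerprint() {
  local file="$1" key="$2" line
  line="$(grep -m1 "^${key}=" "$file")" || { echo "MISSING"; return 1; }
  # Hash only the value (text after the first "="); keep 12 hex chars.
  printf '%s' "${line#*=}" | sha256sum | cut -c1-12
}

# Prints "fingerprint  file" per file; returns 1 if any differ or are missing.
same_value_everywhere() {
  local key="$1"; shift
  local file fp first="" status=0
  for file in "$@"; do
    fp="$(env_fingerprint "$file" "$key")" || status=1
    printf '%s  %s\n' "$fp" "$file"
    [ -n "$first" ] || first="$fp"
    [ "$fp" = "$first" ] || status=1
  done
  return "$status"
}
```

For example, same_value_everywhere <KEY_NAME> C:/dev/knowledge/.env C:/dev/churchwiseai-web/voice-agent-livekit/.env prints one 12-character fingerprint per file. It exits non-zero if any fingerprint differs or the key is missing.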

Verification:

# For voice agent .env: do a dry-run deploy check
cd "C:/dev/churchwiseai-web"
"C:/dev/lk.exe" agent deploy --project cwa-voice --silent --dry-run 2>&1 | tail -5

NEVER COMMIT .env FILES. Every .env* file is in .gitignore. Verify before any commit:

git -C "C:/dev/churchwiseai-web" diff --cached --name-only | grep "\.env"
# Expected: no output, meaning no .env* file is staged for the next commit
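You can also script the staged-files check so that it fails loudly. This is a bash sketch and no_staged_env_files is a hypothetical name. Its pattern also flags names such as .envrc, which is usually acceptable for a guard.

```bash
#!/usr/bin/env bash
# Sketch: fail if any .env* file is staged for the next commit in a repo.
no_staged_env_files() {
  local repo="$1" staged
  # Staged paths only; "|| true" because grep exits 1 when nothing matches.
  staged="$(git -C "$repo" diff --cached --name-only | grep -E '(^|/)\.env' || true)"
  if [ -n "$staged" ]; then
    printf 'STAGED .env FILES in %s:\n%s\n' "$repo" "$staged" >&2
    return 1
  fi
}
```

Running no_staged_env_files "C:/dev/churchwiseai-web" && git -C "C:/dev/churchwiseai-web" commit only commits when the guard passes.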

§4 — Per-Surface Verification Gates

A rotation is NOT complete until each surface that holds the key passes its smoke test.

Surface | Smoke Test | Pass Criteria
Vercel: churchwiseai-web | curl https://churchwiseai.com/api/health | HTTP 200, "status":"ok"
Vercel: pewsearch | curl https://pewsearch.com/api/health | HTTP 200
Vercel: sermon-illustrations | curl https://illustratetheword.com/api/health | HTTP 200
LiveKit Cloud voice agent | Dial Grace Community demo line; confirm church identity in greeting | AI says church name (not "ChurchWiseAI" fallback); voice_call_logs row written within 60s
GitHub Actions | Trigger test.yml on churchwiseai-web | CI green with no 401/403 in key-dependent steps
Local .env files | Start local dev server + smoke-call affected endpoint | No 401 or missing-env errors in console

You can also run node scripts/verify-key-rotation.mjs (see C:/dev/churchwiseai-web/scripts/verify-key-rotation.mjs) for an automated pass/fail across all surfaces.


§5 — Old Key Revocation Policy

Do NOT revoke the old key until ALL surfaces have verified green with the new key.

Rationale: if any surface fails verification after you have already revoked the old key, you cannot roll back. You are then dependent on emergency vendor support to restore access, and recovery can take hours to days depending on the vendor's SLA.

Recommended revocation timeline:

  1. All surfaces verify green with new key → wait 24 hours
  2. No unexpected errors in monitoring (voice_call_logs, Vercel logs, Supabase logs) → revoke old key
  3. Document revocation date in key-surface-manifest.yaml → update last_rotated and revoked_at fields

Exception: If a key is confirmed leaked and actively being exploited, revoke immediately. Accept the rollback risk — the threat is worse than the recovery cost.
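The timing half of this policy can be expressed as a small gate. This is a bash sketch in which can_revoke_old_key is a hypothetical helper taking Unix epoch seconds. It does not check monitoring; someone still has to review that.

```bash
#!/usr/bin/env bash
# Sketch of the §5 timing gate: revoke only after 24 hours of all-green,
# unless a confirmed leak is being actively exploited.
# Args: all_green_since (epoch s), active_exploit (yes/no), now (epoch s, optional).
REVOKE_WAIT_SECONDS=$((24 * 3600))

can_revoke_old_key() {
  local all_green_since="$1" active_exploit="${2:-no}" now="${3:-$(date +%s)}"
  if [ "$active_exploit" = "yes" ]; then
    return 0   # exception: revoke immediately and accept the rollback risk
  fi
  [ $((now - all_green_since)) -ge "$REVOKE_WAIT_SECONDS" ]
}
```

Record the epoch time when the last surface turned green (date +%s), then call can_revoke_old_key "$GREEN_AT" no before touching the revoke button.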


§6 — Failure Recovery

If a surface fails verification after updating to the new key:

  1. Stop. Do not proceed to the next surface.
  2. Revert that surface to the old key (still valid per §5 — you have not revoked it yet):
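    # Example: revert Vercel production (remove the new value first; env add may refuse an existing name)
    vercel env rm <KEY_NAME> production
    printf '%s' "<OLD_VALUE>" | vercel env add <KEY_NAME> production
    vercel --prod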
  3. Document the failure state in a note to the founder:
    • Which surface failed
    • What error was observed
    • Which surfaces have the new key vs. old key (list both)
    • Whether any customer-facing impact occurred during the window
  4. Alert the founder before proceeding. Do not retry without understanding the failure.
  5. Investigate root cause before re-attempting:
    • Did the new key have the wrong format? (Check format_prefix in manifest)
    • Was the new key applied to the wrong variable name?
    • Did the vendor issue the key on the wrong project/account?
  6. Once root cause is understood and fixed, restart the rotation from Step 1.

§7 — Post-Rotation Audit

Within 24 hours of rotation completion:

  1. Run the full verification script one more time:

    cd "C:/dev/churchwiseai-web"
    node scripts/verify-key-rotation.mjs 2>&1 | tee "C:/dev/knowledge/audits/key-rotation-$(date +%Y-%m-%d).md"
  2. Update key-surface-manifest.yaml — set last_rotated and (if applicable) next_rotation_due for every surface that was updated.

  3. Check Supabase logs for any residual 401 errors that might indicate a surface was missed:

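    -- Via Supabase MCP. Table and column names depend on the log source your tooling exposes;
    -- in the Supabase Logs Explorer, API request logs live in the edge_logs source.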
    SELECT created_at, level, message FROM logs WHERE level = 'error' AND created_at > now() - interval '24 hours' ORDER BY created_at DESC LIMIT 20;
  4. Store the audit report at C:/dev/knowledge/audits/key-rotation-YYYY-MM-DD.md.
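If next_rotation_due follows the annual cadence from §1, it can be derived from last_rotated. This is a bash sketch with a hypothetical helper name. Mapping Feb 29 to Feb 28 is an assumption of this example, not a documented policy.

```bash
#!/usr/bin/env bash
# Sketch: next_rotation_due = last_rotated + 1 year (YYYY-MM-DD).
# ASSUMPTION: Feb 29 maps to Feb 28 of the following year.
next_rotation_due() {
  local last="$1"
  [[ "$last" =~ ^([0-9]{4})-([0-9]{2})-([0-9]{2})$ ]] || return 1
  local year=$((10#${BASH_REMATCH[1]} + 1)) month="${BASH_REMATCH[2]}" day="${BASH_REMATCH[3]}"
  [ "$month-$day" = "02-29" ] && day=28
  printf '%04d-%s-%s\n' "$year" "$month" "$day"
}
```

For example, next_rotation_due 2026-05-14 prints 2027-05-14, the value to write into the manifest alongside last_rotated.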


§8 — Key-Specific Notes

SUPABASE_SERVICE_ROLE_KEY

  • Format: since the 2026-05-13 migration, keys use Supabase's new format, which begins with sb_secret_. The legacy format was a JWT beginning with eyJ.
  • Issue location: Supabase dashboard → Project Settings → API → "Service Role Key" (new key) or "Reveal legacy key" section.
  • Revoking legacy key: Supabase dashboard → Project Settings → API → "Revoke legacy key" button. Do NOT click until all surfaces verify green.
  • ALL three Vercel projects use this key. All three must be updated in one rotation window.
  • LiveKit voice agent uses this key for every DB read/write during calls. If this surface is missed, every call fails silently (the agent uses cached/fallback data and the call log is never written).
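A legacy key and a new sb_secret_ key look nothing alike, so classifying the value before pasting it can catch mistakes such as copying the anon key. This bash sketch uses hypothetical helper names. It decodes only the payload segment of a legacy JWT to check its role claim and never prints the key. It assumes base64 supports -d (GNU coreutils / Git Bash).

```bash
#!/usr/bin/env bash
# Sketch: classify a Supabase key value without printing it.
#   sb_secret_...  -> new-format secret key
#   eyJ...         -> legacy JWT; the payload "role" claim is checked too

# Decode base64url (JWT segments use -_ and drop the = padding).
b64url_decode() {
  local s="${1//-/+}"
  s="${s//_//}"
  case $(( ${#s} % 4 )) in
    2) s="$s==" ;;
    3) s="$s=" ;;
  esac
  printf '%s' "$s" | base64 -d 2>/dev/null
}

supabase_key_kind() {
  local value="$1" payload
  case "$value" in
    sb_secret_*) echo "new-secret" ;;
    eyJ*)
      # Second dot-separated segment of a JWT is the JSON payload.
      payload="$(b64url_decode "$(printf '%s' "$value" | cut -d. -f2)")"
      if printf '%s' "$payload" | grep -q '"role" *: *"service_role"'; then
        echo "legacy-service-role"
      else
        echo "legacy-not-service-role"
      fi
      ;;
    *) echo "unknown"; return 1 ;;
  esac
}
```

Usage: read -rs NEW_KEY; supabase_key_kind "$NEW_KEY". During this rotation, anything other than new-secret means the wrong value was copied.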

ANTHROPIC_API_KEY

  • Issue location: console.anthropic.com → API Keys → Create new key.
  • Affects: churchwiseai-web Vercel (chatbot stream, outreach provisioning), sermon-illustrations Vercel (SermonWise AI), LiveKit voice agent (Coordinator + Care Agent LLM calls).

OPENAI_API_KEY

  • Issue location: platform.openai.com → API Keys.
  • Affects: All three Vercel projects (unified chatbot, product support), LiveKit voice agent.

STRIPE_SECRET_KEY (live mode)

  • Issue location: dashboard.stripe.com → Developers → API Keys → "Reveal live mode secret key".
  • Caution: Live key is in knowledge/.env as STRIPE_LIVE_SECRET_KEY. This key charges real money. Confirm founder approval before rotating.
  • Also rotate: STRIPE_WEBHOOK_SECRET if compromised (separate from STRIPE_SECRET_KEY).

DEEPGRAM_API_KEY

  • Issue location: console.deepgram.com → Settings → API Keys.
  • Affects: LiveKit voice agent (STT). Vercel chatbot does not use Deepgram.

CARTESIA_API_KEY

  • Issue location: play.cartesia.ai → Settings → API Keys.
  • Affects: LiveKit voice agent (TTS) + churchwiseai-web Vercel (Cartesia MCP calls).

TELNYX_API_KEY

  • Issue location: portal.telnyx.com → Account → Keys & Credentials.
  • Affects: churchwiseai-web Vercel (provisioning API routes), voice agent (Telnyx SIP auth is separate — see Telnyx FQDN connection credentials in knowledge/runbooks/voice-provisioning.md).

RESEND_API_KEY

  • Issue location: resend.com → API Keys.
  • Two separate keys: RESEND_API_KEY (sending scope) and RESEND_MANAGEMENT_API_KEY (full access for domain management). Rotate each separately.
  • Affects: All three Vercel projects + LiveKit voice agent (notification emails).

Linked References

  • Surface manifest: knowledge/runbooks/key-surface-manifest.yaml
  • Verification script: churchwiseai-web/scripts/verify-key-rotation.mjs
  • Audit reports directory: knowledge/audits/
  • Voice deploy procedure: knowledge/runbooks/deployment/deploy-voice-agent.md
  • Voice provisioning: knowledge/runbooks/voice-provisioning.md
  • Memory rule: ~/.claude/projects/C--dev/memory/feedback_key_rotation_mandatory_procedure.md