Skip to main content

Concurrent Claude Code Sessions — Safety Runbook

Multiple Claude Code windows routinely run against the same repos, the same Supabase instance, and — as of this incident — the same skill. This runbook documents the first known same-skill collision, explains what saved us, and lays down a small convention (an ACTIVE-SESSION.md sentinel) so the next collision is caught at entry instead of 45 minutes in.


The 2026-04-12 incident (summary)

Session A spent the afternoon in C:\dev on branch feat/no-website-outreach-spec:

  • 15:39 EDT — committed the no-website outreach spec (fdf348f)
  • 15:47 EDT — committed the 9-phase plan (9fa2ef4)
  • 17:52–17:57 EDT — wrote C:\Users\johnm\.claude\skills\church-outreach\SKILL.md and the seven command files (plan.md, research.md, draft.md, review.md, send.md, status.md, review-week.md)
  • 18:10 EDT — updated templates/email-rich-data.md

Around 18:12 EDT Session B — same user, a second Claude Code window — invoked the freshly-minted church-outreach skill and began running research + draft against the no-website-churches-2026-04 campaign. Neither session knew the other existed. They ran in parallel for roughly 45 minutes.

Evidence: outreach_contacts.updated_at shows two distinct burst signatures in the same window (22:20/22:21 UTC and 22:30/22:33 UTC, then a 14-row burst at 23:03 UTC) — inconsistent with a single serialized session. templates/email-rich-data.md also shows a second mtime from Session B's draft revisions on top of Session A's 18:10 edit.

No customer data was corrupted. Both sessions did per-contact status='researching'-then-status='researched' claims, so they naturally partitioned the target list without overlap. The damage was wasted parallel research tokens, a duplicated template edit (idempotent by luck), and the founder needing to reconcile two handoff narratives.


Why it happened (root causes)

  1. Skills have no entry-time awareness of siblings. A Claude Code session has no built-in way to ask "is another session already running this skill?" The shell, the repo, and Supabase are all shared, but the skill itself is stateless at the process boundary.
  2. A skill became hot the moment it was written. The founder opened Session B to "try out" the skill Session A had just finished authoring. This is a legitimate, expected workflow — but it collided with Session A still doing dogfood runs.
  3. The only cross-session signal was the DB. Contact-level status transitions partitioned the work, but only after both sessions had already spent research tokens on their respective slices. Neither was aware a peer existed.
  4. No convention, no norm. The founder can have 2–6 Claude Code windows open at any time (ChurchWiseAI, PewSearch, Steward, Remotion, knowledge, misc). Until now, no skill has declared "I am not safe to run in parallel with myself."

What worked (keep doing)

  • Per-row claims in the database. outreach_contacts.status acts as a lock. A row at researching cannot be picked up by another invocation. This is the safety net that prevented real damage. Every skill that touches shared state should follow the same pattern: claim a unit of work atomically before processing it.
  • Git branch isolation. Session A stayed on feat/no-website-outreach-spec. Session B did its reading on the same branch without committing. No git conflict occurred.
  • Disjoint file scopes. Most skill edits were file-level (one-writer-per-file), and template edits happened to be additive. Lucky, not principled — which is exactly what the sentinel pattern below fixes.

The prevention pattern — ACTIVE-SESSION.md sentinel

A tiny, opt-in convention for skills that do destructive work (DB writes, file writes, external API calls with side effects).

Location

{SKILL_ROOT}/ACTIVE-SESSION.md

E.g. C:\Users\johnm\.claude\skills\church-outreach\ACTIVE-SESSION.md.

Git-ignored at the skill level (echo ACTIVE-SESSION.md >> .gitignore inside the skill dir the first time this is adopted, if the skill is tracked).

Schema

---
session_id: 2026-04-12T22-11-43-a7f3
started_at: 2026-04-12T22:11:43-04:00
command: research
user_action: "batch of 20 contacts for no-website-churches-2026-04"
pid: optional — process id if easy to capture
---

# Active session

Session claimed this skill at 2026-04-12 22:11:43 -04:00.
If this file is older than 15 minutes and no activity is visible,
the prior session likely died without cleanup — safe to delete.

Lifecycle

  1. Entry check (destructive commands only):
    • Read ACTIVE-SESSION.md. Does it exist?
    • No → write one with the current session_id, started_at, command, and proceed.
    • Yes, started_at < 15 min old → refuse. Print: "Another session is active — invoked /<command> at <started_at>. Run /<skill> status to see where they are, or wait for them to finish. If you know that session is dead, delete {path}."
    • Yes, started_at ≥ 15 min old → warn, overwrite the sentinel (taking over), proceed. A >15-min sentinel means the prior session crashed or was force-closed. Print the warning so the user knows.
  2. Successful exit → delete ACTIVE-SESSION.md.
  3. Failed exit → leave it. A stale sentinel is a debugging clue, and the next invocation handles it via the age rule above.
  4. Read-only commands (e.g. status, review) → skip the check entirely. Both sessions can always inspect state.

Why 15 minutes

Long enough that a slow batch (e.g. research 20 dispatching 20 sub-agents with web fetches) doesn't trip its own expiration. Short enough that a genuinely-dead session isn't a 6-hour blocker. Tune per-skill if the commands take longer.

Implementation snippet (Bash — drop-in for a skill command)

#!/usr/bin/env bash
# Call this at the top of any destructive skill command.
# Args: $1=skill_root $2=command_name $3=user_action
set -euo pipefail
SENTINEL="$1/ACTIVE-SESSION.md"
NOW_ISO=$(date -u +"%Y-%m-%dT%H:%M:%SZ")
if [[ -f "$SENTINEL" ]]; then
STARTED=$(grep -E '^started_at:' "$SENTINEL" | sed 's/started_at: //')
AGE_MIN=$(( ( $(date -u +%s) - $(date -u -d "$STARTED" +%s) ) / 60 ))
if (( AGE_MIN < 15 )); then
echo "REFUSE: another session active since $STARTED. Run /<skill> status or delete $SENTINEL if that session is dead." >&2
exit 2
fi
echo "WARN: stale sentinel (${AGE_MIN}m old). Taking over." >&2
fi
printf -- '---\nsession_id: %s\nstarted_at: %s\ncommand: %s\nuser_action: "%s"\n---\n' \
"$(uuidgen)" "$NOW_ISO" "$2" "$3" > "$SENTINEL"
trap 'rm -f "$SENTINEL"' EXIT # clean up on successful exit only — remove trap on error paths

For TypeScript skills, the equivalent is a fs.existsSync + fs.statSync().mtime check, fs.writeFileSync the frontmatter, and process.on('exit', () => fs.unlinkSync(...)) with the same >15-min-age bypass.


How to retrofit an existing skill

  1. Decide whether the skill is destructive (writes DB, files, sends email, calls paid APIs) or informational (reads only, summarizes, answers questions). Only destructive skills need the sentinel.
  2. Add ACTIVE-SESSION.md to the skill's .gitignore if the skill is tracked in a repo.
  3. Paste the entry-check snippet above into every destructive command's procedure (or factor it into a shared references/session-lock.sh).
  4. Mark read-only commands (status, review, list-*) as exempt — explicitly state in the command file that they skip the check.
  5. Update SKILL.md with a one-line note: Concurrent-safety: ACTIVE-SESSION.md sentinel, 15-min TTL. See commands/*.md.
  6. Test: open two terminals, run the destructive command in A, then in B — B should refuse with the message above.

Where this pattern does NOT belong

  • Read-only / informational skills. executive-assistant, chief-of-staff, seo-strategist (when used for analysis), last30days, conversational-ai (when used for prompt design, not deployment). Two agents reading is fine.
  • Skills whose operations are already atomic at a lower layer. If every write is a single DB UPDATE guarded by an optimistic-lock column, a sentinel is redundant. (The outreach skill's per-contact status field almost qualifies — but the wasted research tokens are the reason we still want it.)
  • Skills that dispatch to a single shared queue. If schedule or loop submit to a server-side cron, the server arbitrates. No local sentinel needed.
  • Per-invocation ephemeral skills that don't persist state (simplify, design critiques on pasted code, etc.).

How to detect concurrent-session activity (if you suspect it)

If a run looks weird — faster than expected, duplicate drafts, mystery status changes — check in this order:

  1. The sentinelcat {SKILL_ROOT}/ACTIVE-SESSION.md. If it exists with a recent started_at you don't recognize, another window is running.
  2. DB update pattern — for the outreach example:
    SELECT DATE_TRUNC('minute', updated_at) AS minute, COUNT(*), string_agg(DISTINCT status, ',')
    FROM outreach_contacts
    WHERE updated_at >= now() - interval '2 hours'
    GROUP BY 1 ORDER BY 1;
    Two non-overlapping bursts one minute apart strongly imply two writers. One serialized writer produces a single smooth ramp.
  3. File mtimesls -la {SKILL_ROOT}/templates/ {SKILL_ROOT}/commands/ and look for edits after your session's last write.
  4. Git loggit log --since="2 hours ago" --all --pretty=format:"%h %ai %s" on every repo the skill touches. Commits you didn't make = another session.
  5. Process listps -ef | grep -i claude (Git Bash) or Task Manager → look for multiple claude.exe processes.

Founder's role — when to close a sibling window

  • If Session B is about to duplicate Session A's work (same campaign, same batch) → close Session B, let Session A finish, then resume.
  • If Session B is on a disjoint slice (different skill, different repo) → leave both open.
  • If a sentinel is stale and you're certain the owning session is dead → delete the file. The next invocation will take over cleanly.
  • If you don't know which session is which → ask the active one "what branch are you on, what was your last commit sha, what command did you run last?" The one that answers coherently is alive.

The sentinel is not a lock against the founder. It's a lock against two Claudes not seeing each other. The founder is always allowed to tear down a sentinel manually.


Appendix — Skill Command Patches

Two proposed diffs for the founder to review and apply. These implement the sentinel check in the church-outreach skill's two destructive commands. status.md, review.md, review-week.md remain exempt (read-only). plan.md is a one-shot planning step — optional to retrofit. send.md already gates on founder approval per-contact, but adding the sentinel there too would prevent two sessions from double-sending; recommended as a follow-up.

Patch 1 — commands/research.md

Insert a new step 0. before the existing step 1.:

# Command: research <batch_size>

Dispatch parallel sub-agents to deep-research the next N unresearched contacts. Writes structured `research_json` to each contact row.

## Procedure

+0. **Claim the skill session.** Check `C:/Users/johnm/.claude/skills/church-outreach/ACTIVE-SESSION.md`:
+ - If it exists and `started_at` is less than 15 minutes ago → STOP. Print: `"Another church-outreach session is active (started {started_at}, command {command}). Run the 'status' command to see where they are, or wait for them to finish. If you know that session is dead, delete ACTIVE-SESSION.md."` Do not proceed.
+ - If it exists and `started_at` is 15+ minutes old → warn the user ("taking over stale session from {started_at}") and overwrite.
+ - If it doesn't exist → create it. Write frontmatter with `session_id` (random uuid or timestamp), `started_at` (ISO-8601 now), `command: research`, `user_action: "batch of <batch_size> contacts for no-website-churches-2026-04"`.
+ On successful completion of this command, delete `ACTIVE-SESSION.md`. On failure, leave it (next invocation will handle via the age rule).
+
1. Run `tsx C:/Users/johnm/.claude/skills/church-outreach/scripts/dispatch-research.ts no-website-churches-2026-04 <batch_size>` — this:

Patch 2 — commands/draft.md

Same insertion before step 1.:

# Command: draft <batch_size>

Generate personalized email drafts for the next N researched contacts. Every draft must pass the lint gate before being saved.

## Procedure

+0. **Claim the skill session.** Check `C:/Users/johnm/.claude/skills/church-outreach/ACTIVE-SESSION.md`:
+ - If it exists and `started_at` is less than 15 minutes ago → STOP with the refuse message (see runbook `concurrent-session-safety.md`).
+ - If it exists and `started_at` is 15+ minutes old → warn and overwrite.
+ - Otherwise → create it with `command: draft`, `user_action: "drafting <batch_size> emails"`.
+ Delete on successful completion. Leave on failure.
+ See `C:/dev/knowledge/runbooks/concurrent-session-safety.md` for the full pattern and rationale.
+
1. Run `tsx C:/Users/johnm/.claude/skills/church-outreach/scripts/draft-emails.ts no-website-churches-2026-04 <batch_size>` — returns N contact records with `research_json`.

Follow-ups (not patched here)

  • Add the same check to commands/send.md before the first Gmail API call.
  • Add ACTIVE-SESSION.md to C:/Users/johnm/.claude/skills/church-outreach/.gitignore (create the file if absent).
  • Factor the check into references/session-lock.md so future skills can link to one canonical implementation.