Knowledge Derivation Pipeline
pnpm derive propagates changes from canonical source files (YAML and Markdown in knowledge/) to all downstream targets (TypeScript files, marketing pages, database records, Google Drive). It ensures that when pricing changes in one place, it changes everywhere.
Overview
The system has three concepts:
- Sources -- canonical YAML or Markdown files in knowledge/ (e.g., data/pricing.yaml)
- Targets -- downstream files or systems that must reflect the source (e.g., src/lib/pricing.ts, the product_knowledge table, marketing pages)
- Manifest -- knowledge/manifest.yaml maps every source to its targets, including the script to run and the operation type
Manifest structure
The manifest (knowledge/manifest.yaml) defines which sources derive to which targets:
```yaml
sources:
  data/pricing.yaml:
    derives:
      - target: churchwiseai-web/src/lib/pricing.ts
        script: derive-pricing
        type: regenerate
      - target: supabase:product_knowledge
        script: derive-pricing
        type: upsert
        filter: "category IN ('billing', 'churchwiseai')"
      - target: churchwiseai-web/src/app/pricing/page.tsx
        script: derive-pricing
        type: verify
      - target: "gdrive:03-Strategy/Pricing/"
        script: derive-gdrive
        type: nightly
```
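Anyone working on the derive scripts may find it useful to see the manifest shape written out as TypeScript types. The sketch below assumes the YAML has already been parsed into a plain object, and the parsing step is not shown. The type names and the targetsFor and targetKind helpers are hypothetical and are not taken from the actual derive code.

```typescript
// Hypothetical types mirroring knowledge/manifest.yaml; field names follow the YAML above.
type TargetType = "regenerate" | "verify" | "upsert" | "update-section" | "nightly";

interface DeriveTarget {
  target: string; // file path, "supabase:<table>", or "gdrive:<folder>"
  script: string; // e.g. "derive-pricing"
  type: TargetType;
  filter?: string; // optional SQL filter, used by upsert targets
}

interface Manifest {
  sources: Record<string, { derives: DeriveTarget[] }>;
}

// Return the targets declared for one source; throw if the source is not in the manifest.
function targetsFor(manifest: Manifest, source: string): DeriveTarget[] {
  const entry = manifest.sources[source];
  if (!entry) {
    throw new Error(`Source not in manifest: ${source}`);
  }
  return entry.derives;
}

// Classify a target string by its prefix (hypothetical helper).
function targetKind(target: string): "supabase" | "gdrive" | "file" {
  if (target.startsWith("supabase:")) return "supabase";
  if (target.startsWith("gdrive:")) return "gdrive";
  return "file";
}

// The pricing.yaml excerpt above, as an already-parsed object.
const manifest: Manifest = {
  sources: {
    "data/pricing.yaml": {
      derives: [
        { target: "churchwiseai-web/src/lib/pricing.ts", script: "derive-pricing", type: "regenerate" },
        {
          target: "supabase:product_knowledge",
          script: "derive-pricing",
          type: "upsert",
          filter: "category IN ('billing', 'churchwiseai')",
        },
        { target: "churchwiseai-web/src/app/pricing/page.tsx", script: "derive-pricing", type: "verify" },
        { target: "gdrive:03-Strategy/Pricing/", script: "derive-gdrive", type: "nightly" },
      ],
    },
  },
};

for (const t of targetsFor(manifest, "data/pricing.yaml")) {
  console.log(`${t.type} ${targetKind(t.target)} ${t.target}`);
}
```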
Target types
| Type | What it does | When changes are applied |
|---|---|---|
| regenerate | Generates the entire target file from the source YAML | Immediately, in the commit phase |
| verify | Checks that expected values exist in the target file (does not modify it) | Read-only -- flags drift |
| upsert | Generates SQL INSERT/UPDATE statements for the product_knowledge table | Prints SQL for manual execution |
| update-section | Replaces a specific marked section in a target file | Immediately, in the commit phase |
| nightly | Queued for Google Drive sync (runs separately, not during derive) | Skipped during derive |
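The table can also be read as a lookup from target type to what happens in each phase. The sketch below encodes it that way. The TypeBehavior shape and the BEHAVIOR constant are illustrative assumptions and are not part of the real pipeline.

```typescript
// Hypothetical encoding of the target-types table, phase by phase.
type TargetType = "regenerate" | "verify" | "upsert" | "update-section" | "nightly";

interface TypeBehavior {
  dryRun: "compare" | "check-values" | "count-sql" | "skip";
  commit: "write-file" | "check-values" | "print-sql" | "skip";
  modifiesTarget: boolean; // true only when derive itself writes to the target
}

const BEHAVIOR: Record<TargetType, TypeBehavior> = {
  regenerate:       { dryRun: "compare",      commit: "write-file",   modifiesTarget: true },
  verify:           { dryRun: "check-values", commit: "check-values", modifiesTarget: false },
  upsert:           { dryRun: "count-sql",    commit: "print-sql",    modifiesTarget: false }, // SQL is printed, not executed
  "update-section": { dryRun: "compare",      commit: "write-file",   modifiesTarget: true },
  nightly:          { dryRun: "skip",         commit: "skip",         modifiesTarget: false },
};

// Target types that derive writes to directly during the commit phase.
function writableTypes(): TargetType[] {
  return (Object.keys(BEHAVIOR) as TargetType[]).filter((t) => BEHAVIOR[t].modifiesTarget);
}

console.log(writableTypes());
```

Only regenerate and update-section come out as writable. Every other type is either read-only or handed off to an operator or a separate job.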
Current sources and their targets
| Source | Target count | Key targets |
|---|---|---|
| data/pricing.yaml | 17 targets | pricing.ts (regenerate), product_knowledge (upsert), 10+ marketing pages (verify), PRICING.md (regenerate), voice agent prompts (verify), PewSearch/ITW pricing pages (verify) |
| data/features.yaml | 3 targets | pricing.ts (regenerate), product_knowledge (upsert), Google Drive (nightly) |
| data/products.yaml | 3 targets | brand.ts (verify), product_knowledge (upsert), Google Drive (nightly) |
| data/policies.yaml | 4 targets | terms page (verify), privacy page (verify), product_knowledge (upsert), Google Drive (nightly) |
| narrative/vision.md | 3 targets | CLAUDE.md brand-architecture section (update-section), product_knowledge (upsert), Google Drive (nightly) |
| narrative/competitive.md | 2 targets | compare/[slug] pages (verify), Google Drive (nightly) |
| narrative/sales-playbook.md | 2 targets | product_knowledge (upsert), Google Drive (nightly) |
| narrative/brand.md | 1 target | Google Drive (nightly) |
| narrative/strategy.md | 1 target | Google Drive (nightly) |
| narrative/customer-journey.md | 1 target | Google Drive (nightly) |
| narrative/operations.md | 1 target | Google Drive (nightly) |
| data/tools.yaml | 5 targets | chatbot-tools.ts (verify), voice church/sales/core tools (verify), features.yaml tool counts (verify) |
Full pipeline: pnpm derive data/pricing.yaml
When run without flags, the pipeline runs all four phases automatically.
Phase 1: Dry-run (preview changes)
```text
FOR each target defined for this source in the manifest:
  Load the source YAML file and parse it

  IF target type = "regenerate":
    1. Call the generator function (e.g., generatePricingTs(pricingYaml, featuresYaml))
    2. Read the current file on disk
    3. Normalize line endings (CRLF -> LF for consistent comparison on Windows)
    4. Compare generated output with current file
    5. IF identical: report "UP TO DATE"
    6. IF different: report "DRIFT DETECTED" and show a line-by-line diff
       (Diff output is capped at 30 lines to keep it readable)

  IF target type = "verify":
    1. Read the target file
    2. Extract expected values from the source YAML
       Example: for pricing, extract all price strings ("$14.95", "$34.95", etc.)
    3. Check that each expected value appears in the target file content
    4. Report per-value: PASS (found), FAIL (missing), WARN (ambiguous)

  IF target type = "upsert":
    1. Call the SQL generator (e.g., generateProductKnowledgeSQL())
    2. Count the number of INSERT/UPDATE statements
    3. Report: "X SQL statement(s) would be executed"

  IF target type = "update-section":
    1. Read the target file
    2. Read the narrative source content
    3. Call updateSection(currentContent, sectionId, narrativeContent)
    4. Compare the result with the current file
    5. IF identical: "UP TO DATE"
    6. IF different: "WOULD UPDATE"

  IF target type = "nightly":
    SKIP (handled by Drive sync cron, not derive)
```
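The regenerate and verify checks above come down to two small string comparisons. This is a minimal sketch that assumes the generated output and the target file are already loaded as strings. The names normalizeEol, checkRegenerate, checkVerify, and formatPrice are hypothetical. The line-by-line diff is a naive positional comparison rather than a real diff algorithm. The WARN (ambiguous) case is left out because its rule is not described here.

```typescript
// Sketch of the two dry-run checks. All names are hypothetical.

const MAX_DIFF_LINES = 30; // cap on diff output, as described above

type RegenStatus = "UP TO DATE" | "DRIFT DETECTED";

interface VerifyResult {
  value: string;
  result: "PASS" | "FAIL"; // WARN (ambiguous) is not modeled
}

// CRLF -> LF so that Windows checkouts do not report false drift.
function normalizeEol(text: string): string {
  return text.replace(/\r\n/g, "\n");
}

// Regenerate target: compare freshly generated output with the file on disk.
// The diff is a naive positional line comparison, capped at MAX_DIFF_LINES.
function checkRegenerate(generated: string, onDisk: string): { status: RegenStatus; diff: string[] } {
  const current = normalizeEol(onDisk);
  const next = normalizeEol(generated);
  if (current === next) return { status: "UP TO DATE", diff: [] };

  const a = current.split("\n");
  const b = next.split("\n");
  const diff: string[] = [];
  for (let i = 0; i < Math.max(a.length, b.length) && diff.length < MAX_DIFF_LINES; i++) {
    if (a[i] === b[i]) continue;
    if (a[i] !== undefined) diff.push(`- ${i + 1}: ${a[i]}`);
    if (b[i] !== undefined && diff.length < MAX_DIFF_LINES) diff.push(`+ ${i + 1}: ${b[i]}`);
  }
  return { status: "DRIFT DETECTED", diff };
}

// Hypothetical formatter: a numeric price from the YAML -> the string a page should show.
function formatPrice(amount: number): string {
  return `$${amount.toFixed(2)}`;
}

// Verify target: each expected value must appear somewhere in the target file. Read-only.
function checkVerify(expected: string[], targetContent: string): VerifyResult[] {
  return expected.map((value): VerifyResult => ({
    value,
    result: targetContent.includes(value) ? "PASS" : "FAIL",
  }));
}

// Example: a page that shows two of three expected prices, and a CRLF-only difference.
const page = "<p>Starter: $14.95/mo</p>\r\n<p>Pro: $34.95/mo</p>";
console.log(checkVerify([14.95, 34.95, 49.95].map(formatPrice), page));
console.log(checkRegenerate("line 1\nline 2\n", "line 1\r\nline 2\r\n").status);
```

Normalizing line endings before the equality check means a file that differs only in CRLF vs LF is reported as up to date.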
After all targets are processed, write a lockfile (.derive-lock.json) with:
- Source file name
- Source file MD5 hash
- Timestamp
- All results (pass/fail/skip per target)
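Writing the lockfile comes down to hashing the source and serializing the results. The sketch below uses Node's built-in crypto and fs modules. The exact JSON layout, the ISO 8601 timestamp format, the lockfile path relative to the working directory, and the buildLock and writeLock helpers are all assumptions.

```typescript
import { createHash } from "node:crypto";
import { writeFileSync } from "node:fs";

// Assumed lockfile shape, built from the four fields listed above.
interface DeriveLock {
  source: string; // e.g. "data/pricing.yaml"
  hash: string; // hex MD5 of the source file contents
  timestamp: string; // ISO 8601 (format is an assumption)
  results: { target: string; status: "pass" | "fail" | "skip" }[];
}

// Hex-encoded MD5 digest of a string.
function md5(content: string): string {
  return createHash("md5").update(content).digest("hex");
}

// Build the lock record for one dry-run. `now` is injectable for testing.
function buildLock(
  source: string,
  sourceContent: string,
  results: DeriveLock["results"],
  now: Date = new Date(),
): DeriveLock {
  return { source, hash: md5(sourceContent), timestamp: now.toISOString(), results };
}

// Persist the lock (path relative to the working directory is an assumption).
function writeLock(lock: DeriveLock, path: string = ".derive-lock.json"): void {
  writeFileSync(path, JSON.stringify(lock, null, 2) + "\n", "utf8");
}
```

At the end of a dry-run, calling writeLock(buildLock(source, rawYaml, results)) would produce the file that --commit later reads.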
Phase 2: Commit (apply changes)
```text
FOR each target:
  IF target type = "regenerate":
    Generate the content and WRITE it to disk
    Report: "WRITTEN"

  IF target type = "verify":
    Re-run the verification checks (read-only, same as dry-run)
    Report: pass/fail counts

  IF target type = "upsert":
    Generate the SQL statements
    PRINT the SQL to console for manual execution
    (v1 does not auto-execute SQL -- operator runs it via Supabase MCP or SQL editor)
    Report: "X SQL statement(s) generated -- execute manually via Supabase"

  IF target type = "update-section":
    Generate the updated content and WRITE it to disk
    Report: "WRITTEN"

  IF target type = "nightly":
    SKIP
```
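Both the dry-run comparison and the commit write for update-section targets go through updateSection(currentContent, sectionId, narrativeContent). The function name and parameters come from the pipeline description. The marker syntax below is a hypothetical stand-in, because the real markers used in CLAUDE.md are not documented here.

```typescript
// Hypothetical marker syntax: the real markers in CLAUDE.md are not documented here.
function startMarker(sectionId: string): string {
  return `<!-- derive:start ${sectionId} -->`;
}

function endMarker(sectionId: string): string {
  return `<!-- derive:end ${sectionId} -->`;
}

// Replace everything between the two markers with the narrative content,
// leaving the markers and the rest of the file untouched.
function updateSection(currentContent: string, sectionId: string, narrativeContent: string): string {
  const start = startMarker(sectionId);
  const end = endMarker(sectionId);
  const startIdx = currentContent.indexOf(start);
  if (startIdx === -1) throw new Error(`Start marker not found for "${sectionId}"`);
  const bodyStart = startIdx + start.length;
  const endIdx = currentContent.indexOf(end, bodyStart);
  if (endIdx === -1) throw new Error(`End marker not found for "${sectionId}"`);
  return `${currentContent.slice(0, bodyStart)}\n${narrativeContent.trim()}\n${currentContent.slice(endIdx)}`;
}
```

The result depends only on the current file and the narrative text, so applying it a second time returns the same string. That is what lets the dry-run report "UP TO DATE" with a plain string comparison.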
Phase 3: Verify (confirm changes took effect)
```text
Re-run dry-run checks on all regenerate and verify targets
to confirm the committed changes are correct.

FOR upsert targets: SKIP (manual SQL execution cannot be verified automatically)
FOR update-section targets: re-run the dry-run comparison
```
Phase 4: Append to changelog
```text
Build a summary entry with timestamp and per-target status:
  "## 2026-03-25 14:30 -- data/pricing.yaml"
  "- churchwiseai-web/src/lib/pricing.ts: regenerate PASS"
  "- supabase:product_knowledge: upsert PASS (3 SQL statements)"
  "- churchwiseai-web/src/app/pricing/page.tsx: verify PASS (8/8 checks passed)"
Append to knowledge/changelog.md
```
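Building the changelog entry is plain string formatting that follows the example above. In this sketch the TargetResult shape and the helper names are hypothetical. The timestamp is formatted in UTC, which is an assumption because the time zone the pipeline uses is not stated. The blank-line separator between entries is also an assumption.

```typescript
import { appendFileSync } from "node:fs";

type Status = "PASS" | "FAIL" | "SKIP";

interface TargetResult {
  target: string;
  type: string; // regenerate, verify, upsert, update-section
  status: Status;
  detail?: string; // e.g. "3 SQL statements" or "8/8 checks passed"
}

function pad2(n: number): string {
  return (n < 10 ? "0" : "") + n;
}

// "YYYY-MM-DD HH:MM" in UTC (time zone is an assumption).
function stamp(d: Date): string {
  return `${d.getUTCFullYear()}-${pad2(d.getUTCMonth() + 1)}-${pad2(d.getUTCDate())} ${pad2(d.getUTCHours())}:${pad2(d.getUTCMinutes())}`;
}

// Build one entry in the format shown above.
function changelogEntry(source: string, results: TargetResult[], when: Date): string {
  const lines = [`## ${stamp(when)} -- ${source}`];
  for (const r of results) {
    const detail = r.detail ? ` (${r.detail})` : "";
    lines.push(`- ${r.target}: ${r.type} ${r.status}${detail}`);
  }
  return lines.join("\n") + "\n";
}

// Append the entry, separated from the previous one by a blank line.
function appendChangelog(entry: string, path: string = "knowledge/changelog.md"): void {
  appendFileSync(path, "\n" + entry, "utf8");
}
```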
Lockfile mechanics
The lockfile (.derive-lock.json) prevents stale commits:
```text
WHEN --dry-run is run:
  Write lockfile with: { source, hash, timestamp, results }

WHEN --commit is run:
  1. Read lockfile
  2. IF lockfile doesn't exist: ERROR "Run --dry-run first"
  3. IF lockfile source doesn't match requested source: ERROR
  4. IF lockfile is older than 10 minutes: ERROR "Re-run --dry-run"
  5. IF source file MD5 has changed since dry-run: ERROR "Source changed -- re-run --dry-run"
  6. PROCEED with commit
```
The 10-minute expiry prevents applying changes based on a stale preview.
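The commit preconditions can be expressed as a pure check that returns either nothing (proceed) or an error message. In this sketch, loadLock and checkLock are hypothetical names. The original only says "ERROR" for a source mismatch, so the wording of that message is made up here. Treating "older than 10 minutes" as strictly greater than 10 minutes is also an assumption.

```typescript
import { createHash } from "node:crypto";
import { existsSync, readFileSync } from "node:fs";

// Only the fields the commit check needs (results omitted).
interface DeriveLock {
  source: string;
  hash: string; // hex MD5 of the source at dry-run time
  timestamp: string; // ISO 8601 (assumed)
}

const LOCK_TTL_MS = 10 * 60 * 1000; // 10-minute expiry

// Read .derive-lock.json if present; null means no dry-run has been recorded.
function loadLock(path: string = ".derive-lock.json"): DeriveLock | null {
  if (!existsSync(path)) return null;
  return JSON.parse(readFileSync(path, "utf8")) as DeriveLock;
}

// Apply the four checks from steps 2-5 in order. Returns null if the commit
// may proceed, otherwise the error message to print.
function checkLock(
  lock: DeriveLock | null,
  requestedSource: string,
  currentSourceContent: string,
  now: Date = new Date(),
): string | null {
  if (lock === null) return "Run --dry-run first";
  if (lock.source !== requestedSource) {
    return `Lockfile is for ${lock.source}, not ${requestedSource} -- re-run --dry-run`;
  }
  const ageMs = now.getTime() - new Date(lock.timestamp).getTime();
  if (ageMs > LOCK_TTL_MS) return "Re-run --dry-run";
  const currentHash = createHash("md5").update(currentSourceContent).digest("hex");
  if (currentHash !== lock.hash) return "Source changed -- re-run --dry-run";
  return null;
}
```

Keeping the check separate from file I/O makes each refusal case easy to test. The caller would run checkLock(loadLock(), source, rawYaml) before writing anything.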
CLI usage
```text
pnpm derive data/pricing.yaml             Full pipeline (dry-run + commit + verify + changelog)
pnpm derive data/pricing.yaml --dry-run   Phase 1 only: preview what would change
pnpm derive data/pricing.yaml --commit    Phases 2-4: apply from lockfile
pnpm derive --check                       Drift detection: scan ALL sources, exit 1 if drift
pnpm derive --all                         Full pipeline for every source in manifest
```
Drift detection mode (--check)
```text
WHEN --check is run:
  FOR each source in manifest:
    1. Filter out nightly-only targets
    2. Run dry-run on remaining targets
    3. Collect failures

  IF any source has failures:
    Print drift details
    EXIT 1 (non-zero exit code for CI integration)
  ELSE:
    Print "All sources in sync"
    EXIT 0
```
This can be used in CI/CD to block deploys when knowledge has drifted.
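The --check loop can be sketched as a function that returns the exit code. The per-target dry-run is injected rather than calling the real generators. The names checkAll, DryRun, and CheckResult are hypothetical.

```typescript
type TargetType = "regenerate" | "verify" | "upsert" | "update-section" | "nightly";

interface Target {
  target: string;
  type: TargetType;
}

interface CheckResult {
  target: string;
  ok: boolean;
  message?: string;
}

// The per-target dry-run is passed in, so this sketch does not depend on the real generators.
type DryRun = (source: string, target: Target) => CheckResult;

// Scan every source, skipping nightly targets, and report an exit code for CI.
function checkAll(
  sources: Record<string, Target[]>,
  dryRun: DryRun,
): { exitCode: 0 | 1; failures: string[] } {
  const failures: string[] = [];
  for (const [source, targets] of Object.entries(sources)) {
    for (const target of targets.filter((x) => x.type !== "nightly")) {
      const result = dryRun(source, target);
      if (!result.ok) {
        failures.push(`${source} -> ${result.target}: ${result.message ?? "drift detected"}`);
      }
    }
  }
  if (failures.length > 0) {
    failures.forEach((f) => console.error(f));
    return { exitCode: 1, failures };
  }
  console.log("All sources in sync");
  return { exitCode: 0, failures };
}
```

A CLI wrapper would then set process.exitCode from the returned exitCode, so a CI step fails whenever any non-nightly target has drifted.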
Generator scripts
Each script handles a specific source type:
| Script | Source | What it generates |
|---|---|---|
| derive-pricing | pricing.yaml + features.yaml | pricing.ts (full file), product_knowledge SQL, price verification |
| derive-products | products.yaml | brand.ts verification, product_knowledge SQL |
| derive-policies | policies.yaml | terms/privacy page verification, policy knowledge SQL |
| derive-narrative | vision.md, competitive.md, etc. | CLAUDE.md section updates, narrative knowledge SQL |
| derive-tools | tools.yaml | Tool schema verification across chatbot and voice codebases |
| derive-gdrive | all sources | Google Drive sync (nightly cron, not run during derive) |
Cross-source dependencies
Some generators need multiple source files:
- generatePricingTs() requires BOTH pricing.yaml AND features.yaml
- The script auto-loads the companion file regardless of which source triggered the derive
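The companion loading can be pictured as a lookup from the triggering source to the extra files its generator needs. The pipeline description does not say how this is configured. The COMPANIONS table, sourcesToLoad, and loadSources are hypothetical. The file reader is injected so the sketch does not touch disk.

```typescript
// Hypothetical companion table: which other sources a generator needs alongside the trigger.
const COMPANIONS: Record<string, string[]> = {
  "data/pricing.yaml": ["data/features.yaml"],
  "data/features.yaml": ["data/pricing.yaml"],
};

// Trigger first, then its companions (if any).
function sourcesToLoad(trigger: string): string[] {
  return [trigger, ...(COMPANIONS[trigger] ?? [])];
}

// Load every required source via an injected reader,
// e.g. (p) => readFileSync(join("knowledge", p), "utf8") in a real script.
function loadSources(trigger: string, read: (path: string) => string): Map<string, string> {
  const loaded = new Map<string, string>();
  for (const p of sourcesToLoad(trigger)) {
    loaded.set(p, read(p));
  }
  return loaded;
}

// Whichever file triggered the run, both inputs to generatePricingTs are available.
const inputs = loadSources("data/features.yaml", (p) => `# contents of ${p}`);
console.log(Array.from(inputs.keys()));
```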
Safety properties
- Verify targets are read-only -- they never modify the target file, only report drift
- Upsert targets print SQL but don't execute it -- the operator reviews and runs manually
- Regenerate targets overwrite the entire file -- the generator is the source of truth
- Line endings are normalized (CRLF to LF) before comparison to avoid false drift on Windows
- Lockfile expiry (10 minutes) prevents applying stale changes
- Source hash check prevents committing when the source changed after dry-run
- No concurrent execution lock -- agents should use separate git branches to avoid conflicts