Runbooks
Operational procedures for diagnosing and resolving production issues.
Triage First
Before diving into component runbooks, use the Incident Playbook to:
- Classify severity (SEV-1/2/3)
- Identify which system is failing
- Route to the correct runbook
V2 Runbook Index
| Question | Runbook |
|---|---|
| Is news not arriving? (API errors, dedup failures, images) | Ingestion |
| Are facts not being extracted? (AI errors, schema issues, costs) | Fact Extraction |
| Are facts stuck in pending? (validation failures, evidence API issues) | Validation |
| Are jobs not flowing? (backlogs, retries, stuck jobs) | Queue |
| Are checks not running on time? (cron, dispatch timing) | Scheduling |
Incident Playbook
Master triage and response guide covering severity definitions, diagnostic routing, and the post-incident process.
Queue
Job processing, backlogs, and retry storms.
Scheduling
Cron timing, dispatch timing, and checks that do not run on time.
Ingestion
News API fetching, article deduplication, story clustering, and image resolution.
Fact Extraction
AI fact extraction, evergreen generation, challenge content, and the seed pipeline.
Validation
Multi-phase fact verification: structural, consistency, cross-model, and evidence checks.
V1 Runbooks (Archived)
V1 runbooks (tracker, render, summarization, MVP validation) have been archived to docs/docs_archive/runbooks-v1/.
Related
- Architecture Overview: system components
- Agent System: agent ownership and routing