Runbooks

Operational procedures for diagnosing and resolving production issues.

Triage First

Before diving into component runbooks, use the Incident Playbook to:

  1. Classify severity (SEV-1/2/3)
  2. Identify which system is failing
  3. Route to the correct runbook

V2 Runbook Index

Question                                                                Runbook
Is news not arriving? (API errors, dedup failures, images)              Ingestion
Are facts not being extracted? (AI errors, schema issues, costs)        Fact Extraction
Are facts stuck in pending? (validation failures, evidence API issues)  Validation
Are jobs not flowing? (backlogs, retries, stuck jobs)                   Queue
Are checks not running on time? (cron, dispatch timing)                 Scheduling
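The routing step of triage (step 3 above) amounts to a lookup over this index. The Python below is a minimal sketch, assuming an on-call engineer has a free-text symptom description; the `route` function and the case-insensitive substring matching on the index's parenthetical hints are hypothetical illustrations, not part of any existing tooling.

```python
from typing import List

# (runbook, index question, symptom hints) copied from the V2 Runbook Index.
# Matching on these hints is a hypothetical illustration, not existing tooling.
ROUTES = [
    ("Ingestion", "Is news not arriving?", ("api errors", "dedup failures", "images")),
    ("Fact Extraction", "Are facts not being extracted?", ("ai errors", "schema issues", "costs")),
    ("Validation", "Are facts stuck in pending?", ("validation failures", "evidence api issues")),
    ("Queue", "Are jobs not flowing?", ("backlogs", "retries", "stuck jobs")),
    ("Scheduling", "Are checks not running on time?", ("cron", "dispatch timing")),
]


def route(symptom: str) -> List[str]:
    """Return every runbook whose hints appear in the symptom text, in index order."""
    text = symptom.lower()
    return [runbook for runbook, _question, hints in ROUTES if any(h in text for h in hints)]


if __name__ == "__main__":
    print(route("API errors and retries"))  # a symptom can point at more than one runbook
```

Returning every match rather than the first one reflects that a single symptom (for example, fetch errors that also trigger retries) can span two systems. An empty result means the symptom does not fit the index, in which case the Incident Playbook's diagnostic routing is the place to start.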

Incident Playbook

Master triage and response guide covering severity definitions, diagnostic routing, and the post-incident process.

Queue

Job processing, backlogs, and retry storms.

Scheduling

Cron timing, dispatch, and checks that do not run on time.

Ingestion

News API fetching, article deduplication, story clustering, and image resolution.

Fact Extraction

AI fact extraction, evergreen generation, challenge content, and the seed pipeline.

Validation

Multi-phase fact verification: structural, consistency, cross-model, and evidence checks.

V1 Runbooks (Archived)

V1 runbooks (tracker, render, summarization, MVP validation) have been archived to docs/docs_archive/runbooks-v1/.