# Content Prompts
Prompts for fact engine operations and validation: AI extraction, evergreen generation, schema keys, source types, and multi-tier fact validation.
## Prompts
| # | Prompt | Purpose |
|---|---|---|
| 01 | Backfill Null Metadata | Patch facts missing notability or challenge content |
| 02 | Content Cleanup Pass | Clean up generated content |
| 03 | Backfill Summaries | Generate AI summaries for events missing them |
| 04 | Generate Evergreen Facts | Trigger evergreen fact generation pipeline |
| 05 | Run Enrichment | Enrichment orchestrator with multi-source entity context |
## FAQ
### Extraction
#### How does AI fact extraction work end-to-end?

- An `EXTRACT_FACTS` queue message arrives at worker-facts, handled by `apps/worker-facts/src/handlers/extract-facts.ts`.
- The handler calls `extractFactsFromStory()` in `packages/ai/src/fact-engine.ts`, which uses `generateObject()` with a Zod schema to produce structured output.
- Model selection is handled by `packages/ai/src/model-router.ts` -- the `default` tier routes to gemini-3-flash-preview as a fallback; runtime config is DB-driven.
- Model routing is DB-driven via the `ai_model_tier_config` table with a 60-second cache -- no restart needed to switch models.
- Enrichment context (KG, Wikidata, Wikipedia, GDELT, domain APIs) is injected into the extraction prompt via `packages/ai/src/enrichment.ts`.
- Subcategory classification is applied during extraction when the topic has active subcategories.
- Deterministic notability bypass: when both KG (score >= 100) and Wikidata (>= 20 sitelinks) confirm an entity, the AI notability score is overridden to 0.85.
- Each model has a dedicated ModelAdapter in `packages/ai/src/models/adapters/` for per-model prompt optimizations.
- Output fields: `title`, `facts{}` (key-value JSONB), `context` narrative, `challenge_title`, `notability_score`, `notability_reason`.
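The deterministic notability bypass can be sketched as below. The thresholds (KG score >= 100, >= 20 Wikidata sitelinks, override to 0.85) come from this doc; the type and function names are illustrative assumptions, not the real implementation.

```typescript
// Sketch of the deterministic notability bypass. The signal shape and
// helper name are assumed; only the thresholds come from the doc.
interface EntitySignals {
  kgScore: number | null;           // e.g., Knowledge Graph resultScore
  wikidataSitelinks: number | null; // count of Wikidata sitelinks
}

function applyNotabilityBypass(aiScore: number, signals: EntitySignals): number {
  const kgConfirms = (signals.kgScore ?? 0) >= 100;
  const wikidataConfirms = (signals.wikidataSitelinks ?? 0) >= 20;
  // Only when BOTH independent sources confirm the entity is the AI score overridden.
  return kgConfirms && wikidataConfirms ? 0.85 : aiScore;
}
```

Note the AND condition: one strong signal alone leaves the AI's own score untouched.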
#### What are schema keys and how does the system know which keys to use for a topic?

- The `fact_record_schemas` table defines typed field definitions per topic schema (e.g., `player_name: text`, `career_points: number`).
- Topics link to schemas via `topic_categories.schema_id`.
- During extraction, `resolveTopicCategory()` finds the category, retrieves schema keys, and passes them to the AI prompt.
- Schema resolution utilities: `packages/ai/src/schema-utils.ts`.
- Schema storage: `fact_record_schemas.fact_keys` (JSONB column).
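For orientation, a `fact_keys` entry might look roughly like the following. The exact JSONB shape is an assumption (check the `fact_record_schemas` table for the real structure); the field names are the examples from this doc.

```typescript
// Illustrative shape for fact_record_schemas.fact_keys (JSONB).
// The wrapper type is assumed; key names are the doc's examples.
type FactKeyType = "text" | "number" | "date" | "boolean";

interface FactKeyDef {
  key: string;
  type: FactKeyType;
  description?: string;
}

const basketballPlayerKeys: FactKeyDef[] = [
  { key: "player_name", type: "text", description: "Full player name" },
  { key: "career_points", type: "number", description: "Total career points" },
];

// A prompt builder would render these as typed guidance for the extraction model.
function renderSchemaKeys(keys: FactKeyDef[]): string {
  return keys.map((k) => `${k.key}: ${k.type}`).join(", ");
}
```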
#### What is a notability score and what threshold filters facts?

- A 0.0-1.0 AI-assessed score of how noteworthy a fact is.
- Default threshold: `NOTABILITY_THRESHOLD=0.6` in `packages/config/src/index.ts`.
- Facts scoring below the threshold are discarded during extraction.
- Each fact also receives a `notability_reason` -- one sentence explaining why it matters.
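The threshold filter amounts to a simple cut; a minimal sketch, with the fact shape assumed:

```typescript
// Facts below NOTABILITY_THRESHOLD are dropped during extraction.
const NOTABILITY_THRESHOLD = 0.6; // default per packages/config/src/index.ts

interface ScoredFact {
  title: string;
  notability_score: number;
  notability_reason: string;
}

function filterByNotability(facts: ScoredFact[], threshold = NOTABILITY_THRESHOLD): ScoredFact[] {
  return facts.filter((f) => f.notability_score >= threshold);
}
```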
#### What is the difference between `title` and `challenge_title`?

- `title`: Wikipedia-style, factual, searchable (e.g., "Prince's Multi-Instrument Mastery at Paisley Park").
- `challenge_title`: Movie-poster-style, theatrical, curiosity-provoking (e.g., "Twenty-Seven Instruments, One Take, Zero Help").
- Both describe the same fact; they serve different audiences (internal reference vs user-facing hook).
#### Where is the extraction prompt defined and how do I modify it?

- Core extraction logic: `packages/ai/src/fact-engine.ts` -- the `extractFactsFromStory()` and `generateEvergreenFacts()` functions.
- The system prompt includes topic schema keys, taxonomy content rules, and voice guidelines.
- Taxonomy content rules: `packages/ai/src/taxonomy-content-rules.ts` (loads from `packages/ai/src/config/taxonomy-rules-data.ts`).
- Per-model adapters in `packages/ai/src/models/adapters/` inject model-specific prompt optimizations.
### Evergreen
#### How does evergreen generation differ from news extraction?

- Trigger: daily cron at 3 AM UTC vs every-15-min news ingestion.
- Source type: `ai_generated` (no source articles) vs `news_extraction` (from clustered news).
- Model tier: `mid` (higher quality, lower volume) vs `default` for news.
- Dedup: title match against existing facts for the topic vs `content_hash` on `news_sources`.
- Expiry: none (permanent) vs 30 days for news facts.
- Validation: `multi_phase` for both, but evergreen relies more heavily on AI cross-check since there are no independent source articles.
#### How do I enable or disable evergreen generation?

- Master switch: `EVERGREEN_ENABLED=true|false` in `.env.local` (default: `false`).
- Daily quota: `EVERGREEN_DAILY_QUOTA=20` (max facts per day).
- Both defined in `packages/config/src/index.ts`.
- Cron route: `apps/web/app/api/cron/generate-evergreen/route.ts` -- checks the `EVERGREEN_ENABLED` gate before dispatching.
#### How does evergreen deduplication work?

- Before generation, the handler fetches all existing fact titles for the target topic from `fact_records`.
- Those titles are injected into the AI prompt as "already exists -- do not regenerate."
- This is title-based dedup (not hash-based like news articles).
- Handler: `apps/worker-facts/src/handlers/generate-evergreen.ts`.
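The prompt-injection step above can be sketched like this; the exact prompt wording used by the real handler is an assumption.

```typescript
// Sketch of title-based evergreen dedup: existing titles become a
// "do not regenerate" block in the generation prompt (wording illustrative).
function buildDedupBlock(existingTitles: string[]): string {
  if (existingTitles.length === 0) return "";
  const list = existingTitles.map((t) => `- ${t}`).join("\n");
  return `The following facts already exist -- do not regenerate them:\n${list}`;
}
```

Because the guard is the title string itself, a rephrased duplicate can slip through -- one reason news articles use `content_hash` instead.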
#### What model tier is used for evergreen generation and why?

- `mid` tier -- higher quality than `default` because evergreen content is lower volume and permanent (no expiry).
- News extraction uses the `default` tier for cost efficiency (high volume, 30-day expiry).
- Challenge content generation uses the `default` tier (high volume, regenerable).
- Tier config: `packages/config/src/model-registry-data.ts`.
### Source Types
#### What are the six source types and when is each one used?

- `news_extraction` -- Derived from clustered news articles (news pipeline).
- `ai_generated` -- Evergreen facts generated by AI (daily cron).
- `file_seed` -- Primary facts from curated seed entries (seed explosion).
- `spinoff_discovery` -- Tangential facts discovered during seed explosion.
- `ai_super_fact` -- Cross-entry correlations found by FIND_SUPER_FACTS.
- `api_import` -- Structured API imports (ESPN, WikiQuote, etc. -- stub, not yet active).
- Stored in `fact_records.source_type`.
#### How does the context field get written -- what structure does it follow?

- Three-part structure: Hook (1-2 sentences, surprising detail) -> Story (2-4 sentences, backstory with specifics) -> Connection (1-2 sentences, links to reader's existing knowledge).
- Reads like "a passionate friend sharing something at dinner" -- not a textbook.
- Voice rules defined in the voice constitution: `packages/ai/src/config/challenge-voice.ts`.
- See Fact-Challenge Anatomy for the full context field spec.
### Validation Phases
#### What are the four validation phases and what does each one check?

- Phase 1 -- Structural: schema conformance, type validation, injection detection ($0, code-only) -- `packages/ai/src/validation/structural.ts`.
- Phase 2 -- Consistency: internal contradictions, taxonomy rule violations ($0, code-only) -- `packages/ai/src/validation/consistency.ts`.
- Phase 3 -- Cross-Model: AI adversarial verification via Gemini 2.5 Flash (~$0.001) -- `packages/ai/src/validation/cross-model.ts`.
- Phase 4 -- Evidence: external API corroboration (Wikipedia, Wikidata) plus an AI reasoner via Gemini 2.5 Flash (~$0.002-0.005) -- `packages/ai/src/validation/evidence.ts`.
- Phases 1-2 catch ~40% of defective facts before any AI call.
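The free-then-paid ordering can be sketched as a short-circuiting pipeline. Phase names and the ~40% figure come from this doc; the types and the exact stop condition are illustrative assumptions.

```typescript
// Phases run in order; a critical failure in a free phase (1-2) means the
// paid AI phases (3-4) never run -- that is how ~40% of defects cost $0.
interface PhaseResult { phase: string; confidence: number; flags: string[] }
interface Phase { name: string; run: (fact: unknown) => PhaseResult }

function runPhases(fact: unknown, phases: Phase[]): { results: PhaseResult[]; rejected: boolean } {
  const results: PhaseResult[] = [];
  for (const phase of phases) {
    const result = phase.run(fact);
    results.push(result);
    if (result.flags.some((f) => f.includes("critical"))) {
      return { results, rejected: true }; // stop before spending on later phases
    }
  }
  return { results, rejected: false };
}
```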
#### Which AI models are used for cross-model verification and evidence corroboration?

- Both phases 3 and 4 use Gemini 2.5 Flash (Google) for cross-provider verification (hardcoded for provider separation).
- This ensures the verifier is a different provider than the generator (which uses OpenAI/Anthropic).
- Requires the `GOOGLE_API_KEY` env var.
- Orchestration: `packages/ai/src/validation/index.ts`.
#### What validation strategies exist and when is each one applied?

- `multi_phase` -- Strictest; all 4 phases. Used for news extraction and AI-generated (evergreen) facts.
- `authoritative_api` -- Trusts the source; lighter checks. Used for API imports (ESPN, WikiQuote).
- `curated_database` -- Manual/pre-verified entries. Used for curated seed entries.
- The strategy is passed in the `VALIDATE_FACT` queue message payload.
### Validation Status Flow
#### What are the possible fact statuses and how do facts transition between them?

- Flow: `pending` -> `pending_validation` -> `validated` -> `published` (or `rejected`).
- `pending`: just inserted, awaiting validation enqueue.
- `pending_validation`: VALIDATE_FACT enqueued.
- `validated`: passed all phases -- eligible for feed.
- `published`: appeared in user feed (currently synonymous with validated in practice).
- `rejected`: failed validation -- never reaches feed.
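The flow above can be expressed as a transition map -- a sketch assuming these are the only legal moves, which the real code may not enforce explicitly:

```typescript
// Legal status transitions per the flow described above (assumed shape).
const TRANSITIONS: Record<string, string[]> = {
  pending: ["pending_validation"],
  pending_validation: ["validated", "rejected"],
  validated: ["published"],
  published: [], // terminal
  rejected: [],  // terminal -- never reaches the feed
};

function canTransition(from: string, to: string): boolean {
  return (TRANSITIONS[from] ?? []).includes(to);
}
```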
#### What happens after a fact passes validation -- what gets enqueued next?

- Two independent fan-out jobs are enqueued in parallel:
  1. `RESOLVE_IMAGE` -> worker-ingest resolves an image via the priority cascade.
  2. `GENERATE_CHALLENGE_CONTENT` -> worker-facts generates quiz content for 6 styles.
- Handler: `apps/worker-validate/src/handlers/validate-fact.ts` (post-validation logic).
#### How does the validation-retry cron work for stuck facts?

- Route: `apps/web/app/api/cron/validation-retry/route.ts`.
- Intended schedule: every 4 hours.
- Finds facts with `status = 'pending_validation'` not updated in 4+ hours.
- Re-enqueues them for `VALIDATE_FACT`.
- Safety net for queue failures or worker crashes.
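The stuck-fact lookup boils down to a cutoff-timestamp query; a sketch, where the SQL text and query-builder shape are assumptions (the table and column names come from this doc):

```typescript
// Sketch of the stuck-fact lookup: pending_validation and untouched for 4+ hours.
function stuckFactsQuery(now: Date, hours = 4): { sql: string; params: [string] } {
  const cutoff = new Date(now.getTime() - hours * 60 * 60 * 1000);
  return {
    sql: "SELECT id FROM fact_records WHERE status = 'pending_validation' AND updated_at < $1",
    params: [cutoff.toISOString()],
  };
}
```

Each matching id would then be re-enqueued as a `VALIDATE_FACT` message.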
### Validation Confidence and Scoring
#### How are validation confidence scores calculated?

- Each phase produces its own confidence score (0.0-1.0) and flags.
- A fact passes when the overall `confidence >= 0.7` and no flags contain "critical".
- Phase-by-phase audit trails are stored for transparency.
- Orchestrator: `packages/ai/src/validation/index.ts`.
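The pass rule itself is small enough to state in code; a sketch, with the outcome shape assumed:

```typescript
// The overall pass rule described above: confidence >= 0.7 and no critical flags.
interface ValidationOutcome { confidence: number; flags: string[] }

function passes(outcome: ValidationOutcome): boolean {
  const hasCritical = outcome.flags.some((f) => f.includes("critical"));
  return outcome.confidence >= 0.7 && !hasCritical;
}
```

Note that a single critical flag rejects the fact even at high confidence.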
#### What causes a fact to fail validation?

- Structural: schema keys don't match the topic's `fact_record_schemas`.
- Consistency: internal contradictions (e.g., date says 1990 but context says 2020).
- Cross-Model: the verifier AI disagrees with the generated fact.
- Evidence: no external corroboration found (Wikipedia/Wikidata return no match).
- Any "critical" flag in any phase -> automatic rejection.
### Troubleshooting
#### Why are extracted facts missing structured data or showing empty fact keys?

- Schema mismatch: the topic's `fact_record_schemas.fact_keys` may not match what the AI is generating.
- Check the `fact_record_schemas` table for the topic's schema definition.
- The AI prompt includes schema keys -- if they are wrong or incomplete, the output will be malformed.
- Inspect schema resolution by tracing through `packages/ai/src/schema-utils.ts`.
#### How do I check how many facts exist for a given topic category?

- Query: `SELECT count(*) FROM fact_records WHERE topic_category_id = '<id>' AND status = 'validated'`.
- Or use the admin dashboard content page: `admin.eko.day/content` (filter by topic).
- For a topic quotas audit: `curl -X POST localhost:3000/api/cron/topic-quotas -H "Authorization: Bearer $CRON_SECRET"`.
#### How do I investigate a fact that is stuck in pending_validation?

- Check whether a `VALIDATE_FACT` message was enqueued: inspect queue depth for `queue:validate_fact`.
- Check worker-validate logs for errors processing the fact.
- Check the DLQ: `queue:validate_fact:dlq` holds messages that failed 3 times.
- Manual re-trigger: the validation-retry cron will pick it up, or re-enqueue manually.
- Verify `GOOGLE_API_KEY` is set (required for phases 3-4).
#### Where is validation logic implemented and how do I add a new phase?

- Orchestrator: `packages/ai/src/validation/index.ts` -- runs phases in sequence.
- Each phase is a separate file: `structural.ts`, `consistency.ts`, `cross-model.ts`, `evidence.ts`.
- To add a phase: create a new file in `packages/ai/src/validation/`, implement the `ValidationPhase` interface, and register it in the orchestrator.
- Worker handler: `apps/worker-validate/src/handlers/validate-fact.ts`.
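The steps above might look roughly like this. The real `ValidationPhase` signature lives in `packages/ai/src/validation/`; the shape below, and the example phase, are illustrative assumptions.

```typescript
// Hypothetical shape of the ValidationPhase interface and registration.
interface PhaseOutput { confidence: number; flags: string[] }

interface ValidationPhase {
  name: string;
  validate(fact: Record<string, unknown>): PhaseOutput;
}

// Example new phase (illustrative): flag suspiciously short context fields.
const contextLengthPhase: ValidationPhase = {
  name: "context-length",
  validate(fact) {
    const context = typeof fact.context === "string" ? fact.context : "";
    const flags = context.length < 20 ? ["critical:context_too_short"] : [];
    return { confidence: flags.length ? 0.1 : 0.9, flags };
  },
};

// Registering it in the orchestrator's phase list (sketch):
const phases: ValidationPhase[] = [contextLengthPhase /* ...existing phases */];
```

Placing a cheap code-only phase before the AI phases keeps its failures free, matching the existing phase-1/2 pattern.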
### Cost Tracking
#### How are reasoning tokens and thinking budgets tracked?

- Models that support extended thinking (e.g., Gemini 3 Flash) emit `reasoning_tokens` alongside input/output tokens.
- These are recorded in `ai_cost_log.reasoning_tokens_total` for accurate cost attribution.
- Thinking budgets are configured per-model in the model adapter; the cost tracker sums all token types for spend-cap enforcement.
- Cost tracker: `packages/ai/src/cost-tracker.ts`.
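The "sums all token types" step can be sketched as follows; the usage shape is an assumption and the pricing argument is a placeholder, not a real rate.

```typescript
// Sketch of spend-cap accounting: reasoning tokens count toward the total
// alongside input and output tokens.
interface Usage {
  inputTokens: number;
  outputTokens: number;
  reasoningTokens: number; // omitted by models without extended thinking
}

function totalTokens(u: Usage): number {
  return u.inputTokens + u.outputTokens + u.reasoningTokens;
}

function costUSD(u: Usage, perMillionTokensUSD: number): number {
  return (totalTokens(u) / 1_000_000) * perMillionTokensUSD;
}
```

Ignoring `reasoningTokens` would undercount spend on thinking-enabled models, which is why they get their own `ai_cost_log` column.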
## See Also
- Fact-Challenge Anatomy -- How facts become challenges
- Evergreen Ingestion -- Full evergreen pipeline reference
- News & Fact Engine -- System reference
- Key source: `packages/ai/src/fact-engine.ts`, `packages/ai/src/validation/`