Content Prompts

Prompts and an FAQ for fact-engine operations and validation: AI extraction, evergreen generation, schema keys, source types, and multi-tier fact validation.

Prompts

  • 01 Backfill Null Metadata -- Patch facts missing notability or challenge content
  • 02 Content Cleanup Pass -- Clean up generated content
  • 03 Backfill Summaries -- Generate AI summaries for events missing them
  • 04 Generate Evergreen Facts -- Trigger evergreen fact generation pipeline
  • 05 Run Enrichment -- Enrichment orchestrator with multi-source entity context

FAQ

Extraction

How does AI fact extraction work end-to-end?

  • An EXTRACT_FACTS queue message arrives at worker-facts, handled by apps/worker-facts/src/handlers/extract-facts.ts.
  • The handler calls extractFactsFromStory() in packages/ai/src/fact-engine.ts, which uses generateObject() with a Zod schema to produce structured output.
  • Model selection is handled by packages/ai/src/model-router.ts; the default tier falls back to gemini-3-flash-preview when no runtime config is present.
  • Runtime routing is DB-driven via the ai_model_tier_config table with a 60-second cache -- no restart needed to switch models.
  • Enrichment context (KG, Wikidata, Wikipedia, GDELT, domain APIs) is injected into the extraction prompt via packages/ai/src/enrichment.ts.
  • Subcategory classification is applied during extraction when the topic has active subcategories.
  • Deterministic notability bypass: when both KG (score >= 100) and Wikidata (>= 20 sitelinks) confirm an entity, the AI notability score is overridden to 0.85.
  • Each model has a dedicated ModelAdapter in packages/ai/src/models/adapters/ for per-model prompt optimizations.
  • Output fields: title, facts{} (key-value JSONB), context narrative, challenge_title, notability_score, notability_reason.
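The deterministic notability bypass above can be sketched as a small pure function. This is illustrative: the function name, parameter shapes, and null handling are assumptions; only the thresholds (KG score >= 100, >= 20 sitelinks, override to 0.85) come from the description above.

```typescript
// Sketch of the deterministic notability bypass: when both KG and Wikidata
// independently confirm the entity, override the AI-assessed score.
const KG_SCORE_MIN = 100;
const SITELINKS_MIN = 20;
const BYPASS_SCORE = 0.85;

function applyNotabilityBypass(
  aiScore: number,
  kgScore: number | null,
  wikidataSitelinks: number | null,
): number {
  // Both sources must confirm; a missing signal means no bypass.
  if (
    kgScore !== null && kgScore >= KG_SCORE_MIN &&
    wikidataSitelinks !== null && wikidataSitelinks >= SITELINKS_MIN
  ) {
    return BYPASS_SCORE; // override whatever the model said
  }
  return aiScore; // otherwise trust the AI-assessed score
}
```

The bypass only ever raises confidence for well-attested entities; an entity confirmed by one source but not the other keeps its AI score.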

What are schema keys and how does the system know which keys to use for a topic?

  • The fact_record_schemas table defines typed field definitions per topic schema (e.g., player_name: text, career_points: number).
  • Topics link to schemas via topic_categories.schema_id.
  • During extraction, resolveTopicCategory() finds the category, retrieves schema keys, and passes them to the AI prompt.
  • Schema resolution utilities: packages/ai/src/schema-utils.ts.
  • Schema storage: fact_record_schemas.fact_keys (JSONB column).
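A hypothetical sketch of how the typed key definitions in fact_record_schemas.fact_keys might be rendered into the extraction prompt; the FactKeyDef shape and function name are assumptions, not the actual schema-utils API.

```typescript
// One entry per typed field definition in fact_record_schemas.fact_keys.
type FactKeyDef = { key: string; type: "text" | "number" | "date" };

// Render the schema keys as a bullet list for injection into the AI prompt.
function renderSchemaKeysForPrompt(factKeys: FactKeyDef[]): string {
  if (factKeys.length === 0) {
    return "No schema keys defined; use free-form keys.";
  }
  return factKeys.map((k) => `- ${k.key} (${k.type})`).join("\n");
}
```

For a sports topic this would emit lines like `- player_name (text)` and `- career_points (number)`, matching the example fields above.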

What is a notability score and what threshold filters facts?

  • A 0.0--1.0 AI-assessed score of how noteworthy a fact is.
  • Default threshold: NOTABILITY_THRESHOLD=0.6 in packages/config/src/index.ts.
  • Facts scoring below the threshold are discarded during extraction.
  • Each fact also receives a notability_reason -- one sentence explaining why it matters.
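The threshold filter amounts to a one-line predicate. A minimal sketch, assuming the documented default of 0.6 (the real constant lives in packages/config):

```typescript
const NOTABILITY_THRESHOLD = 0.6;

interface ExtractedFact {
  title: string;
  notability_score: number;
  notability_reason: string;
}

// Facts scoring below the threshold are discarded during extraction.
function filterByNotability(
  facts: ExtractedFact[],
  threshold = NOTABILITY_THRESHOLD,
): ExtractedFact[] {
  return facts.filter((f) => f.notability_score >= threshold);
}
```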

What is the difference between title and challenge_title?

  • title: Wikipedia-style, factual, searchable (e.g., "Prince's Multi-Instrument Mastery at Paisley Park").
  • challenge_title: Movie-poster-style, theatrical, curiosity-provoking (e.g., "Twenty-Seven Instruments, One Take, Zero Help").
  • Both describe the same fact; they serve different audiences (internal reference vs user-facing hook).

Where is the extraction prompt defined and how do I modify it?

  • Core extraction logic: packages/ai/src/fact-engine.ts -- the extractFactsFromStory() and generateEvergreenFacts() functions.
  • The system prompt includes topic schema keys, taxonomy content rules, and voice guidelines.
  • Taxonomy content rules: packages/ai/src/taxonomy-content-rules.ts (loads from packages/ai/src/config/taxonomy-rules-data.ts).
  • Per-model adapters in packages/ai/src/models/adapters/ inject model-specific prompt optimizations.

Evergreen

How does evergreen generation differ from news extraction?

  • Trigger: daily cron at 3 AM UTC vs every-15-min news ingestion.
  • Source type: ai_generated (no source articles) vs news_extraction (from clustered news).
  • Model tier: mid (higher quality, lower volume) vs default for news.
  • Dedup: title match against existing facts for the topic vs content_hash on news_sources.
  • Expiry: none (permanent) vs 30 days for news facts.
  • Validation: multi_phase for both, but evergreen relies more heavily on AI cross-check since there are no independent source articles.

How do I enable or disable evergreen generation?

  • Master switch: EVERGREEN_ENABLED=true|false in .env.local (default: false).
  • Daily quota: EVERGREEN_DAILY_QUOTA=20 (max facts per day).
  • Both defined in packages/config/src/index.ts.
  • Cron route: apps/web/app/api/cron/generate-evergreen/route.ts -- checks the EVERGREEN_ENABLED gate before dispatching.
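The two gates the cron route checks can be sketched as a single predicate. Names here are illustrative; only the EVERGREEN_ENABLED switch and EVERGREEN_DAILY_QUOTA limit come from the config described above.

```typescript
interface EvergreenConfig {
  enabled: boolean;    // EVERGREEN_ENABLED
  dailyQuota: number;  // EVERGREEN_DAILY_QUOTA
}

// Returns true only when the master switch is on and the daily quota
// has headroom for at least one more fact.
function shouldGenerateEvergreen(
  cfg: EvergreenConfig,
  generatedToday: number,
): boolean {
  if (!cfg.enabled) return false;
  return generatedToday < cfg.dailyQuota;
}
```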

How does evergreen deduplication work?

  • Before generation, the handler fetches all existing fact titles for the target topic from fact_records.
  • Those titles are injected into the AI prompt as "already exists -- do not regenerate."
  • This is title-based dedup (not hash-based like news articles).
  • Handler: apps/worker-facts/src/handlers/generate-evergreen.ts.
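Title-based dedup can also be applied as a post-generation filter. A sketch under assumptions: case-insensitive, whitespace-trimmed matching is illustrative; the handler may compare titles exactly.

```typescript
// Drop generated candidates whose titles already exist for the topic.
function dedupeByTitle<T extends { title: string }>(
  candidates: T[],
  existingTitles: string[],
): T[] {
  const seen = new Set(existingTitles.map((t) => t.trim().toLowerCase()));
  return candidates.filter((c) => !seen.has(c.title.trim().toLowerCase()));
}
```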

What model tier is used for evergreen generation and why?

  • mid tier -- higher quality than default because evergreen content is lower volume and permanent (no expiry).
  • News extraction uses default tier for cost efficiency (high volume, 30-day expiry).
  • Challenge content generation uses default tier (high volume, regenerable).
  • Tier config: packages/config/src/model-registry-data.ts.

Source Types

What are the six source types and when is each one used?

  • news_extraction -- Derived from clustered news articles (news pipeline).
  • ai_generated -- Evergreen facts generated by AI (daily cron).
  • file_seed -- Primary facts from curated seed entries (seed explosion).
  • spinoff_discovery -- Tangential facts discovered during seed explosion.
  • ai_super_fact -- Cross-entry correlations found by FIND_SUPER_FACTS.
  • api_import -- Structured API imports (ESPN, WikiQuote, etc. -- stub, not yet active).
  • Stored in fact_records.source_type.
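The six source types map naturally onto a TypeScript string-literal union with a type guard for narrowing untyped DB rows. The values come from the list above; the type names and guard are a sketch, not the codebase's actual definitions.

```typescript
const SOURCE_TYPES = [
  "news_extraction",
  "ai_generated",
  "file_seed",
  "spinoff_discovery",
  "ai_super_fact",
  "api_import",
] as const;

type SourceType = (typeof SOURCE_TYPES)[number];

// Narrow a raw string (e.g. a fact_records.source_type column value).
function isSourceType(value: string): value is SourceType {
  return (SOURCE_TYPES as readonly string[]).includes(value);
}
```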

How does the context field get written -- what structure does it follow?

  • Three-part structure: Hook (1-2 sentences, surprising detail) -> Story (2-4 sentences, backstory with specifics) -> Connection (1-2 sentences, links to reader's existing knowledge).
  • Reads like "a passionate friend sharing something at dinner" -- not a textbook.
  • Voice rules defined in the voice constitution: packages/ai/src/config/challenge-voice.ts.
  • See Fact-Challenge Anatomy for the full context field spec.

Validation Phases

What are the four validation phases and what does each one check?

  • Phase 1 -- Structural: Schema conformance, type validation, injection detection ($0, code-only) -- packages/ai/src/validation/structural.ts.
  • Phase 2 -- Consistency: Internal contradictions, taxonomy rule violations ($0, code-only) -- packages/ai/src/validation/consistency.ts.
  • Phase 3 -- Cross-Model: AI adversarial verification via Gemini 2.5 Flash (~$0.001) -- packages/ai/src/validation/cross-model.ts.
  • Phase 4 -- Evidence: External API corroboration (Wikipedia, Wikidata) + AI reasoner via Gemini 2.5 Flash (~$0.002-0.005) -- packages/ai/src/validation/evidence.ts.
  • Phases 1-2 catch ~40% of defective facts before any AI call.
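The cost profile above implies an ordering rule: run the free code-only phases first and stop before any paid AI call when they raise a critical flag. A hypothetical sketch of that short-circuit (the real orchestrator's interfaces may differ):

```typescript
interface PhaseResult {
  confidence: number;
  flags: string[];
}

type Phase = { name: string; run: () => PhaseResult };

// Run phases in order (cheapest first); stop at the first critical flag
// so defective facts never reach the paid cross-model/evidence phases.
function runPhases(phases: Phase[]): { results: PhaseResult[]; rejectedAt?: string } {
  const results: PhaseResult[] = [];
  for (const phase of phases) {
    const r = phase.run();
    results.push(r);
    if (r.flags.some((f) => f.includes("critical"))) {
      return { results, rejectedAt: phase.name };
    }
  }
  return { results };
}
```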

Which AI models are used for cross-model verification and evidence corroboration?

  • Both phases 3 and 4 use Gemini 2.5 Flash (Google) for cross-provider verification (hardcoded for provider separation).
  • This ensures the verifier is a different provider than the generator (which uses OpenAI/Anthropic).
  • Requires GOOGLE_API_KEY env var.
  • Orchestration: packages/ai/src/validation/index.ts.

What validation strategies exist and when is each one applied?

  • multi_phase -- Strictest; all 4 phases. Used for news extraction and AI-generated (evergreen) facts.
  • authoritative_api -- Trusts the source; lighter checks. Used for API imports (ESPN, WikiQuote).
  • curated_database -- Manual/pre-verified entries. Used for curated seed entries.
  • Strategy is passed in the VALIDATE_FACT queue message payload.

Validation Status Flow

What are the possible fact statuses and how do facts transition between them?

  • pending -> pending_validation -> validated -> published (or rejected).
  • pending: just inserted, awaiting validation enqueue.
  • pending_validation: VALIDATE_FACT enqueued.
  • validated: passed all phases -- eligible for feed.
  • published: appeared in user feed (currently synonymous with validated in practice).
  • rejected: failed validation -- never reaches feed.
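The allowed transitions implied by the flow above can be expressed as a lookup table. This is a sketch; the real enforcement may live in handler code or DB constraints rather than an explicit state machine.

```typescript
// Each status maps to the statuses it may legally move to.
const TRANSITIONS: Record<string, string[]> = {
  pending: ["pending_validation"],
  pending_validation: ["validated", "rejected"],
  validated: ["published"],
  published: [], // terminal
  rejected: [],  // terminal -- never reaches the feed
};

function canTransition(from: string, to: string): boolean {
  return (TRANSITIONS[from] ?? []).includes(to);
}
```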

What happens after a fact passes validation -- what gets enqueued next?

  • Two independent fan-out jobs are enqueued in parallel:
  • (1) RESOLVE_IMAGE -> worker-ingest resolves an image via the priority cascade.
  • (2) GENERATE_CHALLENGE_CONTENT -> worker-facts generates quiz content for 6 styles.
  • Handler: apps/worker-validate/src/handlers/validate-fact.ts (post-validation logic).
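The fan-out can be sketched as building both messages from one fact id. The message shape here is illustrative only; the actual queue payload schema is defined in the codebase.

```typescript
interface QueueMessage {
  type: string;
  factId: string;
}

// Build the two independent post-validation jobs enqueued in parallel.
function buildPostValidationFanout(factId: string): QueueMessage[] {
  return [
    { type: "RESOLVE_IMAGE", factId },              // worker-ingest: image priority cascade
    { type: "GENERATE_CHALLENGE_CONTENT", factId }, // worker-facts: quiz content, 6 styles
  ];
}
```

Because the jobs are independent, a failure in image resolution does not block challenge generation, and vice versa.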

How does the validation-retry cron work for stuck facts?

  • Route: apps/web/app/api/cron/validation-retry/route.ts.
  • Intended schedule: every 4 hours.
  • Finds facts with status = 'pending_validation' not updated in 4+ hours.
  • Re-enqueues them for VALIDATE_FACT.
  • Safety net for queue failures or worker crashes.

Validation Confidence and Scoring

How are validation confidence scores calculated?

  • Each phase produces its own confidence score (0.0-1.0) and flags.
  • A fact passes when overall confidence >= 0.7 and no flags contain "critical".
  • Phase-by-phase audit trails are stored for transparency.
  • Orchestrator: packages/ai/src/validation/index.ts.
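The documented pass rule (overall confidence >= 0.7, no flag containing "critical") can be sketched as a predicate over per-phase outcomes. Aggregating by minimum is an assumption; the orchestrator may average or weight phases instead.

```typescript
interface PhaseOutcome {
  confidence: number;
  flags: string[];
}

// A fact passes when overall confidence >= 0.7 and no flag is critical.
function factPasses(outcomes: PhaseOutcome[]): boolean {
  const overall = Math.min(...outcomes.map((o) => o.confidence));
  const hasCritical = outcomes.some((o) =>
    o.flags.some((f) => f.includes("critical")),
  );
  return overall >= 0.7 && !hasCritical;
}
```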

What causes a fact to fail validation?

  • Structural: Schema keys don't match topic's fact_record_schemas.
  • Consistency: Internal contradictions (e.g., date says 1990 but context says 2020).
  • Cross-Model: The verifier AI disagrees with the generated fact.
  • Evidence: No external corroboration found (Wikipedia/Wikidata return no match).
  • Any "critical" flag in any phase -> automatic rejection.

Troubleshooting

Why are extracted facts missing structured data or have empty fact keys?

  • Schema mismatch: the topic's fact_record_schemas.fact_keys may not match what the AI is generating.
  • Check the fact_record_schemas table for the topic's schema definition.
  • The AI prompt includes schema keys -- if they are wrong or incomplete, the output will be malformed.
  • Inspect schema resolution by tracing through packages/ai/src/schema-utils.ts.

How do I check how many facts exist for a given topic category?

  • Query: SELECT count(*) FROM fact_records WHERE topic_category_id = '<id>' AND status = 'validated'.
  • Or use the admin dashboard content page: admin.eko.day/content (filter by topic).
  • For a topic-quota audit: curl -X POST localhost:3000/api/cron/topic-quotas -H "Authorization: Bearer $CRON_SECRET".

How do I investigate a fact that is stuck in pending_validation?

  • Check if VALIDATE_FACT message was enqueued: inspect queue depth for queue:validate_fact.
  • Check worker-validate logs for errors processing the fact.
  • Check DLQ: queue:validate_fact:dlq for messages that failed 3 times.
  • Manual re-trigger: the validation-retry cron will pick it up, or re-enqueue manually.
  • Verify GOOGLE_API_KEY is set (required for phases 3-4).

Where is validation logic implemented and how do I add a new phase?

  • Orchestrator: packages/ai/src/validation/index.ts -- runs phases in sequence.
  • Each phase is a separate file: structural.ts, consistency.ts, cross-model.ts, evidence.ts.
  • To add a phase: create a new file in packages/ai/src/validation/, implement the ValidationPhase interface, and register it in the orchestrator.
  • Worker handler: apps/worker-validate/src/handlers/validate-fact.ts.
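A hypothetical shape for the ValidationPhase interface and the registration step; the real interface in packages/ai/src/validation may differ in names and signatures.

```typescript
interface ValidationPhase {
  name: string;
  validate(fact: { title: string }): { confidence: number; flags: string[] };
}

// The orchestrator runs registered phases in order.
const registeredPhases: ValidationPhase[] = [];

function registerPhase(phase: ValidationPhase): void {
  registeredPhases.push(phase);
}

// Example: registering a new code-only phase.
registerPhase({
  name: "profanity",
  validate: (fact) => ({
    confidence: /damn/i.test(fact.title) ? 0.2 : 1.0,
    flags: [],
  }),
});
```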

Cost Tracking

How are reasoning tokens and thinking budgets tracked?

  • Models that support extended thinking (e.g., Gemini 3 Flash) emit reasoning_tokens alongside input/output tokens.
  • These are recorded in ai_cost_log.reasoning_tokens_total for accurate cost attribution.
  • Thinking budgets are configured per-model in the model adapter; the cost tracker sums all token types for spend cap enforcement.
  • Cost tracker: packages/ai/src/cost-tracker.ts.
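Summing all token types for spend-cap checks can be sketched as a simple reduction; the record shape is illustrative (the real columns live in ai_cost_log).

```typescript
interface UsageRecord {
  inputTokens: number;
  outputTokens: number;
  reasoningTokens: number; // extended-thinking models only; 0 otherwise
}

// Total tokens across all types -- the quantity compared against spend caps.
function totalTokens(records: UsageRecord[]): number {
  return records.reduce(
    (sum, r) => sum + r.inputTokens + r.outputTokens + r.reasoningTokens,
    0,
  );
}
```

Omitting reasoningTokens from the sum would understate spend on thinking-enabled models, which is why they are logged separately.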

See Also