Evergreen Fact & Challenge Ingestion

How Eko generates timeless knowledge facts via AI and turns them into playable challenge cards — independent of the news pipeline.

What It Does

The evergreen pipeline produces "always-relevant" facts that never expire. Unlike news-derived facts (which have a 30-day expiry and are tied to real-world events), evergreen facts are AI-generated knowledge across every active topic category. They provide a stable content baseline so the feed always has something interesting, even when news is slow.

The pipeline has two stages:

  1. Evergreen generation — AI creates structured facts for each topic category, deduplicating against existing content
  2. Challenge generation — After a fact passes validation, AI generates pre-computed challenge material in multiple quiz styles

Both stages share the same workers, queues, and validation pipeline as news-extracted facts. The difference is how they enter the system and what triggers them.

Pipeline Overview

[Cron: 3 AM UTC daily]
       │
       ▼
  Distribute daily quota across active topics
       │
       ▼
  ┌─ GENERATE_EVERGREEN messages ────────────────────────┐
  │  One message per topic, count = quota share           │
  │  Queue: queue:generate_evergreen                      │
  └──────────────────────────────────────┬───────────────┘
       │
       ▼
  worker-facts picks up message
       │
       ▼
  ┌─ AI Generation (mid-tier model) ─────────────────────┐
  │  Dedup against existing titles for the topic          │
  │  Structured output: title, facts{}, context,          │
  │  challenge_title, notability_score                    │
  └──────────────────────────────────────┬───────────────┘
       │
       ▼
  Insert fact_record (status: pending_validation)
       │
       ▼
  ┌─ VALIDATE_FACT (multi_phase strategy) ───────────────┐
  │  4-phase pipeline: structural → consistency →         │
  │  cross-model → evidence                              │
  │  Stricter than news because no independent sources    │
  └──────────────────────┬──────────────┬────────────────┘
       │                 │
    INVALID           VALID
       │                 │
       ▼                 ▼
    rejected      ┌─ Post-validation fan-out ────────────┐
                  │  RESOLVE_IMAGE (parallel)             │
                  │  GENERATE_CHALLENGE_CONTENT (parallel) │
                  └──────────────────────────────────────┘
                         │
                         ▼
                  Challenge content upserted
                  (6 quiz styles per fact)
                         │
                         ▼
                  Fact appears in feed
                  (20% evergreen slice)

Evergreen Generation

Trigger

A Vercel cron fires daily at 3 AM UTC, hitting /api/cron/generate-evergreen. The route checks two gates before proceeding:

| Gate | Config | Default | Effect |
| --- | --- | --- | --- |
| Feature flag | EVERGREEN_ENABLED | false | Blocks all generation if false |
| Daily quota | EVERGREEN_DAILY_QUOTA | 20 | Max facts generated per day |

Quota Distribution

The daily quota is distributed across active root-level topic categories:

  • If percentTarget is set on a category, it gets dailyQuota * percentTarget facts (e.g., 20% of 20 = 4 facts)
  • If not set, quota is split evenly across all categories
  • Each category gets at least 1 fact (floor clamped to 1)
  • Categories with no schemas are skipped

One GENERATE_EVERGREEN queue message is created per category, specifying the category ID, schema ID, and count.
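The distribution rules above can be sketched as a small pure function. This is an illustrative reconstruction, not the actual route code; the type names (TopicCategory, EvergreenJob) and field names are assumptions.

```typescript
// Hypothetical sketch of quota distribution across active topic categories.
interface TopicCategory {
  id: string;
  schemaId?: string;       // categories with no schema are skipped
  percentTarget?: number;  // e.g. 0.2 for a 20% share of the daily quota
}

interface EvergreenJob {
  categoryId: string;
  schemaId: string;
  count: number;
}

function distributeQuota(dailyQuota: number, categories: TopicCategory[]): EvergreenJob[] {
  const eligible = categories.filter((c) => c.schemaId !== undefined);
  if (eligible.length === 0) return [];
  const evenShare = dailyQuota / eligible.length;
  return eligible.map((c) => ({
    categoryId: c.id,
    schemaId: c.schemaId!,
    // percentTarget overrides the even split; every category gets at least 1
    count: Math.max(
      1,
      Math.floor(c.percentTarget !== undefined ? dailyQuota * c.percentTarget : evenShare),
    ),
  }));
}
```

With a daily quota of 20 and a category carrying percentTarget 0.2, that category gets 4 facts and the remainder is split evenly among the other schema-bearing categories.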

AI Generation

The worker handler calls generateEvergreenFacts() from @eko/ai:

| Aspect | Detail |
| --- | --- |
| Model tier | mid (higher quality than default; evergreen is lower volume, higher stakes) |
| Deduplication | AI receives all existing fact titles for the topic to avoid repetition |
| Output per record | title, challenge_title, facts{}, context, notability_score, notability_reason |
| Schema enforcement | Facts must conform to the topic's fact_record_schemas.fact_keys definition |
| Cost tracking | Total AI cost is split evenly across generated records and stored per-record |

Fact Insertion

Each generated record is inserted into fact_records with:

  • status: 'pending_validation'
  • source_type: 'ai_generated'
  • No expires_at (evergreen facts never expire)
  • generation_cost_usd for cost auditing

A VALIDATE_FACT message is immediately enqueued with the multi_phase strategy.
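The insert-then-enqueue step can be sketched as two small builders. This is a hedged sketch: the actual fact_records columns and the @eko/shared message schema may differ in shape and naming.

```typescript
// Illustrative builders for the evergreen insert row and the follow-up
// validation message (field names are assumptions).
interface GeneratedRecord {
  title: string;
  facts: Record<string, unknown>;
  context: string;
  notability_score: number;
}

function buildFactInsert(record: GeneratedRecord, perRecordCostUsd: number) {
  return {
    ...record,
    status: 'pending_validation' as const,
    source_type: 'ai_generated' as const,
    expires_at: null,                      // evergreen facts never expire
    generation_cost_usd: perRecordCostUsd, // per-record share of the AI call
  };
}

function buildValidateMessage(factRecordId: string) {
  return {
    type: 'VALIDATE_FACT' as const,
    factRecordId,
    strategy: 'multi_phase' as const, // strictest strategy for AI-generated facts
  };
}
```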

Validation

Evergreen facts use the multi_phase validation strategy — the strictest available — because there are no independent news sources to corroborate them. The 4-phase pipeline:

  1. Structural — Schema conformance, type validation, injection detection ($0)
  2. Consistency — Internal contradictions, taxonomy rule violations ($0)
  3. Cross-Model — AI adversarial verification via Gemini 2.5 Flash (~$0.001)
  4. Evidence — External API corroboration (Wikipedia, Wikidata) + AI reasoner (~$0.002-0.005)

Phases 1-2 are free, code-only checks that catch ~40% of defective facts before any AI call. A fact passes when confidence >= 0.7 and no flag contains "critical".
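The pass criterion is simple enough to state as a predicate. A minimal sketch, assuming flags arrive as an array of strings (the function name is illustrative):

```typescript
// A fact passes validation when confidence >= 0.7 and no flag contains
// the substring "critical".
function passesValidation(confidence: number, flags: string[]): boolean {
  const hasCritical = flags.some((f) => f.includes('critical'));
  return confidence >= 0.7 && !hasCritical;
}
```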

Challenge Content Generation

Challenge content is not scheduled — it's triggered automatically when a fact passes validation.

Trigger

When worker-validate marks a fact as validated, it enqueues two independent jobs:

  1. RESOLVE_IMAGE — find a suitable image via the priority cascade
  2. GENERATE_CHALLENGE_CONTENT — pre-generate quiz content

These are independent and run in parallel (different columns/tables, no shared state).
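The fan-out can be sketched as a function producing both messages at once. The queue names and message shapes here are assumptions for illustration, not the actual @eko/queue constructors.

```typescript
// Hypothetical sketch of the post-validation fan-out: two independent jobs,
// enqueued in parallel, with no shared state between them.
function buildPostValidationJobs(factRecordId: string) {
  return [
    { queue: 'queue:resolve_image', body: { type: 'RESOLVE_IMAGE' as const, factRecordId } },
    {
      queue: 'queue:generate_challenge_content',
      body: { type: 'GENERATE_CHALLENGE_CONTENT' as const, factRecordId },
    },
  ];
}
```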

What Gets Generated

The generateChallengeContent() function in @eko/ai produces content for 6 pre-generated styles:

| Style | UI Pattern | Example |
| --- | --- | --- |
| multiple_choice | Pick from A/B/C/D | "Which composer wrote 27 instruments?" |
| direct_question | Answer a specific question | "How many instruments did Prince play?" |
| fill_the_gap | Complete a sentence | "Prince played ___ instruments on his debut" |
| statement_blank | Fill in a statement | "___ played every instrument on Purple Rain" |
| reverse_lookup | Identify from a description | "Which musician mastered 27 instruments?" |
| free_text | Open-ended response | "Why is Prince's multi-instrument mastery notable?" |

Two styles are exempt from pre-generation:

  • conversational — generated in real-time during multi-turn dialogue
  • progressive_image_reveal — requires runtime image processing
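The eight styles and the pre-generation split can be captured in types. The union below is an illustrative reconstruction; the actual style enum in the codebase may be named and organized differently.

```typescript
// All challenge styles, split into pre-generated vs. runtime-only.
const ALL_STYLES = [
  'multiple_choice', 'direct_question', 'fill_the_gap', 'statement_blank',
  'reverse_lookup', 'free_text', 'conversational', 'progressive_image_reveal',
] as const;

type ChallengeStyle = (typeof ALL_STYLES)[number];

// conversational needs multi-turn dialogue; progressive_image_reveal needs
// runtime image processing. Everything else is generated ahead of time.
const RUNTIME_ONLY: ReadonlySet<ChallengeStyle> = new Set([
  'conversational',
  'progressive_image_reveal',
]);

const PRE_GENERATED: ChallengeStyle[] = ALL_STYLES.filter((s) => !RUNTIME_ONLY.has(s));
```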

The 5-Layer Structure

Every generated challenge includes:

| Layer | Purpose | Constraint |
| --- | --- | --- |
| setup_text | Backstory that shares context freely | 2-4 sentences with specific details |
| challenge_text | Invitation to answer | Must contain "you" or "your" |
| reveal_correct | Celebration when they know it | 1-3 sentences, teaches something extra |
| reveal_wrong | Kind teaching when they don't | 1-3 sentences, includes correct answer |
| correct_answer | Rich narrative for streaming display | 3-6 sentences, storytelling payoff |

Prompt Assembly

The challenge generation prompt is a 10-layer composition built by buildSystemPrompt():

  1. Voice constitution (universal Eko tone)
  2. Taxonomy voice (domain-specific register and energy)
  3. Domain vocabulary (expert language patterns)
  4. Format voice (format-specific posture)
  5. Format rules (setup/challenge/reveal refinements)
  6. Style voice (per-style interaction mechanics)
  7. Style rules (what each field should contain)
  8. Taxonomy content rules (formatting conventions)
  9. Difficulty guidance (1-5 scale calibration)
  10. Generation instructions

All layers are driven by TypeScript data files in packages/ai/src/config/ (e.g., challenge-voice.ts, taxonomy-rules-data.ts). This means voice and style behavior can be changed by editing typed data arrays (declared with as const satisfies) without touching runtime logic.
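The layered composition above can be sketched as a join over ordered sections. This is a simplified stand-in for the real buildSystemPrompt(), whose signature and data sources are not shown in this document.

```typescript
// Illustrative sketch of the 10-layer prompt composition. Layer names mirror
// the numbered list above; the real implementation pulls these from typed
// config files rather than a flat struct.
interface PromptLayers {
  voiceConstitution: string;
  taxonomyVoice: string;
  domainVocabulary: string;
  formatVoice: string;
  formatRules: string;
  styleVoice: string;
  styleRules: string;
  taxonomyContentRules: string;
  difficultyGuidance: string;
  generationInstructions: string;
}

function buildSystemPromptSketch(layers: PromptLayers): string {
  // Order matters: universal tone first, concrete generation instructions last.
  return [
    layers.voiceConstitution, layers.taxonomyVoice, layers.domainVocabulary,
    layers.formatVoice, layers.formatRules, layers.styleVoice, layers.styleRules,
    layers.taxonomyContentRules, layers.difficultyGuidance, layers.generationInstructions,
  ].join('\n\n');
}
```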

Quality Enforcement

Content is validated at generation time via:

  • CQ rules (CQ-001 through CQ-013) — banned patterns, field length requirements, structural checks
  • Drift coordinators (5 pluggable validators) — voice, structure, schema, taxonomy, difficulty compliance
  • Patching — automated fixes for common violations (e.g., patchCq002() ensures "you"/"your" in challenge text)

Micro-Batching

Challenge content generation uses micro-batching to amortize the ~5,200-token system prompt across multiple facts. The worker (consumeChallengeBatch()) accumulates up to 5 queue messages over a 500ms window, then makes a single AI call per batch. Individual messages are ack'd or nack'd based on per-message success.
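The batching policy (flush at 5 messages or when the 500 ms window elapses, whichever comes first) can be isolated from the queue consumer so it is testable without timers. A minimal sketch; the real consumeChallengeBatch() wires this into the worker loop.

```typescript
// Decision logic for the micro-batcher: flush on size or window expiry.
interface BatchPolicy {
  maxSize: number;  // 5 messages in the document's configuration
  windowMs: number; // 500 ms accumulation window
}

function shouldFlush(pending: number, elapsedMs: number, policy: BatchPolicy): boolean {
  // Never flush an empty batch; otherwise flush when full or when the
  // accumulation window has elapsed.
  return pending >= policy.maxSize || (pending > 0 && elapsedMs >= policy.windowMs);
}
```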

Post-Processing Pipeline

After AI generation, each challenge passes through automated patching:

  1. patchPassiveVoice() — rewrites passive constructions to active voice
  2. patchTextbookRegister() — eliminates academic/formal register
  3. patchCq002() — ensures "you"/"your" in challenge_text
  4. patchPunctuationSpacing() — fixes spacing issues
  5. patchGenericReveals() — deduplicates generic reveal text across a batch
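As an example of this patching style, the CQ-002 fix can be sketched as a string transform. This is an illustrative reconstruction; the real patchCq002() may rewrite the text differently than simply prepending a direct address.

```typescript
// Hypothetical sketch of the CQ-002 patch: ensure challenge_text addresses
// the reader with "you" or "your".
function patchCq002(challengeText: string): string {
  const addressesReader = /\byou(r)?\b/i.test(challengeText);
  // Assumed fallback rewrite; the production patcher may be smarter.
  return addressesReader ? challengeText : `Can you answer this: ${challengeText}`;
}
```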

Model & Cost

| Aspect | Detail |
| --- | --- |
| Model tier | default (cheaper than mid; challenge generation is higher volume) |
| Cost per fact | ~$0.006 |
| Storage | Upserted into fact_challenge_content with a unique constraint on (fact_record_id, challenge_style, target_fact_key, difficulty) |

Feed Selection

Once an evergreen fact has status: 'validated' and challenge content rows, it enters the feed.

Blended Algorithm

For authenticated users, the feed blends four content streams:

| Stream | Weight | Query |
| --- | --- | --- |
| Recent validated | 40% | getPublishedFacts(), published_at DESC |
| Review-due | 30% | getReviewDueFacts(), nextReviewAt <= NOW(), streak < 5 |
| Evergreen | 20% | getEvergreenFacts(), source_type='ai_generated', RANDOM() |
| Exploration | 10% | getRandomFacts(), any validated fact, RANDOM() |

Evergreen facts get their own dedicated 20% slice, queried randomly. They are also eligible to appear in the 10% exploration slice. Unauthenticated users see a chronological feed only.
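The 40/30/20/10 blend translates directly into per-stream fetch counts for a feed page. A minimal sketch; the rounding behavior and function name are assumptions, not the actual feed code.

```typescript
// Split a feed page across the four content streams by weight.
const STREAM_WEIGHTS = {
  recent: 0.4,      // recent validated facts
  reviewDue: 0.3,   // spaced-repetition review
  evergreen: 0.2,   // dedicated evergreen slice
  exploration: 0.1, // random validated facts
} as const;

function streamCounts(pageSize: number) {
  return {
    recent: Math.round(pageSize * STREAM_WEIGHTS.recent),
    reviewDue: Math.round(pageSize * STREAM_WEIGHTS.reviewDue),
    evergreen: Math.round(pageSize * STREAM_WEIGHTS.evergreen),
    exploration: Math.round(pageSize * STREAM_WEIGHTS.exploration),
  };
}
```

For a 20-card page this yields 8 recent, 6 review-due, 4 evergreen, and 2 exploration cards.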

Card Detail

When a user opens a card, all pre-generated challenges for that fact are loaded via getChallengeContentForFact(). The UI presents 4 tabs (Learn, Quiz, Recall, Challenge), each filtering challenges by challengeStyle. Within each tab, one challenge is randomly selected from the pool.
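The per-tab selection (filter by style, pick one at random) can be sketched with an injected random source so it is deterministic under test. The Challenge shape and function name here are illustrative.

```typescript
// Pick one challenge of the requested style from a fact's pre-generated pool.
interface Challenge {
  challengeStyle: string;
  setupText: string;
}

function pickChallenge(
  pool: Challenge[],
  style: string,
  rng: () => number = Math.random, // injected for determinism in tests
): Challenge | undefined {
  const candidates = pool.filter((c) => c.challengeStyle === style);
  if (candidates.length === 0) return undefined;
  return candidates[Math.floor(rng() * candidates.length)];
}
```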

Cost Model

| Component | Per-Unit | Daily (at 20 quota) | Monthly |
| --- | --- | --- | --- |
| Evergreen generation | ~$0.01/fact | ~$0.20 | ~$6 |
| Validation (phases 3-4) | ~$0.003/fact | ~$0.06 | ~$2 |
| Challenge content | ~$0.006/fact | ~$0.12 | ~$4 |
| Total | | ~$0.38 | ~$12 |

Image resolution is free (Wikipedia, TheSportsDB, Unsplash, Pexels all have free tiers).

Key Differences from News Pipeline

| Aspect | News Pipeline | Evergreen Pipeline |
| --- | --- | --- |
| Trigger | External news APIs every 15 min | Cron daily at 3 AM UTC |
| Source type | news_extraction | ai_generated |
| Expiry | 30 days (auto-archive) | None (permanent) |
| Validation strategy | multi_source (confidence from source count) | multi_phase (stricter; no sources to corroborate) |
| AI model tier | default (high volume) | mid (lower volume, higher quality) |
| Deduplication | Content hash on news_sources | Title match on fact_records per topic |
| Feed weight | 40% (recent) + 10% (explore) | 20% (dedicated evergreen slice) |

Key Files

| File | Purpose |
| --- | --- |
| apps/web/app/api/cron/generate-evergreen/route.ts | Cron trigger: quota distribution and queue dispatch |
| apps/worker-facts/src/handlers/generate-evergreen.ts | Worker handler: AI generation, insertion, validation enqueue |
| apps/worker-facts/src/handlers/generate-challenge-content.ts | Worker handler: challenge generation and upsert |
| apps/worker-validate/src/handlers/validate-fact.ts | Post-validation fan-out (RESOLVE_IMAGE + GENERATE_CHALLENGE_CONTENT) |
| packages/ai/src/fact-engine.ts | generateEvergreenFacts() function |
| packages/ai/src/challenge-content.ts | generateChallengeContent() and buildSystemPrompt() |
| packages/shared/src/schemas.ts | Zod schemas for GenerateEvergreenMessage and GenerateChallengeContentMessage |
| packages/queue/src/index.ts | Queue client, message constructors |
| packages/db/src/drizzle/fact-engine-queries.ts | Feed queries: getEvergreenFacts(), getPublishedFacts() |
| packages/config/src/index.ts | EVERGREEN_ENABLED, EVERGREEN_DAILY_QUOTA config |