Evergreen Fact & Challenge Ingestion
How Eko generates timeless knowledge facts via AI and turns them into playable challenge cards — independent of the news pipeline.
What It Does
The evergreen pipeline produces "always-relevant" facts that never expire. Unlike news-derived facts (which have a 30-day expiry and are tied to real-world events), evergreen facts are AI-generated knowledge across every active topic category. They provide a stable content baseline so the feed always has something interesting, even when news is slow.
The pipeline has two stages:
- Evergreen generation — AI creates structured facts for each topic category, deduplicating against existing content
- Challenge generation — After a fact passes validation, AI generates pre-computed challenge material in multiple quiz styles
Both stages share the same workers, queues, and validation pipeline as news-extracted facts. The difference is how they enter the system and what triggers them.
Pipeline Overview
[Cron: 3 AM UTC daily]
│
▼
Distribute daily quota across active topics
│
▼
┌─ GENERATE_EVERGREEN messages ────────────────────────┐
│ One message per topic, count = quota share │
│ Queue: queue:generate_evergreen │
└──────────────────────────────────────┬───────────────┘
│
▼
worker-facts picks up message
│
▼
┌─ AI Generation (mid-tier model) ─────────────────────┐
│ Dedup against existing titles for the topic │
│ Structured output: title, facts{}, context, │
│ challenge_title, notability_score │
└──────────────────────────────────────┬───────────────┘
│
▼
Insert fact_record (status: pending_validation)
│
▼
┌─ VALIDATE_FACT (multi_phase strategy) ───────────────┐
│ 4-phase pipeline: structural → consistency → │
│ cross-model → evidence │
│ Stricter than news because no independent sources │
└──────────────────────┬──────────────┬────────────────┘
│ │
INVALID VALID
│ │
▼ ▼
rejected ┌─ Post-validation fan-out ────────────┐
│ RESOLVE_IMAGE (parallel) │
│ GENERATE_CHALLENGE_CONTENT (parallel) │
└──────────────────────────────────────┘
│
▼
Challenge content upserted
(6 quiz styles per fact)
│
▼
Fact appears in feed
(20% evergreen slice)
Evergreen Generation
Trigger
A Vercel cron fires daily at 3 AM UTC, hitting /api/cron/generate-evergreen. The route checks two gates before proceeding:
| Gate | Config | Default | Effect |
|---|---|---|---|
| Feature flag | EVERGREEN_ENABLED | false | Blocks all generation if false |
| Daily quota | EVERGREEN_DAILY_QUOTA | 20 | Max facts generated per day |
Quota Distribution
The daily quota is distributed across active root-level topic categories:
- If percentTarget is set on a category, it gets dailyQuota * percentTarget facts (e.g., 20% of 20 = 4 facts)
- If not set, quota is split evenly across all categories
- Each category gets at least 1 fact (floor clamped to 1)
- Categories with no schemas are skipped
One GENERATE_EVERGREEN queue message is created per category, specifying the category ID, schema ID, and count.
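The distribution logic above can be sketched as follows. This is a minimal in-memory illustration, not the real route handler: the category shape, the `distributeQuota` name, and the `hasSchema` flag are assumptions (the actual code reads categories and schemas from the database).

```typescript
// Hypothetical sketch of daily quota distribution across topic categories.
interface TopicCategory {
  id: string;
  percentTarget?: number; // optional share of the daily quota, e.g. 0.2 = 20%
  hasSchema: boolean;     // categories with no schemas are skipped
}

function distributeQuota(
  categories: TopicCategory[],
  dailyQuota: number,
): Map<string, number> {
  const eligible = categories.filter((c) => c.hasSchema);
  const even = Math.floor(dailyQuota / Math.max(eligible.length, 1));
  const shares = new Map<string, number>();
  for (const cat of eligible) {
    const count =
      cat.percentTarget !== undefined
        ? Math.floor(dailyQuota * cat.percentTarget)
        : even;
    shares.set(cat.id, Math.max(count, 1)); // floor clamped to 1
  }
  return shares;
}
```

Each entry in the resulting map would then become one GENERATE_EVERGREEN message.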
AI Generation
The worker handler calls generateEvergreenFacts() from @eko/ai:
| Aspect | Detail |
|---|---|
| Model tier | mid (higher quality than default — evergreen is lower volume, higher stakes) |
| Deduplication | AI receives all existing fact titles for the topic to avoid repetition |
| Output per record | title, challenge_title, facts{}, context, notability_score, notability_reason |
| Schema enforcement | Facts must conform to the topic's fact_record_schemas.fact_keys definition |
| Cost tracking | Total AI cost is split evenly across generated records and stored per-record |
Fact Insertion
Each generated record is inserted into fact_records with:
- status: 'pending_validation'
- source_type: 'ai_generated'
- No expires_at (evergreen facts never expire)
- generation_cost_usd for cost auditing
A VALIDATE_FACT message is immediately enqueued with the multi_phase strategy.
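A minimal sketch of the insert-then-enqueue step, using in-memory arrays as stand-ins. The record shape and the queue message shape are assumptions; the real code writes via Drizzle and the @eko/queue client.

```typescript
// Hypothetical sketch: insert an evergreen fact, then enqueue validation.
interface FactRecord {
  id: string;
  status: 'pending_validation' | 'validated' | 'rejected';
  sourceType: 'ai_generated' | 'news_extraction';
  expiresAt: Date | null;    // evergreen facts never expire
  generationCostUsd: number; // per-record share of the AI call cost
}

const factTable: FactRecord[] = [];
const validateQueue: { factId: string; strategy: string }[] = [];

function insertEvergreenFact(id: string, costUsd: number): void {
  factTable.push({
    id,
    status: 'pending_validation',
    sourceType: 'ai_generated',
    expiresAt: null, // no expiry, unlike news facts (30 days)
    generationCostUsd: costUsd,
  });
  // Validation is enqueued immediately with the strict multi-phase strategy.
  validateQueue.push({ factId: id, strategy: 'multi_phase' });
}
```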
Validation
Evergreen facts use the multi_phase validation strategy — the strictest available — because there are no independent news sources to corroborate them. The 4-phase pipeline:
- Structural — Schema conformance, type validation, injection detection ($0)
- Consistency — Internal contradictions, taxonomy rule violations ($0)
- Cross-Model — AI adversarial verification via Gemini 2.5 Flash (~$0.001)
- Evidence — External API corroboration (Wikipedia, Wikidata) + AI reasoner (~$0.002-0.005)
Phases 1-2 are free code-only checks that catch ~40% of defective facts before any AI call. A fact passes when confidence >= 0.7 and no flags contain "critical".
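The final pass/fail decision can be expressed as a small predicate. The result shape below is assumed; the 0.7 threshold and the "critical" flag check come from the description above.

```typescript
// Sketch of the validation pass/fail decision.
interface ValidationResult {
  confidence: number; // 0..1 aggregate across the four phases
  flags: string[];    // e.g. 'critical:contradiction', 'warn:low_notability'
}

function passesValidation(result: ValidationResult): boolean {
  const hasCritical = result.flags.some((f) => f.includes('critical'));
  return result.confidence >= 0.7 && !hasCritical;
}
```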
Challenge Content Generation
Challenge content is not scheduled — it's triggered automatically when a fact passes validation.
Trigger
When worker-validate marks a fact as validated, it enqueues two independent jobs:
- RESOLVE_IMAGE — find a suitable image via the priority cascade
- GENERATE_CHALLENGE_CONTENT — pre-generate quiz content
These are independent and run in parallel (different columns/tables, no shared state).
What Gets Generated
The generateChallengeContent() function in @eko/ai produces content for 6 pre-generated styles:
| Style | UI Pattern | Example |
|---|---|---|
multiple_choice | Pick from A/B/C/D | "Which of these artists played 27 instruments?" |
direct_question | Answer a specific question | "How many instruments did Prince play?" |
fill_the_gap | Complete a sentence | "Prince played ___ instruments on his debut" |
statement_blank | Fill in a statement | "___ played every instrument on Purple Rain" |
reverse_lookup | Identify from a description | "Which musician mastered 27 instruments?" |
free_text | Open-ended response | "Why is Prince's multi-instrument mastery notable?" |
Two styles are exempt from pre-generation:
- conversational — generated in real-time during multi-turn dialogue
- progressive_image_reveal — requires runtime image processing
The 5-Layer Structure
Every generated challenge includes:
| Layer | Purpose | Constraint |
|---|---|---|
setup_text | Backstory that shares context freely | 2-4 sentences with specific details |
challenge_text | Invitation to answer | Must contain "you" or "your" |
reveal_correct | Celebration when they know it | 1-3 sentences, teaches something extra |
reveal_wrong | Kind teaching when they don't | 1-3 sentences, includes correct answer |
correct_answer | Rich narrative for streaming display | 3-6 sentences, storytelling payoff |
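The five layers map naturally onto a typed record. The interface below is an illustration (field names are camelCased guesses at the real schema in @eko/ai), with a small helper showing the challenge_text constraint from the table.

```typescript
// Illustrative type for the 5-layer challenge structure.
interface ChallengeLayers {
  setupText: string;     // 2-4 sentences of backstory with specific details
  challengeText: string; // must address the reader ("you"/"your")
  revealCorrect: string; // 1-3 sentences, teaches something extra
  revealWrong: string;   // 1-3 sentences, always includes the correct answer
  correctAnswer: string; // 3-6 sentences of narrative payoff for streaming
}

// Checks the one hard constraint the table states for challenge_text.
function challengeTextAddressesReader(layers: ChallengeLayers): boolean {
  return /\byou(r)?\b/i.test(layers.challengeText);
}
```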
Prompt Assembly
The challenge generation prompt is a 10-layer composition built by buildSystemPrompt():
- Voice constitution (universal Eko tone)
- Taxonomy voice (domain-specific register and energy)
- Domain vocabulary (expert language patterns)
- Format voice (format-specific posture)
- Format rules (setup/challenge/reveal refinements)
- Style voice (per-style interaction mechanics)
- Style rules (what each field should contain)
- Taxonomy content rules (formatting conventions)
- Difficulty guidance (1-5 scale calibration)
- Generation instructions
All layers are driven by TypeScript data files in packages/ai/src/config/ (e.g., challenge-voice.ts, taxonomy-rules-data.ts). This means voice and style behavior can be changed by editing typed as const satisfies arrays without touching runtime logic.
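The layered assembly can be sketched as an ordered concatenation of string fragments. Layer names follow the list above; the function name and fragment shape are simplifications of the real buildSystemPrompt().

```typescript
// Sketch of the 10-layer prompt composition, in the order listed above.
const promptLayers = [
  'voiceConstitution',
  'taxonomyVoice',
  'domainVocabulary',
  'formatVoice',
  'formatRules',
  'styleVoice',
  'styleRules',
  'taxonomyContentRules',
  'difficultyGuidance',
  'generationInstructions',
] as const;

type LayerName = (typeof promptLayers)[number];

function buildSystemPromptSketch(fragments: Record<LayerName, string>): string {
  // Order matters: universal tone first, concrete generation instructions last.
  return promptLayers.map((name) => fragments[name]).join('\n\n');
}
```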
Quality Enforcement
Content is validated at generation time via:
- CQ rules (CQ-001 through CQ-013) — banned patterns, field length requirements, structural checks
- Drift coordinators (5 pluggable validators) — voice, structure, schema, taxonomy, difficulty compliance
- Patching — automated fixes for common violations (e.g., patchCq002() ensures "you"/"your" in challenge text)
Micro-Batching
Challenge content generation uses micro-batching to amortize the ~5,200-token system prompt across multiple facts. The worker (consumeChallengeBatch()) accumulates up to 5 queue messages over a 500ms window, then makes a single AI call per batch. Individual messages are ack'd or nack'd based on per-message success.
Post-Processing Pipeline
After AI generation, each challenge passes through automated patching:
- patchPassiveVoice() — rewrites passive constructions to active voice
- patchTextbookRegister() — eliminates academic/formal register
- patchCq002() — ensures "you"/"your" in challenge_text
- patchPunctuationSpacing() — fixes spacing issues
- patchGenericReveals() — deduplicates generic reveal text across a batch
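The patchers above compose naturally as a chain of pure text transforms. The sketch below uses toy stand-in bodies (the real patchers are more involved); only the chaining pattern is the point.

```typescript
// Sketch of the post-processing chain: each patcher is a pure transform
// applied in sequence. Bodies are illustrative stand-ins.
type Patcher = (text: string) => string;

const patchCq002Sketch: Patcher = (text) =>
  /\byou(r)?\b/i.test(text) ? text : `Can you guess? ${text}`;

const patchPunctuationSpacingSketch: Patcher = (text) =>
  text.replace(/\s+([.,!?])/g, '$1'); // drop space before punctuation

function applyPatchers(text: string, patchers: Patcher[]): string {
  return patchers.reduce((current, patch) => patch(current), text);
}
```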
Model & Cost
| Aspect | Detail |
|---|---|
| Model tier | default (cheaper than mid — challenge gen is higher volume) |
| Cost per fact | ~$0.006 |
| Storage | Upserted into fact_challenge_content with unique constraint on (fact_record_id, challenge_style, target_fact_key, difficulty) |
Feed Selection
Once an evergreen fact has status: 'validated' and challenge content rows, it enters the feed.
Blended Algorithm
For authenticated users, the feed blends four content streams:
| Stream | Weight | Query |
|---|---|---|
| Recent validated | 40% | getPublishedFacts() — published_at DESC |
| Review-due | 30% | getReviewDueFacts() — nextReviewAt <= NOW(), streak < 5 |
| Evergreen | 20% | getEvergreenFacts() — source_type='ai_generated', RANDOM() |
| Exploration | 10% | getRandomFacts() — any validated fact, RANDOM() |
Evergreen facts get their own dedicated 20% slice, queried randomly. They are also eligible to appear in the 10% exploration slice. Unauthenticated users see a chronological feed only.
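The weight table translates into per-stream slice sizes for a feed page. This is a sketch under the assumption that each stream simply gets its weighted share of the page size; the stream names mirror the table, not the real query layer.

```typescript
// Sketch: split a feed page across the four blended streams by weight.
type StreamName = 'recent' | 'reviewDue' | 'evergreen' | 'exploration';

const feedWeights: Record<StreamName, number> = {
  recent: 0.4,      // getPublishedFacts()
  reviewDue: 0.3,   // getReviewDueFacts()
  evergreen: 0.2,   // getEvergreenFacts()
  exploration: 0.1, // getRandomFacts()
};

function sliceSizes(pageSize: number): Record<StreamName, number> {
  const sizes = {} as Record<StreamName, number>;
  for (const name of Object.keys(feedWeights) as StreamName[]) {
    sizes[name] = Math.round(pageSize * feedWeights[name]);
  }
  return sizes;
}
```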
Card Detail
When a user opens a card, all pre-generated challenges for that fact are loaded via getChallengeContentForFact(). The UI presents 4 tabs (Learn, Quiz, Recall, Challenge), each filtering challenges by challengeStyle. Within each tab, one challenge is randomly selected from the pool.
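The selection step can be sketched as grouping by style and picking at random within each group. The shapes and function names below are illustrative, not the actual getChallengeContentForFact() types.

```typescript
// Sketch: group pre-generated challenges by style, pick one per tab.
interface Challenge {
  challengeStyle: string;
  challengeText: string;
}

function groupByStyle(challenges: Challenge[]): Map<string, Challenge[]> {
  const groups = new Map<string, Challenge[]>();
  for (const c of challenges) {
    const bucket = groups.get(c.challengeStyle) ?? [];
    bucket.push(c);
    groups.set(c.challengeStyle, bucket);
  }
  return groups;
}

function pickForTab(
  groups: Map<string, Challenge[]>,
  style: string,
): Challenge | undefined {
  const pool = groups.get(style);
  if (!pool || pool.length === 0) return undefined;
  return pool[Math.floor(Math.random() * pool.length)]; // random within pool
}
```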
Cost Model
| Component | Per-Unit | Daily (at 20 quota) | Monthly |
|---|---|---|---|
| Evergreen generation | ~$0.01/fact | ~$0.20 | ~$6 |
| Validation (phases 3-4) | ~$0.003/fact | ~$0.06 | ~$2 |
| Challenge content | ~$0.006/fact | ~$0.12 | ~$4 |
| Total | — | ~$0.38 | ~$12 |
Image resolution is free (Wikipedia, TheSportsDB, Unsplash, Pexels all have free tiers).
Key Differences from News Pipeline
| Aspect | News Pipeline | Evergreen Pipeline |
|---|---|---|
| Trigger | External news APIs every 15 min | Cron daily at 3 AM UTC |
| Source type | news_extraction | ai_generated |
| Expiry | 30 days (auto-archive) | None (permanent) |
| Validation strategy | multi_source (confidence from source count) | multi_phase (stricter, no sources to corroborate) |
| AI model tier | Default (high volume) | Mid (lower volume, higher quality) |
| Deduplication | Content hash on news_sources | Title match on fact_records per topic |
| Feed weight | 40% (recent) + 10% (explore) | 20% (dedicated evergreen slice) |
Key Files
| File | Purpose |
|---|---|
apps/web/app/api/cron/generate-evergreen/route.ts | Cron trigger: quota distribution and queue dispatch |
apps/worker-facts/src/handlers/generate-evergreen.ts | Worker handler: AI generation, insertion, validation enqueue |
apps/worker-facts/src/handlers/generate-challenge-content.ts | Worker handler: challenge generation and upsert |
apps/worker-validate/src/handlers/validate-fact.ts | Post-validation fan-out (RESOLVE_IMAGE + GENERATE_CHALLENGE_CONTENT) |
packages/ai/src/fact-engine.ts | generateEvergreenFacts() function |
packages/ai/src/challenge-content.ts | generateChallengeContent() and buildSystemPrompt() |
packages/shared/src/schemas.ts | Zod schemas for GenerateEvergreenMessage and GenerateChallengeContentMessage |
packages/queue/src/index.ts | Queue client, message constructors |
packages/db/src/drizzle/fact-engine-queries.ts | Feed queries: getEvergreenFacts(), getPublishedFacts() |
packages/config/src/index.ts | EVERGREEN_ENABLED, EVERGREEN_DAILY_QUOTA config |
Related
- News-to-Challenge Ingestion Guide — Step-by-step walkthrough of the news pipeline
- Fact-Challenge Anatomy — How facts become challenges (6 concepts, 5 layers)
- News & Fact Engine — System reference (providers, costs, config)
- Challenge Content Rules — Quality rules CQ-001 through CQ-013
- APP-CONTROL.md — Operational manifest (crons, workers, queues)