Evergreen Fact & Challenge Ingestion

How Eko generates timeless knowledge facts via AI and turns them into playable challenge cards — independent of the news pipeline.

What It Does

The evergreen pipeline produces "always-relevant" facts that never expire. Unlike news-derived facts (which have a 30-day expiry and are tied to real-world events), evergreen facts are AI-generated knowledge across every active topic category. They provide a stable content baseline so the feed always has something interesting, even when news is slow.

The pipeline has two stages:

  1. Evergreen generation — AI creates structured facts for each topic category, deduplicating against existing content
  2. Challenge generation — After a fact passes validation, AI generates pre-computed challenge material in multiple quiz styles

Both stages share the same workers, queues, and validation pipeline as news-extracted facts. The difference is how they enter the system and what triggers them.

Pipeline Overview

[Cron: 3 AM UTC daily]
       │
       ▼
  Distribute daily quota across active topics
       │
       ▼
  ┌─ GENERATE_EVERGREEN messages ────────────────────────┐
  │  One message per topic, count = quota share           │
  │  Queue: queue:generate_evergreen                      │
  └──────────────────────────────────────┬───────────────┘
       │
       ▼
  worker-facts picks up message
       │
       ▼
  ┌─ AI Generation (mid-tier model) ─────────────────────┐
  │  Dedup against existing titles for the topic          │
  │  Structured output: title, facts{}, context,          │
  │  challenge_title, notability_score                    │
  └──────────────────────────────────────┬───────────────┘
       │
       ▼
  Insert fact_record (status: pending_validation)
       │
       ▼
  ┌─ VALIDATE_FACT (multi_phase strategy) ───────────────┐
  │  4-phase pipeline: structural → consistency →         │
  │  cross-model → evidence                              │
  │  Stricter than news because no independent sources    │
  └──────────────────────┬──────────────┬────────────────┘
       │                 │
    INVALID           VALID
       │                 │
       ▼                 ▼
    rejected      ┌─ Post-validation fan-out ────────────┐
                  │  RESOLVE_IMAGE (parallel)             │
                  │  GENERATE_CHALLENGE_CONTENT (parallel) │
                  └──────────────────────────────────────┘
                         │
                         ▼
                  Challenge content upserted
                  (6 quiz styles per fact)
                         │
                         ▼
                  Fact appears in feed
                  (20% evergreen slice)

Evergreen Generation

Trigger

A Vercel cron fires daily at 3 AM UTC, hitting /api/cron/generate-evergreen. The route checks two gates before proceeding:

| Gate | Config | Default | Effect |
| --- | --- | --- | --- |
| Feature flag | EVERGREEN_ENABLED | false | Blocks all generation if false |
| Daily quota | EVERGREEN_DAILY_QUOTA | 20 | Max facts generated per day |

Quota Distribution

The daily quota is distributed across active root-level topic categories:

  • If percentTarget is set on a category, it gets dailyQuota * percentTarget facts (e.g., 20% of 20 = 4 facts)
  • If not set, quota is split evenly across all categories
  • Each category gets at least 1 fact (floor clamped to 1)
  • Categories with no schemas are skipped

One GENERATE_EVERGREEN queue message is created per category, specifying the category ID, schema ID, and count.
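The distribution rules above can be sketched as a small pure function. This is an illustrative reconstruction, not the actual route code; the type names (TopicCategory, EvergreenJob) and field names are assumptions.

```typescript
// Hypothetical sketch of quota distribution across active topic categories.
interface TopicCategory {
  id: string;
  schemaId?: string;       // categories with no schema are skipped
  percentTarget?: number;  // e.g. 0.2 for a 20% share of the daily quota
}

interface EvergreenJob {
  categoryId: string;
  schemaId: string;
  count: number;
}

function distributeQuota(dailyQuota: number, categories: TopicCategory[]): EvergreenJob[] {
  const eligible = categories.filter((c) => c.schemaId !== undefined);
  if (eligible.length === 0) return [];
  const evenShare = dailyQuota / eligible.length;
  return eligible.map((c) => ({
    categoryId: c.id,
    schemaId: c.schemaId!,
    // percentTarget overrides the even split; every category gets at least 1
    count: Math.max(
      1,
      Math.floor(c.percentTarget !== undefined ? dailyQuota * c.percentTarget : evenShare),
    ),
  }));
}
```

With a daily quota of 20 and a category carrying percentTarget 0.2, that category gets 4 facts and the remainder is split evenly among the other schema-bearing categories.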

AI Generation

The worker handler calls generateEvergreenFacts() from @eko/ai:

| Aspect | Detail |
| --- | --- |
| Model tier | mid (higher quality than default; evergreen is lower volume, higher stakes) |
| Deduplication | AI receives all existing fact titles for the topic to avoid repetition |
| Output per record | title, challenge_title, facts{}, context, notability_score, notability_reason |
| Schema enforcement | Facts must conform to the topic's fact_record_schemas.fact_keys definition |
| Cost tracking | Total AI cost is split evenly across generated records and stored per-record |

Fact Insertion

Each generated record is inserted into fact_records with:

  • status: 'pending_validation'
  • source_type: 'ai_generated'
  • No expires_at (evergreen facts never expire)
  • generation_cost_usd for cost auditing

A VALIDATE_FACT message is immediately enqueued with the multi_phase strategy.
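The insert-then-enqueue step can be sketched as two small builders. This is a hedged sketch: the actual fact_records columns and the @eko/shared message schema may differ in shape and naming.

```typescript
// Illustrative builders for the evergreen insert row and the follow-up
// validation message (field names are assumptions).
interface GeneratedRecord {
  title: string;
  facts: Record<string, unknown>;
  context: string;
  notability_score: number;
}

function buildFactInsert(record: GeneratedRecord, perRecordCostUsd: number) {
  return {
    ...record,
    status: 'pending_validation' as const,
    source_type: 'ai_generated' as const,
    expires_at: null,                      // evergreen facts never expire
    generation_cost_usd: perRecordCostUsd, // per-record share of the AI call
  };
}

function buildValidateMessage(factRecordId: string) {
  return {
    type: 'VALIDATE_FACT' as const,
    factRecordId,
    strategy: 'multi_phase' as const, // strictest strategy for AI-generated facts
  };
}
```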

Validation

Evergreen facts use the multi_phase validation strategy — the strictest available — because there are no independent news sources to corroborate them. The 4-phase pipeline:

  1. Structural — Schema conformance, type validation, injection detection ($0)
  2. Consistency — Internal contradictions, taxonomy rule violations ($0)
  3. Cross-Model — AI adversarial verification via Gemini 2.5 Flash (~$0.001)
  4. Evidence — External API corroboration (Wikipedia, Wikidata) + AI reasoner (~$0.002-0.005)

Phases 1-2 are free, code-only checks that catch ~40% of defective facts before any AI call. A fact passes when confidence >= 0.7 and no flag contains "critical".
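The pass criterion is simple enough to state as a predicate. A minimal sketch, assuming flags arrive as an array of strings (the function name is illustrative):

```typescript
// A fact passes validation when confidence >= 0.7 and no flag contains
// the substring "critical".
function passesValidation(confidence: number, flags: string[]): boolean {
  const hasCritical = flags.some((f) => f.includes('critical'));
  return confidence >= 0.7 && !hasCritical;
}
```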

Challenge Content Generation

Challenge content is not scheduled — it's triggered automatically when a fact passes validation.

Trigger

When worker-validate marks a fact as validated, it enqueues two independent jobs:

  1. RESOLVE_IMAGE — find a suitable image via the priority cascade
  2. GENERATE_CHALLENGE_CONTENT — pre-generate quiz content

These are independent and run in parallel (different columns/tables, no shared state).
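The fan-out can be sketched as a function producing both messages at once. The queue names and message shapes here are assumptions for illustration, not the actual @eko/queue constructors.

```typescript
// Hypothetical sketch of the post-validation fan-out: two independent jobs,
// enqueued in parallel, with no shared state between them.
function buildPostValidationJobs(factRecordId: string) {
  return [
    { queue: 'queue:resolve_image', body: { type: 'RESOLVE_IMAGE' as const, factRecordId } },
    {
      queue: 'queue:generate_challenge_content',
      body: { type: 'GENERATE_CHALLENGE_CONTENT' as const, factRecordId },
    },
  ];
}
```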

What Gets Generated

The generateChallengeContent() function in @eko/ai produces content for 6 pre-generated styles:

| Style | UI Pattern | Example |
| --- | --- | --- |
| multiple_choice | Pick from A/B/C/D | "Which composer wrote 27 instruments?" |
| direct_question | Answer a specific question | "How many instruments did Prince play?" |
| fill_the_gap | Complete a sentence | "Prince played ___ instruments on his debut" |
| statement_blank | Fill in a statement | "___ played every instrument on Purple Rain" |
| reverse_lookup | Identify from a description | "Which musician mastered 27 instruments?" |
| free_text | Open-ended response | "Why is Prince's multi-instrument mastery notable?" |

Two styles are exempt from pre-generation:

  • conversational — generated in real-time during multi-turn dialogue
  • progressive_image_reveal — requires runtime image processing
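The eight styles and the pre-generation split can be captured in types. The union below is an illustrative reconstruction; the actual style enum in the codebase may be named and organized differently.

```typescript
// All challenge styles, split into pre-generated vs. runtime-only.
const ALL_STYLES = [
  'multiple_choice', 'direct_question', 'fill_the_gap', 'statement_blank',
  'reverse_lookup', 'free_text', 'conversational', 'progressive_image_reveal',
] as const;

type ChallengeStyle = (typeof ALL_STYLES)[number];

// conversational needs multi-turn dialogue; progressive_image_reveal needs
// runtime image processing. Everything else is generated ahead of time.
const RUNTIME_ONLY: ReadonlySet<ChallengeStyle> = new Set([
  'conversational',
  'progressive_image_reveal',
]);

const PRE_GENERATED: ChallengeStyle[] = ALL_STYLES.filter((s) => !RUNTIME_ONLY.has(s));
```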

The 5-Layer Structure

Every generated challenge includes:

| Layer | Purpose | Constraint |
| --- | --- | --- |
| setup_text | Backstory that shares context freely | 2-4 sentences with specific details |
| challenge_text | Invitation to answer | Must contain "you" or "your" |
| reveal_correct | Celebration when they know it | 1-3 sentences, teaches something extra |
| reveal_wrong | Kind teaching when they don't | 1-3 sentences, includes correct answer |
| correct_answer | Rich narrative for streaming display | 3-6 sentences, storytelling payoff |

Prompt Assembly

The challenge generation prompt is a 10-layer composition built by buildSystemPrompt():

  1. Voice constitution (universal Eko tone)
  2. Taxonomy voice (domain-specific register and energy)
  3. Domain vocabulary (expert language patterns)
  4. Format voice (format-specific posture)
  5. Format rules (setup/challenge/reveal refinements)
  6. Style voice (per-style interaction mechanics)
  7. Style rules (what each field should contain)
  8. Taxonomy content rules (formatting conventions)
  9. Difficulty guidance (1-5 scale calibration)
  10. Generation instructions

All layers are driven by TypeScript data files in packages/ai/src/config/ (e.g., challenge-voice.ts, taxonomy-rules-data.ts). This means voice and style behavior can be changed by editing typed data arrays (declared with as const satisfies) without touching runtime logic.
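The layered composition above can be sketched as a join over ordered sections. This is a simplified stand-in for the real buildSystemPrompt(), whose signature and data sources are not shown in this document.

```typescript
// Illustrative sketch of the 10-layer prompt composition. Layer names mirror
// the numbered list above; the real implementation pulls these from typed
// config files rather than a flat struct.
interface PromptLayers {
  voiceConstitution: string;
  taxonomyVoice: string;
  domainVocabulary: string;
  formatVoice: string;
  formatRules: string;
  styleVoice: string;
  styleRules: string;
  taxonomyContentRules: string;
  difficultyGuidance: string;
  generationInstructions: string;
}

function buildSystemPromptSketch(layers: PromptLayers): string {
  // Order matters: universal tone first, concrete generation instructions last.
  return [
    layers.voiceConstitution, layers.taxonomyVoice, layers.domainVocabulary,
    layers.formatVoice, layers.formatRules, layers.styleVoice, layers.styleRules,
    layers.taxonomyContentRules, layers.difficultyGuidance, layers.generationInstructions,
  ].join('\n\n');
}
```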

Quality Enforcement

Content is validated at generation time via:

  • CQ rules (CQ-001 through CQ-013) — banned patterns, field length requirements, structural checks
  • Drift coordinators (5 pluggable validators) — voice, structure, schema, taxonomy, difficulty compliance
  • Patching — automated fixes for common violations (e.g., patchCq002() ensures "you"/"your" in challenge text)

Micro-Batching

Challenge content generation uses micro-batching to amortize the ~5,200-token system prompt across multiple facts. The worker (consumeChallengeBatch()) accumulates up to 5 queue messages over a 500ms window, then makes a single AI call per batch. Individual messages are ack'd or nack'd based on per-message success.
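The batching policy (flush at 5 messages or when the 500 ms window elapses, whichever comes first) can be isolated from the queue consumer so it is testable without timers. A minimal sketch; the real consumeChallengeBatch() wires this into the worker loop.

```typescript
// Decision logic for the micro-batcher: flush on size or window expiry.
interface BatchPolicy {
  maxSize: number;  // 5 messages in the document's configuration
  windowMs: number; // 500 ms accumulation window
}

function shouldFlush(pending: number, elapsedMs: number, policy: BatchPolicy): boolean {
  // Never flush an empty batch; otherwise flush when full or when the
  // accumulation window has elapsed.
  return pending >= policy.maxSize || (pending > 0 && elapsedMs >= policy.windowMs);
}
```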

Post-Processing Pipeline

After AI generation, each challenge passes through automated patching:

  1. patchPassiveVoice() — rewrites passive constructions to active voice
  2. patchTextbookRegister() — eliminates academic/formal register
  3. patchCq002() — ensures "you"/"your" in challenge_text
  4. patchPunctuationSpacing() — fixes spacing issues
  5. patchGenericReveals() — deduplicates generic reveal text across a batch
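As an example of this patching style, the CQ-002 fix can be sketched as a string transform. This is an illustrative reconstruction; the real patchCq002() may rewrite the text differently than simply prepending a direct address.

```typescript
// Hypothetical sketch of the CQ-002 patch: ensure challenge_text addresses
// the reader with "you" or "your".
function patchCq002(challengeText: string): string {
  const addressesReader = /\byou(r)?\b/i.test(challengeText);
  // Assumed fallback rewrite; the production patcher may be smarter.
  return addressesReader ? challengeText : `Can you answer this: ${challengeText}`;
}
```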

Model & Cost

| Aspect | Detail |
| --- | --- |
| Model tier | default (cheaper than mid; challenge generation is higher volume) |
| Cost per fact | ~$0.006 |
| Storage | Upserted into fact_challenge_content with a unique constraint on (fact_record_id, challenge_style, target_fact_key, difficulty) |

Feed Selection

Once an evergreen fact has status: 'validated' and challenge content rows, it enters the feed.

Blended Algorithm

For authenticated users, the feed blends four content streams:

| Stream | Weight | Query |
| --- | --- | --- |
| Recent validated | 40% | getPublishedFacts(), published_at DESC |
| Review-due | 30% | getReviewDueFacts(), nextReviewAt <= NOW(), streak < 5 |
| Evergreen | 20% | getEvergreenFacts(), source_type='ai_generated', RANDOM() |
| Exploration | 10% | getRandomFacts(), any validated fact, RANDOM() |

Evergreen facts get their own dedicated 20% slice, queried randomly. They are also eligible to appear in the 10% exploration slice. Unauthenticated users see a chronological feed only.
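The 40/30/20/10 blend translates directly into per-stream fetch counts for a feed page. A minimal sketch; the rounding behavior and function name are assumptions, not the actual feed code.

```typescript
// Split a feed page across the four content streams by weight.
const STREAM_WEIGHTS = {
  recent: 0.4,      // recent validated facts
  reviewDue: 0.3,   // spaced-repetition review
  evergreen: 0.2,   // dedicated evergreen slice
  exploration: 0.1, // random validated facts
} as const;

function streamCounts(pageSize: number) {
  return {
    recent: Math.round(pageSize * STREAM_WEIGHTS.recent),
    reviewDue: Math.round(pageSize * STREAM_WEIGHTS.reviewDue),
    evergreen: Math.round(pageSize * STREAM_WEIGHTS.evergreen),
    exploration: Math.round(pageSize * STREAM_WEIGHTS.exploration),
  };
}
```

For a 20-card page this yields 8 recent, 6 review-due, 4 evergreen, and 2 exploration cards.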

Card Detail

When a user opens a card, all pre-generated challenges for that fact are loaded via getChallengeContentForFact(). The UI presents 4 tabs (Learn, Quiz, Recall, Challenge), each filtering challenges by challengeStyle. Within each tab, one challenge is randomly selected from the pool.
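The per-tab selection (filter by style, pick one at random) can be sketched with an injected random source so it is deterministic under test. The Challenge shape and function name here are illustrative.

```typescript
// Pick one challenge of the requested style from a fact's pre-generated pool.
interface Challenge {
  challengeStyle: string;
  setupText: string;
}

function pickChallenge(
  pool: Challenge[],
  style: string,
  rng: () => number = Math.random, // injected for determinism in tests
): Challenge | undefined {
  const candidates = pool.filter((c) => c.challengeStyle === style);
  if (candidates.length === 0) return undefined;
  return candidates[Math.floor(rng() * candidates.length)];
}
```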

Cost Model

| Component | Per-Unit | Daily (at 20 quota) | Monthly |
| --- | --- | --- | --- |
| Evergreen generation | ~$0.01/fact | ~$0.20 | ~$6 |
| Validation (phases 3-4) | ~$0.003/fact | ~$0.06 | ~$2 |
| Challenge content | ~$0.006/fact | ~$0.12 | ~$4 |
| Total | | ~$0.38 | ~$12 |

Image resolution is free (Wikipedia, TheSportsDB, Unsplash, Pexels all have free tiers).

Key Differences from News Pipeline

| Aspect | News Pipeline | Evergreen Pipeline |
| --- | --- | --- |
| Trigger | External news APIs every 15 min | Cron daily at 3 AM UTC |
| Source type | news_extraction | ai_generated |
| Expiry | 30 days (auto-archive) | None (permanent) |
| Validation strategy | multi_source (confidence from source count) | multi_phase (stricter; no sources to corroborate) |
| AI model tier | default (high volume) | mid (lower volume, higher quality) |
| Deduplication | Content hash on news_sources | Title match on fact_records per topic |
| Feed weight | 40% (recent) + 10% (explore) | 20% (dedicated evergreen slice) |

Key Files

| File | Purpose |
| --- | --- |
| apps/web/app/api/cron/generate-evergreen/route.ts | Cron trigger: quota distribution and queue dispatch |
| apps/worker-facts/src/handlers/generate-evergreen.ts | Worker handler: AI generation, insertion, validation enqueue |
| apps/worker-facts/src/handlers/generate-challenge-content.ts | Worker handler: challenge generation and upsert |
| apps/worker-validate/src/handlers/validate-fact.ts | Post-validation fan-out (RESOLVE_IMAGE + GENERATE_CHALLENGE_CONTENT) |
| packages/ai/src/fact-engine.ts | generateEvergreenFacts() function |
| packages/ai/src/challenge-content.ts | generateChallengeContent() and buildSystemPrompt() |
| packages/shared/src/schemas.ts | Zod schemas for GenerateEvergreenMessage and GenerateChallengeContentMessage |
| packages/queue/src/index.ts | Queue client, message constructors |
| packages/db/src/drizzle/fact-engine-queries.ts | Feed queries: getEvergreenFacts(), getPublishedFacts() |
| packages/config/src/index.ts | EVERGREEN_ENABLED, EVERGREEN_DAILY_QUOTA config |