Evergreen Pipeline — Detailed Flow
How Eko generates timeless knowledge facts via AI — content that never expires and provides a stable baseline when news is slow.
Overview
The evergreen pipeline produces "always-relevant" facts that aren't tied to current events. Unlike news facts (30-day expiry) or seed facts (manual trigger), evergreen facts are generated on a daily cron schedule across all active topic categories.
Key characteristics:
- AI-generated: No external data source — the AI generates facts from its training data, optionally grounded by enrichment APIs
- Never expires: expires_at: NULL (permanent content)
- Stricter validation: Uses multi_phase strategy because there are no independent news sources to corroborate
- Currently gated off: EVERGREEN_ENABLED=false in production; can be triggered manually from admin
End-to-End Flow
Vercel Cron (daily, 3 AM UTC)
or Admin manual trigger
│
▼
┌─ STEP 1: QUOTA DISTRIBUTION ─────────────────────────────────────┐
│ Check EVERGREEN_ENABLED gate (default: false) │
│ Load active topic categories with schemas │
│ Distribute EVERGREEN_DAILY_QUOTA across categories │
│ Enqueue 1 GENERATE_EVERGREEN per category │
└──────────────────────────────────────────────────┬────────────────┘
│
▼
┌─ STEP 2: AI GENERATION ──────────────────────────────────────────┐
│ Worker: worker-facts │
│ Handler: generate-evergreen.ts → generateEvergreenFacts() │
│ │
│ For each topic category: │
│ - Load schema keys │
│ - Load existing titles (for deduplication) │
│ - Resolve enrichment context (optional, never blocks) │
│ - Compute deterministic notability (if KG + Wikidata confirm) │
│ - AI generates N facts │
└──────────────────────────────────────────────────┬────────────────┘
│
▼
┌─ STEP 3: INSERT + ENQUEUE ───────────────────────────────────────┐
│ Per generated record: │
│ - Insert fact_record (source_type: ai_generated, no expiry) │
│ - Content hash collision check (skip duplicates) │
│ - Enqueue VALIDATE_FACT (strategy: multi_phase) │
└──────────────────────────────────────────────────┬────────────────┘
│
▼
┌─ STEP 4: VALIDATION ────────────────────────────────────────────┐
│ 4-phase: structural → consistency → cross-model → evidence │
│ Stricter than news (no independent sources) │
│ Pass: confidence ≥ 0.7 + no critical flags │
└──────────────────────┬──────────────────────┬────────────────────┘
│ │
INVALID VALID
│ │
▼ ▼
rejected ┌─ POST-VALIDATION FAN-OUT ─────┐
│ RESOLVE_IMAGE (parallel) │
│ GENERATE_CHALLENGE_CONTENT │
└────────────────┬───────────────┘
│
▼
Fact appears in feed
(20% evergreen slice)
Step 1: Quota Distribution
Trigger
The cron route at apps/web/app/api/cron/generate-evergreen/route.ts fires daily at 3 AM UTC. It checks two gates:
| Gate | Config | Default | Effect |
|---|---|---|---|
| Feature flag | EVERGREEN_ENABLED | false | Blocks all generation if false |
| Daily quota | EVERGREEN_DAILY_QUOTA | 20 | Max facts generated per day |
Category Allocation
The daily quota is distributed across active root-level topic categories:
- If percentTarget is set on a category: gets dailyQuota × percentTarget facts
  - Example: 20 daily quota, "Science" has 15% target → 3 facts
- If not set: quota split evenly across all categories
- Floor: Each category gets at least 1 fact
- Skip: Categories without schemas are excluded
One GENERATE_EVERGREEN queue message per category:
GENERATE_EVERGREEN {
topic_category_id: "uuid",
schema_id: "uuid",
count: 3
}
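The allocation rules above can be sketched roughly as follows. This is a minimal illustration, not the code in route.ts: the TopicCategory shape, the distributeQuota name, and the rounding choices (Math.round for percentage targets, Math.floor for the even split) are assumptions. Only percentTarget and the three message fields come from this document.

```typescript
// Hypothetical category shape; only percentTarget is named in this document.
interface TopicCategory {
  id: string;
  schemaId: string | null; // null means the category has no schema
  percentTarget?: number; // fraction of the daily quota, e.g. 0.15 for 15%
}

// Mirrors the GENERATE_EVERGREEN payload shown above.
interface GenerateEvergreenMessage {
  topic_category_id: string;
  schema_id: string;
  count: number;
}

function distributeQuota(
  categories: TopicCategory[],
  dailyQuota: number,
): GenerateEvergreenMessage[] {
  // Skip: categories without schemas do not take part in the split.
  const eligibleCount = categories.filter((c) => c.schemaId !== null).length;
  if (eligibleCount === 0) return [];

  // Even share used when a category has no percentTarget (rounding down is an assumption).
  const evenShare = Math.floor(dailyQuota / eligibleCount);

  const messages: GenerateEvergreenMessage[] = [];
  for (const c of categories) {
    if (c.schemaId === null) continue;
    const raw =
      c.percentTarget !== undefined
        ? Math.round(dailyQuota * c.percentTarget)
        : evenShare;
    messages.push({
      topic_category_id: c.id,
      schema_id: c.schemaId,
      count: Math.max(1, raw), // Floor: every category gets at least 1 fact
    });
  }
  return messages;
}
```

With a quota of 20 and 10 categories that have no percentTarget, each message carries count: 2. Note that under these assumptions the floor of 1 and the rounding can push the summed counts slightly above the daily quota when there are many categories.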
Production Status
The cron route exists but is not in vercel.json — it's unscheduled. Generation can be triggered:
- Manually from the admin dashboard (Pipeline → Evergreen → Generate)
- By adding the route to vercel.json crons and setting EVERGREEN_ENABLED=true
Step 2: AI Generation
What the Handler Does
apps/worker-facts/src/handlers/generate-evergreen.ts:
- Load the topic category and its schema keys
- Load existing fact titles for the topic (up to 50) — passed to AI for deduplication
- Optionally resolve enrichment context (same orchestrator as news/seed pipelines)
- Optionally compute deterministic notability from KG + Wikidata signals
- Call generateEvergreenFacts() from @eko/ai
AI Prompt Composition
The generation prompt includes:
| Layer | Source | Purpose |
|---|---|---|
| CHALLENGE_TONE_PREFIX | challenge-content-rules.ts | Theatrical title requirements |
| Taxonomy voice | resolveVoice() | Domain-specific register and energy |
| Domain vocabulary | formatVocabularyForPrompt() | Expert language patterns |
| Taxonomy content rules | resolveContentRules() | Formatting conventions |
| Schema keys | fact_record_schemas.fact_keys | Required JSONB fields |
| Existing titles | DB query | Deduplication list |
| Enrichment context | resolveEnrichmentContext() | Grounding data from APIs |
| Subcategory hierarchy | DB query | Rendered as tree for AI classification |
Enrichment Sources
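Same as the other pipelines. Sources are queried in parallel, and enrichment never blocks generation. The "Always" sources are queried for every topic; the topic-routed sources only for matching topic paths: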
| Always | Topic-Routed |
|---|---|
| Google Knowledge Graph | TheSportsDB (sports/*) |
| Wikidata | MusicBrainz (music/*) |
| Wikipedia | Nominatim (geography/*) |
| | Open Library (books/*) |
Model Selection
| Aspect | Detail |
|---|---|
| Task name | evergreen_generation |
| Default tier | mid (lower volume than news, higher quality bar) |
| Model routing | DB-driven via ai_model_tier_config |
| Model-specific tuning | Per-model ModelAdapter injects prompt customizations |
| Can escalate | Yes, if escalation signals present |
Output per Record
| Field | Description |
|---|---|
| title | Factual, Wikipedia-style label |
| challengeTitle | Theatrical, curiosity-provoking hook |
| facts | Structured JSONB conforming to topic schema |
| context | 4-8 sentence narrative (Hook → Story → Connection) |
| notabilityScore | 0.0-1.0 (may be overridden by deterministic scoring) |
| notabilityReason | One-sentence justification |
Deduplication
Two layers prevent duplicate facts:
- Title dedup: AI receives existing titles and is instructed to avoid repetition
- Content hash: On insert, a content hash collision check skips exact duplicates
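The second layer can be illustrated with a small sketch. Everything here is an assumption made for illustration: the document does not say which fields are hashed or which hash function is used, so this example hashes the normalized title plus the sorted fact keys, and uses a 32-bit FNV-1a hash as a dependency-free stand-in for the real hash.

```typescript
// Hypothetical subset of a generated record.
interface GeneratedFact {
  title: string;
  facts: Record<string, string>;
}

// 32-bit FNV-1a over UTF-16 code units; a stand-in, not the pipeline's actual hash.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16);
}

// Canonical form: lowercased title plus key=value pairs in sorted key order,
// so the same content hashes identically regardless of key order.
function contentHash(fact: GeneratedFact): string {
  const keys = Object.keys(fact.facts).sort();
  const parts = [fact.title.trim().toLowerCase()];
  for (const k of keys) parts.push(k + "=" + fact.facts[k]);
  return fnv1a(parts.join("\n"));
}

// Returns only the facts whose hash is not already known; duplicates are skipped.
function skipDuplicates(
  batch: GeneratedFact[],
  existingHashes: Set<string>,
): GeneratedFact[] {
  const seen = new Set<string>(existingHashes);
  const fresh: GeneratedFact[] = [];
  for (const fact of batch) {
    const h = contentHash(fact);
    if (seen.has(h)) continue; // hash collision with an existing record: skip
    seen.add(h);
    fresh.push(fact);
  }
  return fresh;
}
```

An exact-match hash like this only catches verbatim repeats, which is why the prompt-level title list remains the first line of defense against near-duplicates.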
Step 3: Insertion
Each generated record is inserted into fact_records with:
| Field | Value |
|---|---|
| status | pending_validation |
| source_type | ai_generated |
| expires_at | NULL (permanent) |
| generation_cost_usd | Total AI cost ÷ number of records generated |
| notability_score | AI or deterministic override |
A VALIDATE_FACT message is immediately enqueued with the multi_phase strategy.
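As a rough sketch of how the field values above fit together, the mapping below builds one row per generated record and splits the generation cost evenly. The interface names and the toFactRecordRows helper are hypothetical; the actual insert goes through the project's database layer, which is not shown here.

```typescript
// Hypothetical subset of the AI output described in "Output per Record".
interface GeneratedRecord {
  title: string;
  notabilityScore: number;
}

// Hypothetical row shape; the column names follow the table above.
interface FactRecordRow {
  title: string;
  status: "pending_validation";
  source_type: "ai_generated";
  expires_at: null;
  generation_cost_usd: number;
  notability_score: number;
}

function toFactRecordRows(
  records: GeneratedRecord[],
  totalCostUsd: number,
): FactRecordRow[] {
  if (records.length === 0) return [];
  // Total AI cost divided by the number of records generated in the call.
  const costPerRecord = totalCostUsd / records.length;
  return records.map(
    (r): FactRecordRow => ({
      title: r.title,
      status: "pending_validation",
      source_type: "ai_generated",
      expires_at: null, // evergreen facts never expire
      generation_cost_usd: costPerRecord,
      notability_score: r.notabilityScore,
    }),
  );
}
```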
Step 4: Validation
Why Multi-Phase?
Evergreen facts use the strictest validation strategy because:
- No independent news sources exist to corroborate
- The AI is generating from training data, which may contain errors
- Enrichment data from Step 2 is not reused in validation — the validator independently queries external APIs to avoid circular reasoning
4-Phase Pipeline
| Phase | Name | What It Checks | Cost |
|---|---|---|---|
| 1 | Structural | Schema conformance, type validation, injection detection | $0 (code-only) |
| 2 | Consistency | Internal contradictions, taxonomy rule violations | $0 (code-only) |
| 3 | Cross-Model | AI adversarial verification (different model than generator) | ~$0.001 |
| 4 | Evidence | External API corroboration (Wikipedia, Wikidata) + AI reasoner | ~$0.002-0.005 |
Phases 1-2 are free code-only checks that catch ~40% of defective facts before any AI call is made.
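The ordering matters because the cheap phases run first. The sketch below assumes that a fact failing an early phase is rejected without running later phases, which is what lets the code-only checks save AI spend. The Phase and PhaseResult types and the runPhases function are illustrative names, and the phases are synchronous here for brevity, whereas phases 3-4 would involve asynchronous calls in practice.

```typescript
interface PhaseResult {
  passed: boolean;
  flags: string[];
}

// Hypothetical phase abstraction: a name plus a check over the fact.
interface Phase {
  name: string;
  run: (fact: unknown) => PhaseResult;
}

interface PipelineResult {
  passed: boolean;
  ranPhases: string[];
  flags: string[];
}

// Runs phases in order and stops at the first failure, so later
// (paid) phases are never invoked for a fact that already failed.
function runPhases(fact: unknown, phases: Phase[]): PipelineResult {
  const ranPhases: string[] = [];
  const flags: string[] = [];
  for (const phase of phases) {
    ranPhases.push(phase.name);
    const result = phase.run(fact);
    for (const flag of result.flags) flags.push(flag);
    if (!result.passed) return { passed: false, ranPhases, flags };
  }
  return { passed: true, ranPhases, flags };
}
```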
Pass Criteria
- confidence ≥ 0.7
- No flags containing "critical"
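Expressed as a predicate, the two criteria combine as shown below. The function name is hypothetical, and matching "critical" case-insensitively as a substring is an assumption about how flags are compared.

```typescript
const CONFIDENCE_THRESHOLD = 0.7;

// A fact passes only if confidence meets the threshold AND
// none of its flags mention "critical".
function passesValidation(confidence: number, flags: string[]): boolean {
  const hasCriticalFlag = flags.some(
    (flag) => flag.toLowerCase().indexOf("critical") !== -1,
  );
  return confidence >= CONFIDENCE_THRESHOLD && !hasCriticalFlag;
}
```

Under this sketch, a fact with confidence 0.9 and a flag such as "critical_date_mismatch" (a made-up flag name) is still rejected, since high confidence does not override a critical flag.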
Post-Validation Fan-Out
On validation success, two independent jobs fire in parallel:
- RESOLVE_IMAGE: image cascade (Wikipedia → SportsDB → Unsplash → Pexels)
- GENERATE_CHALLENGE_CONTENT: 6 quiz styles with 5-layer structure
Feed Integration
Evergreen facts get a dedicated 20% slice in the blended feed algorithm:
| Stream | Weight | Source |
|---|---|---|
| Recent validated | 40% | Newly published facts |
| Review-due | 30% | Spaced repetition (SM-2 variant) |
| Evergreen | 20% | source_type='ai_generated', RANDOM() |
| Exploration | 10% | Random facts for discovery |
Evergreen facts are also eligible to appear in the 10% exploration slice. They never expire, so they accumulate over time and provide a growing content baseline.
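To make the weights concrete, the sketch below turns them into per-stream slot counts for one feed page. This is an illustration only: the document does not describe how the blend is computed, so the allocateSlots function, the stream keys, and the largest-remainder rounding are all assumptions.

```typescript
type Stream = "recent" | "reviewDue" | "evergreen" | "exploration";

// Weights in whole percent, taken from the table above.
const FEED_WEIGHTS: Record<Stream, number> = {
  recent: 40,
  reviewDue: 30,
  evergreen: 20,
  exploration: 10,
};

const STREAMS: Stream[] = ["recent", "reviewDue", "evergreen", "exploration"];

function allocateSlots(pageSize: number): Record<Stream, number> {
  const slots: Record<Stream, number> = {
    recent: 0,
    reviewDue: 0,
    evergreen: 0,
    exploration: 0,
  };
  // First pass: whole slots per stream, using integer math to avoid float drift.
  let used = 0;
  for (const s of STREAMS) {
    slots[s] = Math.floor((pageSize * FEED_WEIGHTS[s]) / 100);
    used += slots[s];
  }
  // Second pass: hand leftover slots to the streams with the largest remainders.
  const byRemainder = [...STREAMS].sort(
    (a, b) =>
      ((pageSize * FEED_WEIGHTS[b]) % 100) - ((pageSize * FEED_WEIGHTS[a]) % 100),
  );
  for (const s of byRemainder) {
    if (used >= pageSize) break;
    slots[s] += 1;
    used += 1;
  }
  return slots;
}
```

For a 20-item page this gives 8 recent, 6 review-due, 4 evergreen, and 2 exploration slots.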
Real-World Example: Generating "Space & Astronomy" Facts
Step 1: Quota
- Daily quota: 20 facts
- 10 active categories, "Space & Astronomy" has no percentTarget
- Even distribution: 2 facts per category
- Enqueue: GENERATE_EVERGREEN(topic: "Space & Astronomy", count: 2)
Step 2: AI Generation
- Schema keys for Space: celestial_body, measurement, distance, significance, date
- Existing titles loaded (50): ["Mars' Olympus Mons...", "Light from the Sun...", ...]
- Enrichment: Wikipedia summary for "astronomy", Wikidata structured data
- AI generates 2 facts:
Fact 1:
title: "Saturn's Ring System Spans 282,000 Kilometers"
challengeTitle: "The Solar System's Most Spectacular Jewelry"
facts: {
celestial_body: "Saturn",
measurement: "282,000 km ring diameter",
significance: "Largest ring system in the solar system"
}
context: "Saturn's rings stretch across a distance that would span most of
the way from Earth to the Moon. Despite their enormous width, the rings are
remarkably thin, averaging just 10 meters thick. Galileo first observed them
in 1610 and described them as 'ears' because his telescope couldn't resolve
their true shape..."
notabilityScore: 0.85
Fact 2:
title: "Neutron Stars Can Spin at 716 Rotations Per Second"
challengeTitle: "The Universe's Fastest Spinning Objects"
facts: {
celestial_body: "PSR J1748-2446ad",
measurement: "716 Hz rotation frequency",
significance: "Fastest known spinning neutron star"
}
context: "The fastest known pulsar, PSR J1748-2446ad, completes 716 full
rotations every second — meaning its equatorial surface moves at nearly
a quarter the speed of light..."
notabilityScore: 0.88
Step 3: Insertion
- 2 fact_records inserted (source_type: ai_generated, expires_at: NULL)
- 2 VALIDATE_FACT messages enqueued (multi_phase strategy)
Step 4: Validation
- Fact 1: Wikipedia confirms Saturn ring diameter → confidence 0.85 → validated
- Fact 2: Wikipedia confirms PSR J1748-2446ad rotation rate → confidence 0.82 → validated
- Fan-out: 2 × RESOLVE_IMAGE + 2 × GENERATE_CHALLENGE_CONTENT
Result
- Saturn fact: Wikipedia image of Saturn's rings resolved
- Neutron star fact: Unsplash fallback (abstract space photo)
- 12 challenge variants generated (6 per fact)
- Both appear in the 20% evergreen feed slice
Comparison with Other Pipelines
| Aspect | News | Evergreen | Seed |
|---|---|---|---|
| Trigger | Cron every 15 min | Cron daily / manual | Manual seed entry |
| Source type | news_extraction | ai_generated | file_seed |
| Data source | External news APIs | AI generation + enrichment | AI explosion + enrichment |
| Expiry | 30 days | None (permanent) | None (permanent) |
| Validation | multi_source | multi_phase (strictest) | multi_phase (strictest) |
| AI model tier | mid | mid | default (bulk) |
| Volume | Continuous (every 15 min) | Low (20/day default) | Batch (10-100 per entity) |
| Feed weight | 40% recent + 10% explore | 20% dedicated slice | Mixed into all streams |
| Dedup method | Content hash on news_sources | Title list in prompt + content hash on insert | Title match per topic |
| Feature gate | Always on | EVERGREEN_ENABLED | Manual trigger |
Cost Model
| Component | Per Fact | Daily (20 quota) | Monthly |
|---|---|---|---|
| AI generation | ~$0.01 | ~$0.20 | ~$6 |
| Validation (phases 3-4) | ~$0.003 | ~$0.06 | ~$2 |
| Challenge content | ~$0.006 | ~$0.12 | ~$4 |
| Enrichment APIs | $0 | $0 | $0 |
| Image resolution | $0 | $0 | $0 |
| Total | ~$0.02 | ~$0.38 | ~$12 |
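The totals in the table can be reproduced with a few lines of arithmetic. The sketch below works in thousandths of a dollar so the sums stay exact integers; the estimateCost name and the 30-day month are assumptions.

```typescript
// Per-fact costs in thousandths of a dollar (milli-USD), from the table above.
const PER_FACT_MILLI_USD = {
  aiGeneration: 10, // ~$0.01
  validation: 3, // ~$0.003 (phases 3-4)
  challengeContent: 6, // ~$0.006
  enrichmentApis: 0,
  imageResolution: 0,
};

interface CostEstimate {
  perFact: number; // milli-USD
  daily: number; // milli-USD
  monthly: number; // milli-USD
}

function estimateCost(dailyQuota: number, daysPerMonth: number): CostEstimate {
  const c = PER_FACT_MILLI_USD;
  const perFact =
    c.aiGeneration +
    c.validation +
    c.challengeContent +
    c.enrichmentApis +
    c.imageResolution;
  const daily = perFact * dailyQuota;
  return { perFact, daily, monthly: daily * daysPerMonth };
}
```

With the default quota, estimateCost(20, 30) yields 19, 380, and 11400 milli-USD, that is about $0.019 per fact, $0.38 per day, and $11.40 per month. The ~$12 in the table is the sum of the individually rounded monthly rows.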
Configuration
| Variable | Default | Description |
|---|---|---|
| EVERGREEN_ENABLED | false | Master gate for evergreen generation |
| EVERGREEN_DAILY_QUOTA | 20 | Max facts per day across all categories |
| NOTABILITY_THRESHOLD | 0.6 | Minimum notability score for insertion |
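A minimal sketch of reading these variables with their documented defaults is shown below. It is not the code in packages/config: the parseEvergreenConfig name, the EvergreenConfig shape, and the rule that only the literal string "true" enables the gate are assumptions. The environment is passed in as a plain object so the sketch stays independent of any runtime.

```typescript
interface EvergreenConfig {
  enabled: boolean;
  dailyQuota: number;
  notabilityThreshold: number;
}

function parseEvergreenConfig(
  env: Record<string, string | undefined>,
): EvergreenConfig {
  const quota = parseInt(env["EVERGREEN_DAILY_QUOTA"] ?? "", 10);
  const threshold = parseFloat(env["NOTABILITY_THRESHOLD"] ?? "");
  return {
    // Anything other than the literal "true" leaves the gate closed.
    enabled: env["EVERGREEN_ENABLED"] === "true",
    // Fall back to the documented defaults when a value is missing or not numeric.
    dailyQuota: isNaN(quota) ? 20 : quota,
    notabilityThreshold: isNaN(threshold) ? 0.6 : threshold,
  };
}
```

Calling parseEvergreenConfig({}) returns the defaults from the table, so an unconfigured environment keeps generation switched off.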
Key Files
| File | Purpose |
|---|---|
| apps/web/app/api/cron/generate-evergreen/route.ts | Cron trigger, quota distribution |
| apps/worker-facts/src/handlers/generate-evergreen.ts | Worker handler, AI generation |
| packages/ai/src/fact-engine.ts | generateEvergreenFacts() function |
| packages/ai/src/enrichment.ts | Enrichment orchestrator (8 free APIs) |
| packages/config/src/index.ts | EVERGREEN_ENABLED, EVERGREEN_DAILY_QUOTA |
| packages/shared/src/schemas.ts | GenerateEvergreenMessage Zod schema |
| packages/db/src/drizzle/fact-engine-queries.ts | getEvergreenFacts(), getActiveTopicCategoriesWithSchemas() |
| apps/admin/app/(dashboard)/pipeline/actions.ts | Manual trigger: triggerGenerateEvergreen() |
Related
- Evergreen Fact & Challenge Ingestion — Existing reference doc (challenge content focus)
- Fact Ingestion — Source of Truth Map — SOT references for all three pipelines
- News Pipeline — Current events ingestion
- Seeding Pipeline — Entity explosion and bootstrapping
- Fact-Challenge Anatomy — How facts become challenges