Evergreen Pipeline — Detailed Flow

How Eko generates timeless knowledge facts via AI — content that never expires and provides a stable baseline when news is slow.

Overview

The evergreen pipeline produces "always-relevant" facts that aren't tied to current events. Unlike news facts (30-day expiry) or seed facts (manual trigger), evergreen facts are designed to be generated on a daily cron schedule across all active topic categories.

Key characteristics:

  • AI-generated: No external source feed; the AI generates facts from its training data, optionally grounded by enrichment APIs
  • Never expires: expires_at: NULL (permanent content)
  • Stricter validation: Uses multi_phase strategy because there are no independent news sources to corroborate
  • Currently gated off: EVERGREEN_ENABLED=false in production; can be triggered manually from admin

End-to-End Flow

  Vercel Cron (daily, 3 AM UTC)
  or Admin manual trigger
         │
         ▼
  ┌─ STEP 1: QUOTA DISTRIBUTION ─────────────────────────────────────┐
  │  Check EVERGREEN_ENABLED gate (default: false)                    │
  │  Load active topic categories with schemas                        │
  │  Distribute EVERGREEN_DAILY_QUOTA across categories               │
  │  Enqueue 1 GENERATE_EVERGREEN per category                        │
  └──────────────────────────────────────────────────┬────────────────┘
         │
         ▼
  ┌─ STEP 2: AI GENERATION ──────────────────────────────────────────┐
  │  Worker: worker-facts                                             │
  │  Handler: generate-evergreen.ts → generateEvergreenFacts()        │
  │                                                                   │
  │  For each topic category:                                         │
  │  - Load schema keys                                               │
  │  - Load existing titles (for deduplication)                       │
  │  - Resolve enrichment context (optional, never blocks)            │
  │  - Compute deterministic notability (if KG + Wikidata confirm)    │
  │  - AI generates N facts                                           │
  └──────────────────────────────────────────────────┬────────────────┘
         │
         ▼
  ┌─ STEP 3: INSERT + ENQUEUE ───────────────────────────────────────┐
  │  Per generated record:                                            │
  │  - Insert fact_record (source_type: ai_generated, no expiry)      │
  │  - Content hash collision check (skip duplicates)                 │
  │  - Enqueue VALIDATE_FACT (strategy: multi_phase)                  │
  └──────────────────────────────────────────────────┬────────────────┘
         │
         ▼
  ┌─ STEP 4: VALIDATION ────────────────────────────────────────────┐
  │  4-phase: structural → consistency → cross-model → evidence      │
  │  Stricter than news (no independent sources)                      │
  │  Pass: confidence ≥ 0.7 + no critical flags                      │
  └──────────────────────┬──────────────────────┬────────────────────┘
                         │                      │
                      INVALID                VALID
                         │                      │
                         ▼                      ▼
                      rejected       ┌─ POST-VALIDATION FAN-OUT ─────┐
                                     │  RESOLVE_IMAGE (parallel)      │
                                     │  GENERATE_CHALLENGE_CONTENT    │
                                     └────────────────┬───────────────┘
                                                      │
                                                      ▼
                                              Fact appears in feed
                                           (20% evergreen slice)

Step 1: Quota Distribution

Trigger

The cron route at apps/web/app/api/cron/generate-evergreen/route.ts is designed to run daily at 3 AM UTC (it is currently unscheduled; see Production Status). It checks two gates:

| Gate | Config | Default | Effect |
|---|---|---|---|
| Feature flag | EVERGREEN_ENABLED | false | Blocks all generation if false |
| Daily quota | EVERGREEN_DAILY_QUOTA | 20 | Max facts generated per day |
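
A minimal sketch of how the two gates could be read, assuming they arrive as plain string environment variables. `readEvergreenGates` and the `Env` type are hypothetical names, and treating only the literal string "true" as enabled is an assumption; the real parsing lives in packages/config/src/index.ts.

```typescript
// Hypothetical helper; the real config parsing lives in packages/config.
type Env = Record<string, string | undefined>;

interface EvergreenGates {
  enabled: boolean;   // EVERGREEN_ENABLED, documented default: false
  dailyQuota: number; // EVERGREEN_DAILY_QUOTA, documented default: 20
}

function readEvergreenGates(env: Env): EvergreenGates {
  // Assumption: only the literal string "true" opens the gate.
  const enabled = env.EVERGREEN_ENABLED === "true";

  // Fall back to the documented default when the value is missing or invalid.
  const parsed = Number.parseInt(env.EVERGREEN_DAILY_QUOTA ?? "", 10);
  const dailyQuota = Number.isInteger(parsed) && parsed > 0 ? parsed : 20;

  return { enabled, dailyQuota };
}
```

Passing the environment in as an argument (rather than reading a global) keeps the gate logic easy to test.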

Category Allocation

The daily quota is distributed across active root-level topic categories:

  • If percentTarget is set on a category: gets dailyQuota × percentTarget facts
    • Example: 20 daily quota, "Science" has 15% target → 3 facts
  • If not set: quota split evenly across all categories
  • Floor: Each category gets at least 1 fact
  • Skip: Categories without schemas are excluded

One GENERATE_EVERGREEN queue message per category:

GENERATE_EVERGREEN {
  topic_category_id: "uuid",
  schema_id: "uuid",
  count: 3
}
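
The allocation rules above can be sketched as follows. This is an illustration under stated assumptions, not the code in route.ts: `distributeQuota` and the `TopicCategory` shape are hypothetical, `percentTarget` is assumed to be a fraction (0.15 for 15%), targeted shares are rounded to the nearest integer, and the even split is taken over the categories that have a schema.

```typescript
interface TopicCategory {
  id: string;
  schemaId: string | null; // categories without a schema are skipped
  percentTarget?: number;  // assumed to be a fraction, e.g. 0.15 for 15%
}

interface GenerateEvergreenMessage {
  topic_category_id: string;
  schema_id: string;
  count: number;
}

function distributeQuota(
  dailyQuota: number,
  categories: TopicCategory[],
): GenerateEvergreenMessage[] {
  // Skip rule: only categories with a schema receive a message.
  const eligible = categories.filter((c) => c.schemaId !== null);
  if (eligible.length === 0) return [];

  // Assumption: the even split is taken over eligible categories only.
  const evenShare = Math.floor(dailyQuota / eligible.length);

  return eligible.map((c) => {
    const raw =
      c.percentTarget !== undefined
        ? Math.round(dailyQuota * c.percentTarget) // assumption: round to nearest
        : evenShare;
    return {
      topic_category_id: c.id,
      schema_id: c.schemaId as string, // non-null after the filter above
      count: Math.max(1, raw),         // floor rule: at least 1 fact each
    };
  });
}
```

Because of the 1-fact floor and per-category rounding, the counts in this sketch can sum to more than the daily quota; it does not rebalance.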

Production Status

The cron route exists but is not in vercel.json — it's unscheduled. Generation can be triggered:

  1. Manually from the admin dashboard (Pipeline → Evergreen → Generate)
  2. By adding the route to vercel.json crons and setting EVERGREEN_ENABLED=true

Step 2: AI Generation

What the Handler Does

apps/worker-facts/src/handlers/generate-evergreen.ts:

  1. Load the topic category and its schema keys
  2. Load existing fact titles for the topic (up to 50) — passed to AI for deduplication
  3. Optionally resolve enrichment context (same orchestrator as news/seed pipelines)
  4. Optionally compute deterministic notability from KG + Wikidata signals
  5. Call generateEvergreenFacts() from @eko/ai

AI Prompt Composition

The generation prompt includes:

| Layer | Source | Purpose |
|---|---|---|
| CHALLENGE_TONE_PREFIX | challenge-content-rules.ts | Theatrical title requirements |
| Taxonomy voice | resolveVoice() | Domain-specific register and energy |
| Domain vocabulary | formatVocabularyForPrompt() | Expert language patterns |
| Taxonomy content rules | resolveContentRules() | Formatting conventions |
| Schema keys | fact_record_schemas.fact_keys | Required JSONB fields |
| Existing titles | DB query | Deduplication list |
| Enrichment context | resolveEnrichmentContext() | Grounding data from APIs |
| Subcategory hierarchy | DB query | Rendered as tree for AI classification |
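
One way to picture the layering is as an ordered list of labeled text blocks joined into a single prompt, with unavailable optional layers dropped. The sketch below is illustrative only: `composePrompt`, `PromptLayer`, and the "## label" section format are hypothetical, and the actual ordering and formatting used by generateEvergreenFacts() are not documented here.

```typescript
interface PromptLayer {
  label: string;
  text: string | null; // null when an optional layer (e.g. enrichment) is unavailable
}

function composePrompt(layers: PromptLayer[]): string {
  return layers
    // Drop layers that have no content so the prompt has no empty sections.
    .filter(
      (l): l is PromptLayer & { text: string } =>
        l.text !== null && l.text.trim() !== "",
    )
    .map((l) => `## ${l.label}\n${l.text.trim()}`)
    .join("\n\n");
}
```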

Enrichment Sources

Same as other pipelines — always queried in parallel, never blocks:

| Always | Topic-Routed |
|---|---|
| Google Knowledge Graph | TheSportsDB (sports/*) |
| Wikidata | MusicBrainz (music/*) |
| Wikipedia | Nominatim (geography/*) |
| | Open Library (books/*) |
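
A sketch of the "parallel, never blocks" behavior, assuming each source is a function returning a promise and that a failing source simply contributes nothing. `EnrichmentSource`, `selectSources`, and `resolveEnrichment` are hypothetical names (the real orchestrator is resolveEnrichmentContext()), and timeouts are omitted for brevity.

```typescript
type EnrichmentResult = { source: string; data: unknown };

interface EnrichmentSource {
  name: string;
  // Topic path prefix this source serves; undefined means "always queried".
  topicPrefix?: string;
  fetch: (query: string) => Promise<unknown>;
}

function selectSources(
  sources: EnrichmentSource[],
  topicPath: string,
): EnrichmentSource[] {
  return sources.filter(
    (s) => s.topicPrefix === undefined || topicPath.startsWith(s.topicPrefix),
  );
}

async function resolveEnrichment(
  sources: EnrichmentSource[],
  topicPath: string,
  query: string,
): Promise<EnrichmentResult[]> {
  const selected = selectSources(sources, topicPath);

  // All selected sources are queried concurrently. Each promise swallows its
  // own failure, so one broken API cannot fail the whole generation job.
  const settled = await Promise.all(
    selected.map((s) =>
      Promise.resolve()
        .then(() => s.fetch(query))
        .then((data): EnrichmentResult | null => ({ source: s.name, data }))
        .catch(() => null),
    ),
  );

  return settled.filter((r): r is EnrichmentResult => r !== null);
}
```

With this shape, a "sports/football" topic would query the always-on sources plus the sports-routed one, and an outage in any of them only shrinks the returned list.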

Model Selection

| Aspect | Detail |
|---|---|
| Task name | evergreen_generation |
| Default tier | mid (lower volume than news, higher quality bar) |
| Model routing | DB-driven via ai_model_tier_config |
| Model-specific tuning | Per-model ModelAdapter injects prompt customizations |
| Can escalate | Yes, if escalation signals present |

Output per Record

| Field | Description |
|---|---|
| title | Factual, Wikipedia-style label |
| challengeTitle | Theatrical, curiosity-provoking hook |
| facts | Structured JSONB conforming to topic schema |
| context | 4-8 sentence narrative (Hook → Story → Connection) |
| notabilityScore | 0.0-1.0 (may be overridden by deterministic scoring) |
| notabilityReason | One-sentence justification |
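
The output fields map naturally onto a record type, and the notability override can be expressed as a small pure function. This is a sketch: `EvergreenFactOutput` and `applyNotability` are hypothetical names, and clamping the score to the 0.0-1.0 range is an assumption rather than documented behavior.

```typescript
interface EvergreenFactOutput {
  title: string;
  challengeTitle: string;
  facts: Record<string, unknown>; // must conform to the topic schema
  context: string;
  notabilityScore: number;        // 0.0-1.0
  notabilityReason: string;
}

function applyNotability(
  record: EvergreenFactOutput,
  deterministicScore: number | null,
): EvergreenFactOutput {
  // A deterministic score, when available, replaces the AI's own estimate.
  const score = deterministicScore ?? record.notabilityScore;
  // Assumption: out-of-range values are clamped rather than rejected.
  const clamped = Math.min(1, Math.max(0, score));
  return { ...record, notabilityScore: clamped };
}
```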

Deduplication

Two layers prevent duplicate facts:

  1. Title dedup: AI receives existing titles and is instructed to avoid repetition
  2. Content hash: On insert, a content hash collision check skips exact duplicates
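
The second layer can be illustrated with a SHA-256 hash over a canonical form of the record. Which fields the real pipeline hashes is not documented here, so the choice below (lowercased title plus top-level facts with sorted keys) is an assumption, and `contentHash` and `insertIfNew` are hypothetical helpers. An in-memory Set stands in for the database check.

```typescript
import { createHash } from "node:crypto";

// Assumption: the hash covers the normalized title and the facts object.
function contentHash(title: string, facts: Record<string, unknown>): string {
  // Sort top-level keys so that key order does not change the hash.
  // Nested objects are not canonicalized in this sketch.
  const sortedFacts = Object.keys(facts)
    .sort()
    .map((k) => [k, facts[k]]);
  const canonical = JSON.stringify([title.trim().toLowerCase(), sortedFacts]);
  return createHash("sha256").update(canonical).digest("hex");
}

// Returns true when the record is new, false when it collides with a known hash.
function insertIfNew(
  seen: Set<string>,
  title: string,
  facts: Record<string, unknown>,
): boolean {
  const hash = contentHash(title, facts);
  if (seen.has(hash)) return false; // exact duplicate: skip
  seen.add(hash);
  return true;
}
```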

Step 3: Insertion

Each generated record is inserted into fact_records with:

| Field | Value |
|---|---|
| status | pending_validation |
| source_type | ai_generated |
| expires_at | NULL (permanent) |
| generation_cost_usd | Total AI cost ÷ number of records generated |
| notability_score | AI or deterministic override |

A VALIDATE_FACT message is immediately enqueued with the multi_phase strategy.
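
As a sketch of the row construction, the function below builds one insert row per generated record and splits the batch cost evenly. `buildInsertRows` and the `FactRecordInsert` shape are hypothetical; only the column names and values in the table above come from this document.

```typescript
interface GeneratedRecord {
  title: string;
  notabilityScore: number;
}

interface FactRecordInsert {
  title: string;
  status: "pending_validation";
  source_type: "ai_generated";
  expires_at: null;            // evergreen facts never expire
  generation_cost_usd: number; // even share of the batch cost
  notability_score: number;
}

function buildInsertRows(
  records: GeneratedRecord[],
  totalCostUsd: number,
): FactRecordInsert[] {
  if (records.length === 0) return []; // avoid dividing by zero
  const perRecordCost = totalCostUsd / records.length;
  return records.map((r) => ({
    title: r.title,
    status: "pending_validation",
    source_type: "ai_generated",
    expires_at: null,
    generation_cost_usd: perRecordCost,
    notability_score: r.notabilityScore,
  }));
}
```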


Step 4: Validation

Why Multi-Phase?

Evergreen facts use the strictest validation strategy because:

  • No independent news sources exist to corroborate
  • The AI is generating from training data, which may contain errors
  • Enrichment data from Step 2 is not reused in validation — the validator independently queries external APIs to avoid circular reasoning

4-Phase Pipeline

| Phase | Name | What It Checks | Cost |
|---|---|---|---|
| 1 | Structural | Schema conformance, type validation, injection detection | $0 (code-only) |
| 2 | Consistency | Internal contradictions, taxonomy rule violations | $0 (code-only) |
| 3 | Cross-Model | AI adversarial verification (different model than generator) | ~$0.001 |
| 4 | Evidence | External API corroboration (Wikipedia, Wikidata) + AI reasoner | ~$0.002-0.005 |

Phases 1-2 are free code-only checks that catch ~40% of defective facts before any AI call is made.
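
The cost benefit of ordering cheap phases first can be sketched as a short-circuiting loop. This assumes a failed phase stops the pipeline, which the text implies but does not state outright; `runPhases` and the `Phase` shape are hypothetical, and the phases are synchronous here for brevity even though the AI-backed ones would be asynchronous in practice.

```typescript
type PhaseName = "structural" | "consistency" | "cross_model" | "evidence";

interface PhaseResult {
  passed: boolean;
  costUsd: number;
}

interface Phase {
  name: PhaseName;
  run: (fact: unknown) => PhaseResult;
}

function runPhases(
  phases: Phase[],
  fact: unknown,
): { failedAt: PhaseName | null; totalCostUsd: number } {
  let totalCostUsd = 0;
  for (const phase of phases) {
    const result = phase.run(fact);
    totalCostUsd += result.costUsd;
    // Stop at the first failure so later, paid phases are never invoked.
    if (!result.passed) return { failedAt: phase.name, totalCostUsd };
  }
  return { failedAt: null, totalCostUsd };
}
```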

Pass Criteria

  • confidence ≥ 0.7
  • No flags containing "critical"
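
The two criteria combine into a single predicate. In this sketch, `passesValidation` and `ValidationOutcome` are hypothetical names, and matching "critical" case-insensitively is an assumption.

```typescript
interface ValidationOutcome {
  confidence: number; // 0.0-1.0
  flags: string[];
}

const MIN_CONFIDENCE = 0.7;

function passesValidation(outcome: ValidationOutcome): boolean {
  // Assumption: the substring match on "critical" ignores case.
  const hasCriticalFlag = outcome.flags.some((flag) =>
    flag.toLowerCase().includes("critical"),
  );
  return outcome.confidence >= MIN_CONFIDENCE && !hasCriticalFlag;
}
```

Note that the threshold is inclusive: a confidence of exactly 0.7 passes.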

Post-Validation Fan-Out

On validation success, two independent jobs fire in parallel:

  1. RESOLVE_IMAGE — image cascade (Wikipedia → SportsDB → Unsplash → Pexels)
  2. GENERATE_CHALLENGE_CONTENT — 6 quiz styles with 5-layer structure
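
The image cascade is a sequential fallback: try each provider in order and stop at the first hit. A minimal sketch, assuming each provider exposes an async lookup that returns a URL or null; `ImageProvider` and `resolveImage` are hypothetical names, and treating a provider error the same as a miss is an assumption.

```typescript
interface ImageProvider {
  name: string;
  find: (query: string) => Promise<string | null>;
}

async function resolveImage(
  providers: ImageProvider[],
  query: string,
): Promise<{ provider: string; url: string } | null> {
  for (const provider of providers) {
    try {
      const url = await provider.find(query);
      // First provider with a result wins; later ones are never called.
      if (url !== null) return { provider: provider.name, url };
    } catch {
      // Assumption: an erroring provider is treated like a miss.
    }
  }
  return null; // every provider missed
}
```

Passing the providers in the order Wikipedia, SportsDB, Unsplash, Pexels reproduces the cascade described above.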

Feed Integration

Evergreen facts get a dedicated 20% slice in the blended feed algorithm:

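| Stream | Weight | Source |
|---|---|---|
| Recent validated | 40% | Newly published facts |
| Review-due | 30% | Spaced repetition (SM-2 variant) |
| Evergreen | 20% | source_type='ai_generated', RANDOM() |
| Exploration | 10% | Random facts for discovery |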
Exploration10%Random facts for discovery

Evergreen facts are also eligible to appear in the 10% exploration slice. They never expire, so they accumulate over time and provide a growing content baseline.
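
To make the weights concrete, the sketch below converts them into slot counts for one feed page. Whether the real feed allocates fixed slots or samples probabilistically is not stated here, so this is only one plausible reading; `allocateSlots` is a hypothetical name, and giving rounding leftovers to the recent stream is an assumption.

```typescript
// Weights from the table above, kept as integer percentages to avoid
// floating-point rounding surprises.
const FEED_PERCENT = {
  recent: 40,
  reviewDue: 30,
  evergreen: 20,
  exploration: 10,
} as const;

type Stream = keyof typeof FEED_PERCENT;

function allocateSlots(pageSize: number): Record<Stream, number> {
  const slots: Record<Stream, number> = {
    recent: Math.floor((pageSize * FEED_PERCENT.recent) / 100),
    reviewDue: Math.floor((pageSize * FEED_PERCENT.reviewDue) / 100),
    evergreen: Math.floor((pageSize * FEED_PERCENT.evergreen) / 100),
    exploration: Math.floor((pageSize * FEED_PERCENT.exploration) / 100),
  };
  const assigned =
    slots.recent + slots.reviewDue + slots.evergreen + slots.exploration;
  // Assumption: slots lost to rounding down go to the recent stream.
  slots.recent += pageSize - assigned;
  return slots;
}
```

For a 20-item page this yields 8 recent, 6 review-due, 4 evergreen, and 2 exploration slots.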


Real-World Example: Generating "Space & Astronomy" Facts

Step 1: Quota

  • Daily quota: 20 facts
  • 10 active categories, "Space & Astronomy" has no percentTarget
  • Even distribution: 2 facts per category
  • Enqueue: GENERATE_EVERGREEN(topic: "Space & Astronomy", count: 2)

Step 2: AI Generation

  • Schema keys for Space: celestial_body, measurement, distance, significance, date
  • Existing titles loaded (50): ["Mars' Olympus Mons...", "Light from the Sun...", ...]
  • Enrichment: Wikipedia summary for "astronomy", Wikidata structured data
  • AI generates 2 facts:

Fact 1:

title: "Saturn's Ring System Spans 282,000 Kilometers"
challengeTitle: "The Solar System's Most Spectacular Jewelry"
facts: {
  celestial_body: "Saturn",
  measurement: "282,000 km ring diameter",
  significance: "Largest ring system in the solar system"
}
context: "Saturn's rings stretch across a distance that would span most of
the way from Earth to the Moon. Despite their enormous width, the rings are
remarkably thin, averaging just 10 meters thick. Galileo first observed them
in 1610 and described them as 'ears' because his telescope couldn't resolve
their true shape..."
notabilityScore: 0.85

Fact 2:

title: "Neutron Stars Can Spin at 716 Rotations Per Second"
challengeTitle: "The Universe's Fastest Spinning Objects"
facts: {
  celestial_body: "PSR J1748-2446ad",
  measurement: "716 Hz rotation frequency",
  significance: "Fastest known spinning neutron star"
}
context: "The fastest known pulsar, PSR J1748-2446ad, completes 716 full
rotations every second — meaning its equatorial surface moves at nearly
a quarter the speed of light..."
notabilityScore: 0.88

Step 3: Insertion

  • 2 fact_records inserted (source_type: ai_generated, expires_at: NULL)
  • 2 VALIDATE_FACT messages enqueued (multi_phase strategy)

Step 4: Validation

  • Fact 1: Wikipedia confirms Saturn ring diameter → confidence 0.85 → validated
  • Fact 2: Wikipedia confirms PSR J1748-2446ad rotation rate → confidence 0.82 → validated
  • Fan-out: 2 × RESOLVE_IMAGE + 2 × GENERATE_CHALLENGE_CONTENT

Result

  • Saturn fact: Wikipedia image of Saturn's rings resolved
  • Neutron star fact: Unsplash fallback (abstract space photo)
  • 12 challenge variants generated (6 per fact)
  • Both appear in the 20% evergreen feed slice

Comparison with Other Pipelines

| Aspect | News | Evergreen | Seed |
|---|---|---|---|
| Trigger | Cron every 15 min | Cron daily / manual | Manual seed entry |
| Source type | news_extraction | ai_generated | file_seed |
| Data source | External news APIs | AI generation + enrichment | AI explosion + enrichment |
| Expiry | 30 days | None (permanent) | None (permanent) |
| Validation | multi_source | multi_phase (strictest) | multi_phase (strictest) |
| AI model tier | mid | mid | default (bulk) |
| Volume | Continuous (every 15 min) | Low (20/day default) | Batch (10-100 per entity) |
| Feed weight | 40% recent + 10% explore | 20% dedicated slice | Mixed into all streams |
| Dedup method | Content hash on news_sources | Title list per topic + content hash | Title match per topic |
| Feature gate | Always on | EVERGREEN_ENABLED | Manual trigger |

Cost Model

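| Component | Per Fact | Daily (20 quota) | Monthly |
|---|---|---|---|
| AI generation | ~$0.01 | ~$0.20 | ~$6 |
| Validation (phases 3-4) | ~$0.003 | ~$0.06 | ~$2 |
| Challenge content | ~$0.006 | ~$0.12 | ~$4 |
| Enrichment APIs | $0 | $0 | $0 |
| Image resolution | $0 | $0 | $0 |
| Total | ~$0.02 | ~$0.38 | ~$12 |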
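
The table's arithmetic can be reproduced with a few lines. The per-fact figures come from the table; `estimateCost` is a hypothetical helper and the 30-day month is an assumption.

```typescript
// Approximate per-fact costs in USD, taken from the table above.
const PER_FACT_USD = {
  generation: 0.01,
  validation: 0.003, // phases 3-4 only; phases 1-2 are free
  challengeContent: 0.006,
};

function estimateCost(
  dailyQuota: number,
  daysPerMonth = 30, // assumption: a 30-day month
): { perFact: number; daily: number; monthly: number } {
  const perFact =
    PER_FACT_USD.generation +
    PER_FACT_USD.validation +
    PER_FACT_USD.challengeContent;
  const daily = perFact * dailyQuota;
  return { perFact, daily, monthly: daily * daysPerMonth };
}
```

With the default quota of 20 this gives about $0.019 per fact, $0.38 per day, and $11.40 per month. The table's ~$12 monthly total is the sum of the individually rounded monthly rows.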

Configuration

| Variable | Default | Description |
|---|---|---|
| EVERGREEN_ENABLED | false | Master gate for evergreen generation |
| EVERGREEN_DAILY_QUOTA | 20 | Max facts per day across all categories |
| NOTABILITY_THRESHOLD | 0.6 | Minimum notability score for insertion |

Key Files

| File | Purpose |
|---|---|
| apps/web/app/api/cron/generate-evergreen/route.ts | Cron trigger, quota distribution |
| apps/worker-facts/src/handlers/generate-evergreen.ts | Worker handler, AI generation |
| packages/ai/src/fact-engine.ts | generateEvergreenFacts() function |
| packages/ai/src/enrichment.ts | Enrichment orchestrator (8 free APIs) |
| packages/config/src/index.ts | EVERGREEN_ENABLED, EVERGREEN_DAILY_QUOTA |
| packages/shared/src/schemas.ts | GenerateEvergreenMessage Zod schema |
| packages/db/src/drizzle/fact-engine-queries.ts | getEvergreenFacts(), getActiveTopicCategoriesWithSchemas() |
| apps/admin/app/(dashboard)/pipeline/actions.ts | Manual trigger: triggerGenerateEvergreen() |