Architecture Overview

Eko is a knowledge platform that processes three content pipelines — news, evergreen, and seed — into verified, structured knowledge cards. Users learn through interactive challenges powered by spaced repetition.

Core Invariants

| Invariant | Description |
| --- | --- |
| Fact-first | Facts are the atomic unit; everything flows from structured, schema-validated `fact_records` |
| Verification before publication | No fact reaches the public feed without at least one validation tier pass |
| Source attribution | Every fact traces back to source articles and validation evidence |
| Schema conformance | Fact output validates against `fact_record_schemas.fact_keys` per topic category |
| Cost-bounded AI | All AI calls go through a model router with tier selection, daily spend caps, and per-call cost tracking |
| Public feed / gated detail | Feed is public; full card detail and interactions require Free/Eko+ subscription |
| Topic balance | Daily quotas per topic category prevent content monoculture |
| Challenge answer isolation | Challenge URLs, shared links, and metadata never reveal answer content |
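The schema-conformance invariant amounts to a key check: every key in a fact's `facts` JSONB must be declared for its topic category. A minimal in-memory sketch (the type shapes and function name are illustrative, not the actual Drizzle models):

```typescript
// Sketch of the schema-conformance invariant: every key in a fact_record's
// `facts` JSONB must appear in the topic category's fact_record_schemas
// entry. Shapes are illustrative, not the real table definitions.
type FactRecord = { topicCategory: string; facts: Record<string, unknown> };
type FactRecordSchema = { topicCategory: string; factKeys: string[] };

function conformsToSchema(
  record: FactRecord,
  schemas: Map<string, FactRecordSchema>,
): { ok: boolean; unknownKeys: string[] } {
  const schema = schemas.get(record.topicCategory);
  // No schema registered for the category: reject every key.
  if (!schema) return { ok: false, unknownKeys: Object.keys(record.facts) };
  const unknownKeys = Object.keys(record.facts).filter(
    (k) => !schema.factKeys.includes(k),
  );
  return { ok: unknownKeys.length === 0, unknownKeys };
}
```

A record with only declared keys passes; any undeclared key fails the invariant and is reported for review.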

Data Flow

News APIs ─┐
           ├──▶ [INGEST_NEWS] ──▶ worker-ingest ──▶ news_sources table
           │                                              │
           │                        ┌─────────────────────┘
           │                        ▼
           │              [CLUSTER_STORIES] ──▶ worker-ingest ──▶ stories
           │                                              │
           │                        ┌─────────────────────┘
           │                        ▼
           │              [EXTRACT_FACTS] ──▶ worker-facts ──▶ fact_records
           │                                              │
           │                  ┌───────────┬───────────────┘
           │                  ▼           ▼
           │         [VALIDATE_FACT]  [RESOLVE_IMAGE]
           │              │               │
           │              ▼               ▼
           │         worker-validate  worker-ingest
           │              │               │
           │              ▼               ▼
           │         fact verified    image cached
           │
Seed Data ─┤
           ├──▶ [EXPLODE_CATEGORY_ENTRY] ──▶ worker-facts ──▶ fact_records
           ├──▶ [FIND_SUPER_FACTS] ──▶ worker-facts ──▶ cross-correlations
           └──▶ [GENERATE_CHALLENGE_CONTENT] ──▶ worker-facts ──▶ fact_challenge_content
                                                                        │
                                        ┌───────────────────────────────┘
                                        ▼ (if image would spoil answer)
                              [RESOLVE_CHALLENGE_IMAGE] ──▶ worker-ingest ──▶ anti-spoiler image

Evergreen ────▶ [GENERATE_EVERGREEN] ──▶ worker-facts ──▶ fact_records

Database Schema

PostgreSQL via Supabase with Drizzle ORM. 32+ active tables across the seven groups below. See the Fact & Taxonomy Schema Map for the full reference.

Taxonomy

| Table | Purpose |
| --- | --- |
| `topic_categories` | Hierarchical taxonomy tree (depth 0-3, slug, path, dailyQuota). Supports deprecation via `deprecated_at`, `replaced_by`. |
| `topic_category_aliases` | Maps external provider slugs to canonical topic category IDs |
| `unmapped_category_log` | Logs unresolved category slugs for audit |
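The alias tables imply a simple resolution path: look the provider slug up in `topic_category_aliases`, and record anything unresolved for audit. An in-memory sketch (function and variable names are illustrative):

```typescript
// Illustrative slug resolution: external provider slugs resolve through
// topic_category_aliases to a canonical category ID; unknown slugs are
// recorded for audit, mirroring unmapped_category_log.
function resolveCategorySlug(
  slug: string,
  aliases: Map<string, string>, // provider slug -> canonical category ID
  unmappedLog: string[],        // stand-in for unmapped_category_log
): string | null {
  const categoryId = aliases.get(slug);
  if (categoryId !== undefined) return categoryId;
  unmappedLog.push(slug);       // audit trail for later curation
  return null;
}
```

Unresolved slugs return `null` rather than guessing a category, so content never lands under the wrong taxonomy node.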

Fact Storage

| Table | Purpose |
| --- | --- |
| `fact_records` | Core fact storage — facts JSONB with key-value pairs, title, context, notabilityScore, status, validation |
| `fact_record_schemas` | Per-topic fact shape definitions — factKeys JSONB, cardFormats array |

Challenge System

| Table | Purpose |
| --- | --- |
| `fact_challenge_content` | Pre-generated challenge text per style/difficulty with `challenge_title` |
| `challenge_formats` | 8 named formats with knowledgeType and tone |
| `challenge_format_styles` | Format → eligible style mapping |
| `challenge_format_topics` | Format → eligible topic mapping |
| `challenge_sessions` | Multi-turn conversational challenge state with conversation history |
| `card_interactions` | User engagement tracking (views, answers, bookmarks, shares) |
| `challenge_group_progress` | Per-position progress within challenge groups |
| `card_bookmarks` | User bookmarked cards |
| `score_disputes` | User score dispute submissions for AI re-evaluation |

Ingestion & Observability

| Table | Purpose |
| --- | --- |
| `stories` | Clustered news articles grouped by TF-IDF cosine similarity |
| `news_sources` | Raw articles fetched from news APIs |
| `ai_cost_tracking` | Per-model, per-feature daily cost aggregation |
| `ingestion_runs` | Pipeline observability — tracks each cron invocation |
| `content_operations_log` | Operational event log for pipeline debugging |

Brand & Domain

| Table | Purpose |
| --- | --- |
| `brands` | Canonical brand entities |
| `brand_categories` | Brand categorization taxonomy |
| `brand_category_assignments` | Brand → category mappings |
| `domains` | Domain records with verification status |

User & Subscription

| Table | Purpose |
| --- | --- |
| `profiles` | User profiles synced from Supabase Auth |
| `plan_definitions` | Plan configuration (free, base, pro, team, plus) |
| `user_subscriptions` | Maps users to plans with Stripe integration |
| `notification_preferences` | Per-user notification settings |
| `system_notifications` | System-generated notifications |
| `feature_flags` | Runtime feature flag toggles |
| `ai_model_tier_config` | AI model routing configuration per tier |
| `reward_milestones` | Achievement milestones for gamification |
| `user_reward_claims` | Claimed milestone rewards |
| `user_quality_grades` | Per-topic quality grade aggregation |

Seed Pipeline

| Table | Purpose |
| --- | --- |
| `seed_entry_queue` | Batch seed entry processing queue |
| `super_fact_links` | Cross-entry correlation links |
| `seed_entry_links` | Entry-to-fact provenance links |

Queue System

Backend: Upstash Redis (REST API). Messages are leased for 2 minutes; after a maximum of 3 attempts a message moves to the dead-letter queue. Retries use exponential backoff with jitter.
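The retry policy can be sketched as a delay calculator. This is a minimal sketch: the 1-second base delay and "full jitter" strategy are assumptions, not the exact production values; the lease duration and attempt cap come from the policy above.

```typescript
// Retry-delay sketch for the queue policy: exponential backoff with jitter,
// capped at the 2-minute lease duration. Base delay and jitter strategy
// ("full jitter") are illustrative assumptions.
const LEASE_MS = 2 * 60 * 1000;
const MAX_ATTEMPTS = 3;

function retryDelayMs(attempt: number, random: () => number = Math.random): number {
  if (attempt >= MAX_ATTEMPTS) return -1; // caller moves the message to the DLQ
  const base = 1_000 * 2 ** attempt;      // 1s, 2s, 4s, ...
  const capped = Math.min(base, LEASE_MS);
  return Math.floor(random() * capped);   // full jitter: uniform in [0, capped)
}
```

Injecting `random` keeps the function deterministic under test; in production the default `Math.random` spreads retries so a burst of failures does not thunder back in lockstep.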

Message Types

| Queue | Consumer | Trigger |
| --- | --- | --- |
| `INGEST_NEWS` | worker-ingest | cron/ingest-news (every 15m) |
| `CLUSTER_STORIES` | worker-ingest | cron/cluster-sweep (hourly) |
| `RESOLVE_IMAGE` | worker-ingest | post-extraction (automatic) |
| `RESOLVE_CHALLENGE_IMAGE` | worker-ingest | post-challenge-generation (automatic) |
| `EXTRACT_FACTS` | worker-facts | post-clustering (automatic) |
| `IMPORT_FACTS` | worker-facts | cron/import-facts (stub) |
| `GENERATE_EVERGREEN` | worker-facts | cron/generate-evergreen (daily) |
| `EXPLODE_CATEGORY_ENTRY` | worker-facts | seed pipeline (manual/batch) |
| `FIND_SUPER_FACTS` | worker-facts | seed pipeline (manual/batch) |
| `GENERATE_CHALLENGE_CONTENT` | worker-facts | post-extraction or seed pipeline |
| `VALIDATE_FACT` | worker-validate | post-extraction + cron/validation-retry (every 4h) |
| `SEND_SMS` | worker-sms | SMS delivery via Twilio |

Workers

Five Bun-based queue consumer processes. Each exposes a `/health` endpoint and implements graceful shutdown.

| Worker | Queues Consumed | Purpose |
| --- | --- | --- |
| worker-ingest | `INGEST_NEWS`, `CLUSTER_STORIES`, `RESOLVE_IMAGE`, `RESOLVE_CHALLENGE_IMAGE` | News fetch, article clustering, image resolution |
| worker-facts | `EXTRACT_FACTS`, `IMPORT_FACTS`, `GENERATE_EVERGREEN`, `EXPLODE_CATEGORY_ENTRY`, `FIND_SUPER_FACTS`, `GENERATE_CHALLENGE_CONTENT` | Fact extraction, generation, challenges |
| worker-validate | `VALIDATE_FACT` | 4-phase verification (structural → consistency → cross-model → evidence) |
| worker-reel-render | (R2/internal triggers) | Render reel videos from fact records |
| worker-sms | `SEND_SMS` | SMS notifications via Twilio |
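worker-validate's 4-phase verification is a short-circuiting pipeline: phases run in order and the first failure stops the run. A sketch using the phase names from the table above (the predicate signatures are illustrative):

```typescript
// Sketch of worker-validate's tiered verification: phases run in a fixed
// order and the first failing phase short-circuits the rest.
type Phase = "structural" | "consistency" | "cross-model" | "evidence";
type PhaseCheck = (factId: string) => boolean;

function validateFact(
  factId: string,
  checks: Record<Phase, PhaseCheck>,
): { verified: boolean; failedPhase?: Phase } {
  const order: Phase[] = ["structural", "consistency", "cross-model", "evidence"];
  for (const phase of order) {
    // Cheap checks (structural) run before expensive ones (cross-model,
    // evidence), so most bad facts are rejected at low cost.
    if (!checks[phase](factId)) return { verified: false, failedPhase: phase };
  }
  return { verified: true };
}
```

Ordering cheap structural checks ahead of cross-model calls keeps validation cost bounded, consistent with the cost-bounded-AI invariant.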

AI Model Router

All AI calls go through a tier-based model router that selects among the providers and models below.

| Provider | Models |
| --- | --- |
| OpenAI | gpt-5.4-nano, gpt-5.4-mini, gpt-5-nano, gpt-5-mini, gpt-4o-mini, gpt-4o, gpt-4-turbo, gpt-4, gpt-3.5-turbo |
| Anthropic | claude-haiku-4-5, claude-sonnet-4-5, claude-opus-4-5, claude-opus-4-6 |
| Google | gemini-2.0-flash, gemini-2.0-flash-lite, gemini-2.5-pro |
| MiniMax | MiniMax-M2, MiniMax-M2.1, MiniMax-M2.5 |
| xAI | grok-4-1-fast-reasoning, grok-4-1-fast-non-reasoning, grok-4 |

Cost tracked via ai_cost_tracking table with daily spend caps (ANTHROPIC_DAILY_SPEND_CAP_USD).
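The spend-cap mechanics can be sketched as a gate in front of each call: per-call costs accumulate into a daily total (mirroring `ai_cost_tracking`), and calls are refused once the cap would be exceeded. The class name and ledger shape are illustrative; `ANTHROPIC_DAILY_SPEND_CAP_USD` is the real env var named above.

```typescript
// Illustrative cost gate for the model router: per-call costs accumulate
// toward a daily cap (e.g. ANTHROPIC_DAILY_SPEND_CAP_USD); calls that would
// exceed the cap are refused. Not the actual production implementation.
class DailySpendGuard {
  private spentUsd = 0;
  constructor(private capUsd: number) {}

  // Records the cost and returns true if the call fits under the cap.
  tryCharge(estimatedCostUsd: number): boolean {
    if (this.spentUsd + estimatedCostUsd > this.capUsd) return false;
    this.spentUsd += estimatedCostUsd;
    return true;
  }

  get remainingUsd(): number {
    return this.capUsd - this.spentUsd;
  }
}
```

A refused charge would typically fall back to a cheaper tier or defer the job rather than fail the pipeline outright.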

Legacy (V1)

The original V1 URL change tracking system and its vNext global URL library have been fully superseded by the V2 fact engine. 41 V1 legacy tables were dropped in migrations 0057-0060. V1 architecture decisions (ADR-001 through ADR-012) are preserved in decisions.md for historical reference. V1 documentation is archived in docs_archive/.

Dependency Exceptions

All dependencies use @latest unless noted here:

(No exceptions currently)