Architecture Overview

Eko is a knowledge platform that processes three content pipelines — news, evergreen, and seed — into verified, structured knowledge cards. Users learn through interactive challenges powered by spaced repetition.

Core Invariants

| Invariant | Description |
| --- | --- |
| Fact-first | Facts are the atomic unit; everything flows from structured, schema-validated `fact_records` |
| Verification before publication | No fact reaches the public feed without at least one validation tier pass |
| Source attribution | Every fact traces back to source articles and validation evidence |
| Schema conformance | Fact output validates against `fact_record_schemas.fact_keys` per topic category |
| Cost-bounded AI | All AI calls go through a model router with tier selection, daily spend caps, and per-call cost tracking |
| Public feed / gated detail | Feed is public; full card detail and interactions require Free/Eko+ subscription |
| Topic balance | Daily quotas per topic category prevent content monoculture |
| Challenge answer isolation | Challenge URLs, shared links, and metadata never reveal answer content |
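The schema-conformance invariant amounts to a key check: every key in a fact's `facts` JSONB must be declared for its topic category. A minimal in-memory sketch (the type shapes and function name are illustrative, not the actual Drizzle models):

```typescript
// Sketch of the schema-conformance invariant: every key in a fact_record's
// `facts` JSONB must appear in the topic category's fact_record_schemas
// entry. Shapes are illustrative, not the real table definitions.
type FactRecord = { topicCategory: string; facts: Record<string, unknown> };
type FactRecordSchema = { topicCategory: string; factKeys: string[] };

function conformsToSchema(
  record: FactRecord,
  schemas: Map<string, FactRecordSchema>,
): { ok: boolean; unknownKeys: string[] } {
  const schema = schemas.get(record.topicCategory);
  // No schema registered for the category: reject every key.
  if (!schema) return { ok: false, unknownKeys: Object.keys(record.facts) };
  const unknownKeys = Object.keys(record.facts).filter(
    (k) => !schema.factKeys.includes(k),
  );
  return { ok: unknownKeys.length === 0, unknownKeys };
}
```

A record with only declared keys passes; any undeclared key fails the invariant and is reported for review.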

Data Flow

News APIs ─┐
           ├──▶ [INGEST_NEWS] ──▶ worker-ingest ──▶ news_sources table
           │                                              │
           │                        ┌─────────────────────┘
           │                        ▼
           │              [CLUSTER_STORIES] ──▶ worker-ingest ──▶ stories
           │                                              │
           │                        ┌─────────────────────┘
           │                        ▼
           │              [EXTRACT_FACTS] ──▶ worker-facts ──▶ fact_records
           │                                              │
           │                  ┌───────────┬───────────────┘
           │                  ▼           ▼
           │         [VALIDATE_FACT]  [RESOLVE_IMAGE]
           │              │               │
           │              ▼               ▼
           │         worker-validate  worker-ingest
           │              │               │
           │              ▼               ▼
           │         fact verified    image cached
           │
Seed Data ─┤
           ├──▶ [EXPLODE_CATEGORY_ENTRY] ──▶ worker-facts ──▶ fact_records
           ├──▶ [FIND_SUPER_FACTS] ──▶ worker-facts ──▶ cross-correlations
           └──▶ [GENERATE_CHALLENGE_CONTENT] ──▶ worker-facts ──▶ fact_challenge_content
                                                                        │
                                        ┌───────────────────────────────┘
                                        ▼ (if image would spoil answer)
                              [RESOLVE_CHALLENGE_IMAGE] ──▶ worker-ingest ──▶ anti-spoiler image

Evergreen ────▶ [GENERATE_EVERGREEN] ──▶ worker-facts ──▶ fact_records

Database Schema

PostgreSQL via Supabase with Drizzle ORM. 32+ active tables across the seven groups below. See the Fact & Taxonomy Schema Map for the full reference.

Taxonomy

| Table | Purpose |
| --- | --- |
| `topic_categories` | Hierarchical taxonomy tree (depth 0-3, slug, path, dailyQuota). Supports deprecation via `deprecated_at`, `replaced_by`. |
| `topic_category_aliases` | Maps external provider slugs to canonical topic category IDs |
| `unmapped_category_log` | Logs unresolved category slugs for audit |
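The alias tables imply a simple resolution path: look the provider slug up in `topic_category_aliases`, and record anything unresolved for audit. An in-memory sketch (function and variable names are illustrative):

```typescript
// Illustrative slug resolution: external provider slugs resolve through
// topic_category_aliases to a canonical category ID; unknown slugs are
// recorded for audit, mirroring unmapped_category_log.
function resolveCategorySlug(
  slug: string,
  aliases: Map<string, string>, // provider slug -> canonical category ID
  unmappedLog: string[],        // stand-in for unmapped_category_log
): string | null {
  const categoryId = aliases.get(slug);
  if (categoryId !== undefined) return categoryId;
  unmappedLog.push(slug);       // audit trail for later curation
  return null;
}
```

Unresolved slugs return `null` rather than guessing a category, so content never lands under the wrong taxonomy node.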

Fact Storage

| Table | Purpose |
| --- | --- |
| `fact_records` | Core fact storage — facts JSONB with key-value pairs, title, context, notabilityScore, status, validation |
| `fact_record_schemas` | Per-topic fact shape definitions — factKeys JSONB, cardFormats array |

Challenge System

| Table | Purpose |
| --- | --- |
| `fact_challenge_content` | Pre-generated challenge text per style/difficulty with `challenge_title` |
| `challenge_formats` | 8 named formats with knowledgeType and tone |
| `challenge_format_styles` | Format → eligible style mapping |
| `challenge_format_topics` | Format → eligible topic mapping |
| `challenge_sessions` | Multi-turn conversational challenge state with conversation history |
| `card_interactions` | User engagement tracking (views, answers, bookmarks, shares) |
| `challenge_group_progress` | Per-position progress within challenge groups |
| `card_bookmarks` | User bookmarked cards |
| `score_disputes` | User score dispute submissions for AI re-evaluation |

Ingestion & Observability

| Table | Purpose |
| --- | --- |
| `stories` | Clustered news articles grouped by TF-IDF cosine similarity |
| `news_sources` | Raw articles fetched from news APIs |
| `ai_cost_tracking` | Per-model, per-feature daily cost aggregation |
| `ingestion_runs` | Pipeline observability — tracks each cron invocation |
| `content_operations_log` | Operational event log for pipeline debugging |

Brand & Domain

| Table | Purpose |
| --- | --- |
| `brands` | Canonical brand entities |
| `brand_categories` | Brand categorization taxonomy |
| `brand_category_assignments` | Brand → category mappings |
| `domains` | Domain records with verification status |

User & Subscription

| Table | Purpose |
| --- | --- |
| `profiles` | User profiles synced from Supabase Auth |
| `plan_definitions` | Plan configuration (free, base, pro, team, plus) |
| `user_subscriptions` | Maps users to plans with Stripe integration |
| `notification_preferences` | Per-user notification settings |
| `system_notifications` | System-generated notifications |
| `feature_flags` | Runtime feature flag toggles |
| `ai_model_tier_config` | AI model routing configuration per tier |
| `reward_milestones` | Achievement milestones for gamification |
| `user_reward_claims` | Claimed milestone rewards |
| `user_quality_grades` | Per-topic quality grade aggregation |

Seed Pipeline

| Table | Purpose |
| --- | --- |
| `seed_entry_queue` | Batch seed entry processing queue |
| `super_fact_links` | Cross-entry correlation links |
| `seed_entry_links` | Entry-to-fact provenance links |

Queue System

Backend: Upstash Redis (REST API). Messages are leased for 2 minutes; after a maximum of 3 attempts a message moves to the dead-letter queue. Retries use exponential backoff with jitter.
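The retry policy can be sketched as a delay calculator. This is a minimal sketch: the 1-second base delay and "full jitter" strategy are assumptions, not the exact production values; the lease duration and attempt cap come from the policy above.

```typescript
// Retry-delay sketch for the queue policy: exponential backoff with jitter,
// capped at the 2-minute lease duration. Base delay and jitter strategy
// ("full jitter") are illustrative assumptions.
const LEASE_MS = 2 * 60 * 1000;
const MAX_ATTEMPTS = 3;

function retryDelayMs(attempt: number, random: () => number = Math.random): number {
  if (attempt >= MAX_ATTEMPTS) return -1; // caller moves the message to the DLQ
  const base = 1_000 * 2 ** attempt;      // 1s, 2s, 4s, ...
  const capped = Math.min(base, LEASE_MS);
  return Math.floor(random() * capped);   // full jitter: uniform in [0, capped)
}
```

Injecting `random` keeps the function deterministic under test; in production the default `Math.random` spreads retries so a burst of failures does not thunder back in lockstep.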

Message Types

| Queue | Consumer | Trigger |
| --- | --- | --- |
| `INGEST_NEWS` | worker-ingest | cron/ingest-news (every 15m) |
| `CLUSTER_STORIES` | worker-ingest | cron/cluster-sweep (hourly) |
| `RESOLVE_IMAGE` | worker-ingest | post-extraction (automatic) |
| `RESOLVE_CHALLENGE_IMAGE` | worker-ingest | post-challenge-generation (automatic) |
| `EXTRACT_FACTS` | worker-facts | post-clustering (automatic) |
| `IMPORT_FACTS` | worker-facts | cron/import-facts (stub) |
| `GENERATE_EVERGREEN` | worker-facts | cron/generate-evergreen (daily) |
| `EXPLODE_CATEGORY_ENTRY` | worker-facts | seed pipeline (manual/batch) |
| `FIND_SUPER_FACTS` | worker-facts | seed pipeline (manual/batch) |
| `GENERATE_CHALLENGE_CONTENT` | worker-facts | post-extraction or seed pipeline |
| `VALIDATE_FACT` | worker-validate | post-extraction + cron/validation-retry (every 4h) |
| `SEND_SMS` | worker-sms | SMS delivery via Twilio |

Workers

Five Bun-based queue consumer processes. Each exposes a `/health` endpoint and implements graceful shutdown.

| Worker | Queues Consumed | Purpose |
| --- | --- | --- |
| worker-ingest | `INGEST_NEWS`, `CLUSTER_STORIES`, `RESOLVE_IMAGE`, `RESOLVE_CHALLENGE_IMAGE` | News fetch, article clustering, image resolution |
| worker-facts | `EXTRACT_FACTS`, `IMPORT_FACTS`, `GENERATE_EVERGREEN`, `EXPLODE_CATEGORY_ENTRY`, `FIND_SUPER_FACTS`, `GENERATE_CHALLENGE_CONTENT` | Fact extraction, generation, challenges |
| worker-validate | `VALIDATE_FACT` | 4-phase verification (structural → consistency → cross-model → evidence) |
| worker-reel-render | (R2/internal triggers) | Render reel videos from fact records |
| worker-sms | `SEND_SMS` | SMS notifications via Twilio |
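worker-validate's 4-phase verification is a short-circuiting pipeline: phases run in order and the first failure stops the run. A sketch using the phase names from the table above (the predicate signatures are illustrative):

```typescript
// Sketch of worker-validate's tiered verification: phases run in a fixed
// order and the first failing phase short-circuits the rest.
type Phase = "structural" | "consistency" | "cross-model" | "evidence";
type PhaseCheck = (factId: string) => boolean;

function validateFact(
  factId: string,
  checks: Record<Phase, PhaseCheck>,
): { verified: boolean; failedPhase?: Phase } {
  const order: Phase[] = ["structural", "consistency", "cross-model", "evidence"];
  for (const phase of order) {
    // Cheap checks (structural) run before expensive ones (cross-model,
    // evidence), so most bad facts are rejected at low cost.
    if (!checks[phase](factId)) return { verified: false, failedPhase: phase };
  }
  return { verified: true };
}
```

Ordering cheap structural checks ahead of cross-model calls keeps validation cost bounded, consistent with the cost-bounded-AI invariant.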

AI Model Router

All AI calls go through a tier-based model router that selects among the providers and models below.

| Provider | Models |
| --- | --- |
| OpenAI | gpt-5.4-nano, gpt-5.4-mini, gpt-5-nano, gpt-5-mini, gpt-4o-mini, gpt-4o, gpt-4-turbo, gpt-4, gpt-3.5-turbo |
| Anthropic | claude-haiku-4-5, claude-sonnet-4-5, claude-opus-4-5, claude-opus-4-6 |
| Google | gemini-2.0-flash, gemini-2.0-flash-lite, gemini-2.5-pro |
| MiniMax | MiniMax-M2, MiniMax-M2.1, MiniMax-M2.5 |
| xAI | grok-4-1-fast-reasoning, grok-4-1-fast-non-reasoning, grok-4 |

Cost tracked via ai_cost_tracking table with daily spend caps (ANTHROPIC_DAILY_SPEND_CAP_USD).
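The spend-cap mechanics can be sketched as a gate in front of each call: per-call costs accumulate into a daily total (mirroring `ai_cost_tracking`), and calls are refused once the cap would be exceeded. The class name and ledger shape are illustrative; `ANTHROPIC_DAILY_SPEND_CAP_USD` is the real env var named above.

```typescript
// Illustrative cost gate for the model router: per-call costs accumulate
// toward a daily cap (e.g. ANTHROPIC_DAILY_SPEND_CAP_USD); calls that would
// exceed the cap are refused. Not the actual production implementation.
class DailySpendGuard {
  private spentUsd = 0;
  constructor(private capUsd: number) {}

  // Records the cost and returns true if the call fits under the cap.
  tryCharge(estimatedCostUsd: number): boolean {
    if (this.spentUsd + estimatedCostUsd > this.capUsd) return false;
    this.spentUsd += estimatedCostUsd;
    return true;
  }

  get remainingUsd(): number {
    return this.capUsd - this.spentUsd;
  }
}
```

A refused charge would typically fall back to a cheaper tier or defer the job rather than fail the pipeline outright.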

Legacy (V1)

The original V1 URL change tracking system and its vNext global URL library have been fully superseded by the V2 fact engine. 41 V1 legacy tables were dropped in migrations 0057-0060. V1 architecture decisions (ADR-001 through ADR-012) are preserved in decisions.md for historical reference. V1 documentation is archived in docs_archive/.

Dependency Exceptions

All dependencies use @latest unless noted here:

(No exceptions currently)