From Change Detection to Fact Engine: The Architecture Behind Eko v2
How we replaced a $712/month URL tracker with a $65/month knowledge platform.
FOR IMMEDIATE RELEASE: February 15, 2026
The V1 Story: URL Change Detection
Eko v1 was a B2B URL change detection tool. The architecture was straightforward: users registered URLs, and three workers — tracker, render, and SMS — would periodically fetch pages via Playwright, diff the rendered output using AI summarization, and notify users of meaningful changes.
It worked, but it had fundamental scaling problems:
- N-query cost model: Every tracked URL required its own Playwright render, AI diff, and storage cycle. Costs scaled linearly with the number of URLs — $712/month at peak for ~200 active URLs
- Unstructured diffs: AI-generated text diffs were hard to search, compare, or build UI around. A "change" was just a blob of text
- No verification layer: If the AI hallucinated a diff, it went straight to the user. There was no independent check on accuracy
- Fragile rendering: Playwright browser automation broke regularly on JavaScript-heavy sites, bot detection, and CAPTCHAs
Why We Pivoted
The pivot came from three realizations:
Cost: The per-URL fetch model couldn't scale. News APIs aggregate thousands of stories for a fixed monthly fee ($45/month for NewsAPI.org). Switching from N-query fetching to fixed-cost API aggregation dropped projected infrastructure costs from $712 to ~$65/month.
Quality: Unstructured text diffs are inherently lossy. A structured fact record — with typed fields, validation status, and schema conformance — is queryable, comparable, and composable into multiple card formats. Instead of "this page changed," we could say "this specific record was broken by this person on this date, verified by these sources."
Scale: URL tracking required users to curate their own watchlist. A news aggregation pipeline discovers stories automatically, extracts facts at scale, and builds a feed that works on day one with zero user configuration.
V2 Architecture Overview
Eko v2 replaces the three legacy workers with a new three-worker pipeline:
worker-ingest: News Aggregation + Clustering
The ingest worker pulls articles from news APIs (NewsAPI.org primary, with Google News and GNews as fallbacks), normalizes them into a common schema, and clusters related articles using TF-IDF cosine similarity.
Key design decisions:
- TF-IDF over embeddings: Cosine similarity on TF-IDF vectors is fast, deterministic, and doesn't require an AI call. Embeddings are more powerful but add latency and cost for a task where keyword overlap is a strong enough signal
- Source count threshold: A story must have 3+ independent sources before it's promoted from `clustering` to `published` status. This is our first deduplication and credibility filter
- Content hashing: Each article gets a content hash for dedup. The same article from different providers won't create duplicate news_sources records
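To make the clustering decision concrete, here is a minimal sketch of TF-IDF vectors plus cosine similarity with a greedy single-pass grouping. The tokenizer, the smoothed IDF variant, and the 0.3 similarity threshold are illustrative assumptions, not Eko's actual parameters.

```typescript
// Split lowercased text into alphanumeric tokens.
function tokenize(text: string): string[] {
  return text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
}

// Build one TF-IDF vector per document (smoothed IDF so terms shared by
// every document still carry some weight).
function tfidfVectors(docs: string[]): Map<string, number>[] {
  const tokenized = docs.map(tokenize);
  const df = new Map<string, number>();
  for (const tokens of tokenized) {
    for (const term of new Set(tokens)) df.set(term, (df.get(term) ?? 0) + 1);
  }
  return tokenized.map((tokens) => {
    const tf = new Map<string, number>();
    for (const t of tokens) tf.set(t, (tf.get(t) ?? 0) + 1);
    const vec = new Map<string, number>();
    for (const [term, count] of tf) {
      const idf = 1 + Math.log(docs.length / (df.get(term) ?? 1));
      vec.set(term, (count / tokens.length) * idf);
    }
    return vec;
  });
}

// Cosine similarity over sparse vectors: no AI call, fully deterministic.
function cosine(a: Map<string, number>, b: Map<string, number>): number {
  let dot = 0, na = 0, nb = 0;
  for (const [term, w] of a) { dot += w * (b.get(term) ?? 0); na += w * w; }
  for (const w of b.values()) nb += w * w;
  return na && nb ? dot / (Math.sqrt(na) * Math.sqrt(nb)) : 0;
}

// Greedy single pass: join the first cluster whose representative article
// is similar enough, otherwise start a new cluster.
function clusterArticles(titles: string[], threshold = 0.3): number[][] {
  const vecs = tfidfVectors(titles);
  const clusters: number[][] = [];
  for (let i = 0; i < vecs.length; i++) {
    const home = clusters.find((c) => cosine(vecs[c[0]], vecs[i]) >= threshold);
    if (home) home.push(i);
    else clusters.push([i]);
  }
  return clusters;
}
```

Because keyword overlap between headlines about the same story is high, this cheap signal is usually enough; the 3+ source threshold then does the credibility filtering on top.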
worker-facts: AI Extraction + Evergreen Generation
The facts worker converts published stories into structured fact records using AI extraction with schema validation.
Two modes of operation:
- News extraction: Takes a published story (3+ sources) and uses AI to extract structured facts that conform to a topic-specific schema. Each fact record includes a notability score (0-1) and typed fields validated against `fact_record_schemas.fact_keys`
- Evergreen generation: A daily cron job generates timeless facts (capitals, records, historical events) from authoritative APIs and structured databases, filling topic quota gaps left by the news cycle
Key design decisions:
- Schema-validated output: AI extraction uses Zod schemas derived from `fact_record_schemas` to ensure type-safe, structured output. If the AI returns a field that doesn't match the schema, it's rejected at the extraction layer — not downstream
- Notability scoring: Every extracted fact gets a 0-1 notability score with a human-readable reason. This drives feed ordering and prevents low-value facts from consuming quota
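The rejection logic can be sketched as follows. The real pipeline derives Zod validators from `fact_record_schemas`; to keep this sketch dependency-free, it hand-rolls the equivalent of Zod's strict object parsing (unknown fields rejected). The field names and types are hypothetical examples, not Eko's actual taxonomy.

```typescript
type FieldType = "string" | "number";
type FactKeys = Record<string, FieldType>;

interface ValidationResult {
  success: boolean;
  errors: string[];
}

// Reject any extracted fact whose fields don't match the declared schema,
// including extra fields the model invents (mirrors z.object(shape).strict()).
function validateFact(keys: FactKeys, fact: Record<string, unknown>): ValidationResult {
  const errors: string[] = [];
  for (const [key, type] of Object.entries(keys)) {
    if (!(key in fact)) errors.push(`missing field: ${key}`);
    else if (typeof fact[key] !== type) errors.push(`wrong type for ${key}: expected ${type}`);
  }
  for (const key of Object.keys(fact)) {
    if (!(key in keys)) errors.push(`unexpected field: ${key}`);
  }
  return { success: errors.length === 0, errors };
}

// Hypothetical fact_keys for a sports score record.
const exampleKeys: FactKeys = { player: "string", points: "number", game_date: "string" };
```

Rejecting at this layer means a hallucinated or malformed field never produces a fact record at all, rather than surfacing as a bad card downstream.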
worker-validate: 4-Tier Verification Cascade
The validation worker is the architectural differentiator. Every fact record must pass validation before it reaches the public feed. The cascade tries each tier in order, falling through on failure:
- Authoritative API — Check the fact against a known-good API (e.g., REST Countries for geography, TheSportsDB for game results). Highest confidence, but limited coverage
- Multi-source corroboration — Cross-reference the fact against 2+ independent news sources. Confirms the claim exists in multiple reports
- Curated database — Check against internal databases of verified records, capitals, historical dates. Good for evergreen facts
- AI cross-check — As a last resort, use a separate AI model to evaluate the claim's plausibility. Lowest confidence tier, flagged accordingly
Facts that fail all four tiers are rejected. The validation result (which tier passed, confidence score, evidence) is stored in the fact_records.validation JSONB column for full auditability.
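The cascade's control flow is simple to express: try each tier in order, treat a tier error (API down, timeout) the same as a miss, and record which tier passed. This is a sketch of the shape of the logic; the confidence values and function names are illustrative, not Eko's internals.

```typescript
type Tier = "authoritative_api" | "multi_source" | "curated_db" | "ai_cross_check";

interface ValidationOutcome {
  status: "validated" | "rejected";
  tier?: Tier;       // which tier passed (stored for auditability)
  confidence?: number;
}

type Validator = (claim: string) => Promise<boolean>;

// Try validators in descending-confidence order, falling through on
// failure or error; reject only if every tier misses.
async function runValidationCascade(
  claim: string,
  tiers: [Tier, Validator, number][],
): Promise<ValidationOutcome> {
  for (const [tier, check, confidence] of tiers) {
    try {
      if (await check(claim)) return { status: "validated", tier, confidence };
    } catch {
      // A tier erroring out falls through to the next tier
    }
  }
  return { status: "rejected" };
}
```

The returned outcome (tier, confidence, plus whatever evidence the tier produced) is what gets persisted to the `fact_records.validation` JSONB column.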
Schema Design: Fact-First Data Model
The database schema follows a strict hierarchy:
topic_categories (7 root + 13 sub-categories)
-> fact_record_schemas (defines expected JSON shape per topic type)
-> fact_records (the core data asset — structured JSON facts)
-> card_interactions (user engagement + spaced repetition)
-> card_bookmarks (saved cards)
Key design patterns:
- Materialized path: Topic categories use a `path` column (e.g., `sports/basketball`) for efficient hierarchical queries without recursive CTEs
- Schema-driven cards: Card variations (fill-in-the-blank, multiple choice, direct question) are computed at query time from the structured `facts` JSONB, not stored separately. This means adding a new card format requires zero data migration
- Topic quotas: Each root category has a `daily_quota` and `percent_target` to prevent content monoculture. The daily cron enforces these quotas, ensuring the feed stays balanced even when sports news dominates the cycle
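To illustrate why a materialized path beats a recursive CTE here: fetching an entire subtree is a prefix match on the `path` column (in SQL, `WHERE path = 'sports' OR path LIKE 'sports/%'`). The in-memory version below shows the same predicate; the category rows and quota numbers are made up for the example.

```typescript
interface TopicCategory {
  path: string;        // materialized path, e.g. "sports" or "sports/basketball"
  daily_quota: number; // max facts per day for this category
}

// Subtree lookup is a prefix match: the root itself plus any descendant
// whose path starts with "root/". No recursion needed.
function subtree(categories: TopicCategory[], root: string): TopicCategory[] {
  return categories.filter((c) => c.path === root || c.path.startsWith(root + "/"));
}

// Hypothetical slice of the taxonomy.
const categories: TopicCategory[] = [
  { path: "sports", daily_quota: 20 },
  { path: "sports/basketball", daily_quota: 8 },
  { path: "sports/soccer", daily_quota: 8 },
  { path: "history", daily_quota: 10 },
];
```

The `root + "/"` suffix in the predicate matters: without it, a prefix like `sports` would also match an unrelated sibling such as `sportswear`.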
Cost Architecture: AI Model Routing
All AI calls go through a model router that selects the cheapest model capable of handling each task:
| Task | Model | % of Calls | Cost Profile |
|---|---|---|---|
| Fact extraction (standard) | GPT-4o-mini | ~95% | $0.15/1M input tokens |
| Notability scoring | GPT-4o-mini | included above | Batched with extraction |
| Complex extraction (multi-entity) | Claude Haiku | ~4% | $0.25/1M input tokens |
| Validation cross-check | Claude Opus | ~1% | $15/1M input tokens |
Budget caps are enforced per-run and per-day. If a worker exceeds its daily budget, it stops processing and logs the overage. No unbounded AI spend.
Batch processing is the other major cost lever: instead of extracting facts one-at-a-time, the worker batches 10-20 stories per extraction call, amortizing the per-call overhead.
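A minimal sketch of cost-bounded routing: pick the route for the task, estimate the call's cost from its token count, and refuse to run once the daily budget would be exceeded. The model names and per-token prices mirror the table above; the task categories, budget figure, and class shape are illustrative assumptions.

```typescript
interface ModelRoute {
  model: string;
  inputCostPer1M: number; // USD per 1M input tokens
}

// Cheapest-capable-model routing table (prices from the table above).
const ROUTES = {
  extraction: { model: "gpt-4o-mini", inputCostPer1M: 0.15 },
  complex_extraction: { model: "claude-haiku", inputCostPer1M: 0.25 },
  validation_cross_check: { model: "claude-opus", inputCostPer1M: 15 },
};

class BudgetedRouter {
  private spentUsd = 0;
  constructor(private dailyBudgetUsd: number) {}

  // Returns the route, or null when the estimated cost would bust the cap —
  // the caller then stops processing and logs the overage.
  route(task: keyof typeof ROUTES, estInputTokens: number): ModelRoute | null {
    const r = ROUTES[task];
    const estCost = (estInputTokens / 1_000_000) * r.inputCostPer1M;
    if (this.spentUsd + estCost > this.dailyBudgetUsd) return null;
    this.spentUsd += estCost;
    return r;
  }
}
```

Batching compounds with this: one extraction call carrying 10-20 stories is a single budget debit instead of twenty, so the per-call overhead is amortized and the cap is consumed more slowly.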
The 7 Invariants
These architectural constraints govern all changes to the Eko codebase. They're documented in CLAUDE.md and enforced through code review:
- Fact-first — Facts are the atomic unit; everything flows from structured, schema-validated fact records
- Verification before publication — No fact reaches the public feed without at least one validation tier pass
- Source attribution — Every fact traces back to source articles and validation evidence
- Schema conformance — Fact output must validate against
fact_record_schemas.fact_keys - Cost-bounded AI — All AI calls have model routing, budget caps, and cost tracking
- Public feed / gated detail — The feed is public; full card detail and interactions require a Free/Eko+ subscription
- Topic balance — Daily quotas per topic category prevent content monoculture
When tradeoffs appear, we prefer: correctness > auditability > safety > cost control.
Stack
| Layer | Technology |
|---|---|
| Runtime | Bun |
| Language | TypeScript, Zod schemas |
| Monorepo | Turborepo |
| Frontend | Next.js 16 (App Router), Tailwind v4, shadcn/ui |
| Backend | Supabase (Postgres + RLS), Drizzle ORM |
| Cache/Queue | Upstash Redis |
| AI | Vercel AI SDK v6 (Anthropic, OpenAI providers) |
| Quality | Biome (lint/format), Vitest (tests) |
Migration Path: What Changed
The v1-to-v2 migration spans 7 implementation phases:
| Phase | Focus | Key Changes |
|---|---|---|
| 1 | Database | 5 new migrations (0092-0096): enums, topic taxonomy, fact records, stories, news sources, interactions, ingestion runs |
| 2 | Packages | New queue message types, AI extraction functions, model router, environment config, DB query layer |
| 3 | Workers | 3 new workers (ingest, facts, validate) replace 3 old workers (tracker, render, SMS). 8 new cron routes |
| 4 | Feed UI | Public card grid, category filtering, infinite scroll, cursor-based pagination |
| 5 | Detail UI | Subscription-gated card detail, quiz/recall interfaces, paywall, Stripe integration for Eko+ |
| 6 | Legacy removal | ~15,900 lines removed — old routes, workers, package code, and table drops (migrations 0097-0099) |
| 7 | Polish | Auth redirects, onboarding, marketing site rewrite, admin dashboard |
Lines removed: ~15,900 (old dashboard routes, URL tracking UI, Playwright worker, SMS worker, legacy API endpoints, domain/brand management)
Tables dropped: pages, page_observations, page_change_events, domains, brands, brand_sources, screen_avatars, personas, use_cases, and related junction tables
Tables added: topic_categories, fact_record_schemas, fact_records, stories, news_sources, card_interactions, card_bookmarks, ingestion_runs
What's Next
Open questions and future work:
- Spaced repetition refinement: The current implementation uses a basic interval system. We're evaluating SM-2 and FSRS algorithms for more sophisticated scheduling
- Image pipeline: Card images currently resolve through a Wikipedia -> TheSportsDB -> Unsplash -> AI fallback chain. Quality and attribution tracking need hardening
- Multi-language support: The fact extraction pipeline is English-only. Structured fact records are inherently more translatable than free-text diffs, but the extraction prompts need localization
- Community contributions: The schema-driven architecture means external contributors could propose new `fact_record_schemas` for domains we don't cover yet
- Real-time feed: Currently batch-processed on 15-minute cron cycles. WebSocket or SSE push for breaking facts is architecturally possible but not yet prioritized
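For reference on the spaced repetition evaluation mentioned above, the classic SM-2 update is small enough to sketch: each review grade (0-5) adjusts an ease factor and the next review interval. This is the textbook SuperMemo-2 formulation, not Eko's shipped scheduler.

```typescript
interface CardState {
  interval: number;    // days until next review
  repetitions: number; // consecutive successful reviews
  ease: number;        // ease factor, floored at 1.3
}

// One SM-2 review step: grade < 3 resets the card; otherwise the ease
// factor is nudged by grade and the interval grows multiplicatively.
function sm2(state: CardState, grade: number): CardState {
  if (grade < 3) return { interval: 1, repetitions: 0, ease: state.ease };
  const ease = Math.max(
    1.3,
    state.ease + 0.1 - (5 - grade) * (0.08 + (5 - grade) * 0.02),
  );
  const repetitions = state.repetitions + 1;
  const interval =
    repetitions === 1 ? 1 :
    repetitions === 2 ? 6 :
    Math.round(state.interval * ease);
  return { interval, repetitions, ease };
}
```

FSRS replaces this fixed formula with a fitted memory model, which is the main tradeoff under evaluation: SM-2 is trivial to implement and audit, FSRS schedules more accurately but carries model parameters.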
About Eko
Eko is a daily knowledge feed powered by a structured fact engine. We aggregate news, extract verified facts, and deliver them as interactive cards across sports, history, science, culture, geography, records, and current events. The architecture is built on Bun, TypeScript, Supabase, and a three-worker pipeline that prioritizes correctness and cost efficiency.
Technical Blog: eko.day/blog
Website: eko.day
Press Contact: press@eko.day