From Change Detection to Fact Engine: The Architecture Behind Eko v2

How we replaced a $712/month URL tracker with a $65/month knowledge platform.


FOR IMMEDIATE RELEASE: February 15, 2026

The V1 Story: URL Change Detection

Eko v1 was a B2B URL change detection tool. The architecture was straightforward: users registered URLs, and three workers — tracker, render, and SMS — would periodically fetch pages via Playwright, diff the rendered output using AI summarization, and notify users of meaningful changes.

It worked, but it had fundamental scaling problems:

  • N-query cost model: Every tracked URL required its own Playwright render, AI diff, and storage cycle. Costs scaled linearly with the number of URLs — $712/month at peak for ~200 active URLs
  • Unstructured diffs: AI-generated text diffs were hard to search, compare, or build UI around. A "change" was just a blob of text
  • No verification layer: If the AI hallucinated a diff, it went straight to the user. There was no independent check on accuracy
  • Fragile rendering: Playwright browser automation broke regularly on JavaScript-heavy sites, bot detection, and CAPTCHAs

Why We Pivoted

The pivot came from three realizations:

Cost: The per-URL fetch model couldn't scale. News APIs aggregate thousands of stories for a fixed monthly fee ($45/month for NewsAPI.org). Switching from N-query fetching to fixed-cost API aggregation dropped projected infrastructure costs from $712 to ~$65/month.

Quality: Unstructured text diffs are inherently lossy. A structured fact record — with typed fields, validation status, and schema conformance — is queryable, comparable, and composable into multiple card formats. Instead of "this page changed," we could say "this specific record was broken by this person on this date, verified by these sources."

Scale: URL tracking required users to curate their own watchlist. A news aggregation pipeline discovers stories automatically, extracts facts at scale, and builds a feed that works on day one with zero user configuration.

V2 Architecture Overview

Eko v2 replaces the three legacy workers with a new three-worker pipeline:

worker-ingest: News Aggregation + Clustering

The ingest worker pulls articles from news APIs (NewsAPI.org primary, with Google News and GNews as fallbacks), normalizes them into a common schema, and clusters related articles using TF-IDF cosine similarity.

Key design decisions:

  • TF-IDF over embeddings: Cosine similarity on TF-IDF vectors is fast, deterministic, and doesn't require an AI call. Embeddings are more powerful but add latency and cost for a task where keyword overlap is a strong enough signal
  • Source count threshold: A story must have 3+ independent sources before it's promoted from clustering to published status. This is our first deduplication and credibility filter
  • Content hashing: Each article gets a content hash for dedup. Same article from different providers won't create duplicate news_sources records
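
The clustering step above can be sketched as follows. This is an illustrative TypeScript reimplementation of TF-IDF cosine similarity, not the actual worker code; the function names and the simple regex tokenizer are assumptions.

```typescript
// Term frequency for one document, using a naive alphanumeric tokenizer.
function termFreq(text: string): Map<string, number> {
  const tf = new Map<string, number>();
  for (const tok of text.toLowerCase().match(/[a-z0-9]+/g) ?? []) {
    tf.set(tok, (tf.get(tok) ?? 0) + 1);
  }
  return tf;
}

// TF-IDF weight = tf * ln(N / df), computed against the batch itself.
function tfidfVectors(docs: string[]): Map<string, number>[] {
  const tfs = docs.map(termFreq);
  const df = new Map<string, number>();
  for (const tf of tfs) {
    for (const term of tf.keys()) df.set(term, (df.get(term) ?? 0) + 1);
  }
  return tfs.map((tf) => {
    const vec = new Map<string, number>();
    for (const [term, f] of tf) {
      vec.set(term, f * Math.log(docs.length / (df.get(term) ?? 1)));
    }
    return vec;
  });
}

// Cosine similarity over sparse vectors; 0 when either vector is empty.
function cosine(a: Map<string, number>, b: Map<string, number>): number {
  let dot = 0, na = 0, nb = 0;
  for (const [t, w] of a) { dot += w * (b.get(t) ?? 0); na += w * w; }
  for (const w of b.values()) nb += w * w;
  return na && nb ? dot / (Math.sqrt(na) * Math.sqrt(nb)) : 0;
}
```

Articles whose similarity clears a threshold join the same cluster; the cluster is only promoted once it accumulates 3+ independent sources.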

worker-facts: AI Extraction + Evergreen Generation

The facts worker converts published stories into structured fact records using AI extraction with schema validation.

Two modes of operation:

  1. News extraction: Takes a published story (3+ sources) and uses AI to extract structured facts that conform to a topic-specific schema. Each fact record includes a notability score (0-1) and typed fields validated against fact_record_schemas.fact_keys
  2. Evergreen generation: A daily cron job generates timeless facts (capitals, records, historical events) from authoritative APIs and structured databases, filling topic quota gaps left by the news cycle

Key design decisions:

  • Schema-validated output: AI extraction uses Zod schemas derived from fact_record_schemas to ensure type-safe, structured output. If the AI returns a field that doesn't match the schema, it's rejected at the extraction layer — not downstream
  • Notability scoring: Every extracted fact gets a 0-1 notability score with a human-readable reason. This drives feed ordering and prevents low-value facts from consuming quota

worker-validate: 4-Tier Verification Cascade

The validation worker is the architectural differentiator. Every fact record must pass validation before it reaches the public feed. The cascade tries each tier in order, falling through on failure:

  1. Authoritative API — Check the fact against a known-good API (e.g., REST Countries for geography, TheSportsDB for game results). Highest confidence, but limited coverage
  2. Multi-source corroboration — Cross-reference the fact against 2+ independent news sources. Confirms the claim exists in multiple reports
  3. Curated database — Check against internal databases of verified records, capitals, historical dates. Good for evergreen facts
  4. AI cross-check — As a last resort, use a separate AI model to evaluate the claim's plausibility. Lowest confidence tier, flagged accordingly

Facts that fail all four tiers are rejected. The validation result (which tier passed, confidence score, evidence) is stored in the fact_records.validation JSONB column for full auditability.
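
The fall-through behavior is the whole design, so it is worth seeing in miniature. A minimal sketch, assuming each tier is a function that returns a result or `null` to pass control to the next tier (the types here are illustrative):

```typescript
// Which tier passed, how confident it is, and the evidence to store
// in fact_records.validation.
type ValidationResult = { tier: 1 | 2 | 3 | 4; confidence: number; evidence: string };
type Tier = (claim: string) => Promise<ValidationResult | null>;

// Try each tier in order; the first one that passes wins.
async function validateFact(
  claim: string,
  tiers: Tier[],
): Promise<ValidationResult | null> {
  for (const tier of tiers) {
    const result = await tier(claim);
    if (result) return result; // stored for auditability
  }
  return null; // failed all tiers: the fact is rejected
}
```

Ordering the tiers from highest to lowest confidence means the expensive, least-trusted AI cross-check only runs when everything cheaper has come up empty.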

Schema Design: Fact-First Data Model

The database schema follows a strict hierarchy:

topic_categories (7 root + 13 sub-categories)
  -> fact_record_schemas (defines expected JSON shape per topic type)
    -> fact_records (the core data asset — structured JSON facts)
      -> card_interactions (user engagement + spaced repetition)
      -> card_bookmarks (saved cards)

Key design patterns:

  • Materialized path: Topic categories use a path column (e.g., sports/basketball) for efficient hierarchical queries without recursive CTEs
  • Schema-driven cards: Card variations (fill-in-the-blank, multiple choice, direct question) are computed at query time from the structured facts JSONB, not stored separately. This means adding a new card format requires zero data migration
  • Topic quotas: Each root category has a daily_quota and percent_target to prevent content monoculture. The daily cron enforces these quotas, ensuring the feed stays balanced even when sports news dominates the cycle
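
The materialized-path pattern can be illustrated with the query it enables. This sketch emits plain parameterized SQL for clarity (the real code presumably goes through Drizzle); the table and column names match the schema above.

```typescript
// Build a subtree filter for a materialized-path column like "sports/basketball".
// Matches the category itself plus every descendant, with no recursive CTE.
function subtreeQuery(rootPath: string): { sql: string; params: string[] } {
  return {
    sql: "SELECT * FROM topic_categories WHERE path = $1 OR path LIKE $2",
    params: [rootPath, `${rootPath}/%`],
  };
}
```

In Postgres, an index created with `text_pattern_ops` on `path` lets the prefix `LIKE` use an index scan, which is what makes this cheaper than walking the hierarchy recursively.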

Cost Architecture: AI Model Routing

All AI calls go through a model router that selects the cheapest model capable of handling each task:

| Task | Model | % of Calls | Cost Profile |
|---|---|---|---|
| Fact extraction (standard) | GPT-4o-mini | ~95% | $0.15/1M input tokens |
| Notability scoring | GPT-4o-mini | included above | Batched with extraction |
| Complex extraction (multi-entity) | Claude Haiku | ~4% | $0.25/1M input tokens |
| Validation cross-check | Claude Opus | ~1% | $15/1M input tokens |

Budget caps are enforced per-run and per-day. If a worker exceeds its daily budget, it stops processing and logs the overage. No unbounded AI spend.

Batch processing is the other major cost lever: instead of extracting facts one at a time, the worker batches 10-20 stories per extraction call, amortizing the per-call overhead.
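
Routing plus budget enforcement can be sketched together. The model names and per-token prices come from the table above; the task keys, class shape, and budget numbers are illustrative assumptions, not the real router.

```typescript
type Task = "extract" | "extract-complex" | "cross-check";

// Cheapest capable model per task (prices: USD per 1M input tokens).
const ROUTES: Record<Task, { model: string; inputPerMTok: number }> = {
  "extract":         { model: "gpt-4o-mini",  inputPerMTok: 0.15 },
  "extract-complex": { model: "claude-haiku", inputPerMTok: 0.25 },
  "cross-check":     { model: "claude-opus",  inputPerMTok: 15 },
};

class ModelRouter {
  private spentUsd = 0;
  constructor(private dailyBudgetUsd: number) {}

  // Returns the model to call, or null when the call would bust the daily cap.
  route(task: Task, estInputTokens: number): { model: string } | null {
    const { model, inputPerMTok } = ROUTES[task];
    const estCost = (estInputTokens / 1_000_000) * inputPerMTok;
    if (this.spentUsd + estCost > this.dailyBudgetUsd) {
      return null; // caller stops processing and logs the overage
    }
    this.spentUsd += estCost;
    return { model };
  }
}
```

Refusing the call up front, rather than reconciling spend after the fact, is what makes the "no unbounded AI spend" claim hold.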

The 7 Invariants

These architectural constraints govern all changes to the Eko codebase. They're documented in CLAUDE.md and enforced through code review:

  1. Fact-first — Facts are the atomic unit; everything flows from structured, schema-validated fact records
  2. Verification before publication — No fact reaches the public feed without at least one validation tier pass
  3. Source attribution — Every fact traces back to source articles and validation evidence
  4. Schema conformance — Fact output must validate against fact_record_schemas.fact_keys
  5. Cost-bounded AI — All AI calls have model routing, budget caps, and cost tracking
  6. Public feed / gated detail — The feed is public; full card detail and interactions require a Free/Eko+ subscription
  7. Topic balance — Daily quotas per topic category prevent content monoculture

When tradeoffs appear, we prefer: correctness > auditability > safety > cost control.

Stack

| Layer | Technology |
|---|---|
| Runtime | Bun |
| Language | TypeScript, Zod schemas |
| Monorepo | Turborepo |
| Frontend | Next.js 16 (App Router), Tailwind v4, shadcn/ui |
| Backend | Supabase (Postgres + RLS), Drizzle ORM |
| Cache/Queue | Upstash Redis |
| AI | Vercel AI SDK v6 (Anthropic, OpenAI providers) |
| Quality | Biome (lint/format), Vitest (tests) |

Migration Path: What Changed

The v1-to-v2 migration spans 7 implementation phases:

| Phase | Focus | Key Changes |
|---|---|---|
| 1 | Database | 5 new migrations (0092-0096): enums, topic taxonomy, fact records, stories, news sources, interactions, ingestion runs |
| 2 | Packages | New queue message types, AI extraction functions, model router, environment config, DB query layer |
| 3 | Workers | 3 new workers (ingest, facts, validate) replace 3 old workers (tracker, render, SMS). 8 new cron routes |
| 4 | Feed UI | Public card grid, category filtering, infinite scroll, cursor-based pagination |
| 5 | Detail UI | Subscription-gated card detail, quiz/recall interfaces, paywall, Stripe integration for Eko+ |
| 6 | Legacy removal | ~15,900 lines removed: old routes, workers, package code, and table drops (migrations 0097-0099) |
| 7 | Polish | Auth redirects, onboarding, marketing site rewrite, admin dashboard |

Lines removed: ~15,900 (old dashboard routes, URL tracking UI, Playwright worker, SMS worker, legacy API endpoints, domain/brand management)

Tables dropped: pages, page_observations, page_change_events, domains, brands, brand_sources, screen_avatars, personas, use_cases, and related junction tables

Tables added: topic_categories, fact_record_schemas, fact_records, stories, news_sources, card_interactions, card_bookmarks, ingestion_runs

What's Next

Open questions and future work:

  • Spaced repetition refinement: The current implementation uses a basic interval system. We're evaluating SM-2 and FSRS algorithms for more sophisticated scheduling
  • Image pipeline: Card images currently resolve through a Wikipedia -> TheSportsDB -> Unsplash -> AI fallback chain. Quality and attribution tracking need hardening
  • Multi-language support: The fact extraction pipeline is English-only. Structured fact records are inherently more translatable than free-text diffs, but the extraction prompts need localization
  • Community contributions: The schema-driven architecture means external contributors could propose new fact_record_schemas for domains we don't cover yet
  • Real-time feed: Currently batch-processed on 15-minute cron cycles. WebSocket or SSE push for breaking facts is architecturally possible but not yet prioritized

About Eko

Eko is a daily knowledge feed powered by a structured fact engine. We aggregate news, extract verified facts, and deliver them as interactive cards across sports, history, science, culture, geography, records, and current events. The architecture is built on Bun, TypeScript, Supabase, and a three-worker pipeline that prioritizes correctness and cost efficiency.

Technical Blog: eko.day/blog
Website: eko.day
Press Contact: press@eko.day