Meaningful Change Detection Specification

Purpose

This document defines what constitutes a "meaningful change" that warrants user notification and AI summarization. Eko's core principle is meaningful-change gating: no summary is generated unless a meaningful change has been detected.

Core Principle

Delta-first: change summaries, not page content

Eko detects and reports changes to pages, not the pages themselves. A change is only meaningful if it represents actionable information for the user.

Change Types

Type              Description                               Meaningful?
initial           First baseline capture                    Always (establishes baseline)
content_change    Text/content differs from previous hash   Yes, if above threshold
structure_change  Section structure reorganized             Yes, if significant

Detection Flow

Fetch page content
        ↓
Normalize content (strip noise)
        ↓
Compute content_hash (SHA-256)
        ↓
Compare with previous check
        ↓
If different → Compute section_hashes
        ↓
Calculate estimated_change_percentage
        ↓
Apply meaningful threshold
        ↓
If meaningful → Create page_change_event → Trigger summarization
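
The first half of this flow (hash, then compare with the previous check) can be sketched as follows. This is a minimal illustration, not the code in processor.ts: the names sha256Hex and compareContentHash and the three status values are assumptions made for the example. Only the use of SHA-256 comes from this spec.

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: hex-encoded SHA-256 of already-normalized text.
function sha256Hex(text: string): string {
  return createHash("sha256").update(text, "utf8").digest("hex");
}

// Hypothetical status values for the "Compare with previous check" step.
type HashStatus = "initial" | "unchanged" | "changed";

function compareContentHash(
  previousHash: string | null,
  normalizedContent: string,
): { status: HashStatus; contentHash: string } {
  const contentHash = sha256Hex(normalizedContent);
  if (previousHash === null) {
    // No earlier check exists, so this capture becomes the baseline.
    return { status: "initial", contentHash };
  }
  // Section hashes and thresholds only need to run when the hashes differ.
  const status: HashStatus = previousHash === contentHash ? "unchanged" : "changed";
  return { status, contentHash };
}
```

Only a "changed" result would continue to the section-hash and threshold steps. An "unchanged" result ends the check without creating an event.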

Thresholds

Metric            Threshold   Action
Hash differs      Any change  Potential change detected
Section change %  ≥ 5%        Classified as meaningful
Large change %    ≥ 15%       Consider render escalation
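
A small sketch of how these thresholds might be applied to a computed percentage. The 5% and 15% cut-offs come from the table above. The function name, constant names, and return shape are assumptions for illustration.

```typescript
// Thresholds from the table above (percent of sections changed).
const MEANINGFUL_THRESHOLD_PCT = 5;
const RENDER_ESCALATION_THRESHOLD_PCT = 15;

// Hypothetical result shape for the "Apply meaningful threshold" step.
interface ThresholdResult {
  meaningful: boolean;
  considerRenderEscalation: boolean;
}

function applyThresholds(estimatedChangePercentage: number): ThresholdResult {
  return {
    // Both comparisons are inclusive, matching the "≥" in the table.
    meaningful: estimatedChangePercentage >= MEANINGFUL_THRESHOLD_PCT,
    considerRenderEscalation:
      estimatedChangePercentage >= RENDER_ESCALATION_THRESHOLD_PCT,
  };
}
```

For example, a page where 10% of sections changed would be classified as meaningful but would not be flagged for render escalation.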

What is NOT Meaningful

These changes are filtered out as cosmetic/noise:

  • Timestamps/dates: "Last updated: Dec 18" changing daily
  • Ad rotation: Banner ads, promotional slots
  • Session content: User-specific greetings, cart counts
  • Whitespace changes: Formatting without content change
  • Tracking parameters: Analytics query strings
  • CSRF tokens: Hidden form fields
  • Cache busters: Random version strings
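
As one illustration of noise filtering, tracking parameters can be removed from a URL before it is compared. This sketch uses the standard URL API. The specific parameter names (utm_*, fbclid, gclid) are assumptions chosen for the example and are not taken from the Eko codebase.

```typescript
// Assumed examples of analytics parameters; the real list may differ.
const TRACKING_PARAM_NAMES = new Set(["fbclid", "gclid"]);
const TRACKING_PARAM_PREFIX = "utm_";

function stripTrackingParams(rawUrl: string): string {
  const url = new URL(rawUrl);
  // Copy the keys first so deletion does not disturb iteration.
  for (const key of Array.from(url.searchParams.keys())) {
    const lower = key.toLowerCase();
    if (lower.startsWith(TRACKING_PARAM_PREFIX) || TRACKING_PARAM_NAMES.has(lower)) {
      url.searchParams.delete(key);
    }
  }
  return url.toString();
}
```

With this approach, two links that differ only in their analytics query strings reduce to the same string and no longer register as a difference.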

Content Normalization

Before hashing, content is normalized to reduce false positives:

  1. Strip HTML tags (keep text content)
  2. Normalize whitespace (collapse multiple spaces/newlines)
  3. Remove common noise patterns (dates, session IDs)
  4. Extract semantic sections (headings + content blocks)
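
Steps 1 to 3 can be sketched as below. This is a deliberately naive illustration and not the contents of content-normalizer.ts: regex-based tag stripping is an assumption made for brevity (a real implementation would more likely use an HTML parser), and the two noise patterns are hypothetical examples. Step 4, section extraction, is not covered here.

```typescript
// Hypothetical noise patterns; the actual list is not given in this spec.
const NOISE_PATTERNS: RegExp[] = [
  // e.g. "Last updated: Dec 18" or "Last updated: December 18, 2024"
  /Last updated:\s*[A-Za-z]{3,9}\.?\s+\d{1,2}(?:,\s*\d{4})?/gi,
  // e.g. "sessionid=ab12cd34"
  /\bsessionid=[A-Za-z0-9]+/gi,
];

function normalizeContent(html: string): string {
  // 1. Strip HTML tags, keeping the text between them.
  let text = html.replace(/<[^>]+>/g, " ");
  // 2. Collapse runs of spaces and newlines into a single space.
  text = text.replace(/\s+/g, " ");
  // 3. Remove known noise patterns.
  for (const pattern of NOISE_PATTERNS) {
    text = text.replace(pattern, "");
  }
  // Removing noise can leave stray gaps, so collapse once more and trim.
  return text.replace(/\s+/g, " ").trim();
}
```

Two captures of the same page that differ only in their "Last updated" line normalize to the same string, so they produce the same content_hash.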

Section-Level Detection

The section_hashes JSONB column stores minimal metadata for granular detection:

interface SectionMeta {
  id: string      // Unique section identifier
  hash: string    // SHA-256 of section content
  heading?: string // Section heading text
}

This enables:

  • Identifying which sections changed
  • Calculating change percentage
  • Providing context for summarization
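
A sketch of how two section_hashes snapshots might be compared to find which sections changed. The function name is hypothetical, and treating added and removed sections as "changed" is an assumption. The spec does not say how sections that appear or disappear are counted.

```typescript
interface SectionMeta {
  id: string;       // Unique section identifier
  hash: string;     // SHA-256 of section content
  heading?: string; // Section heading text
}

// Returns IDs of sections that were modified, added, or removed.
function changedSectionIds(
  previous: SectionMeta[],
  current: SectionMeta[],
): string[] {
  const previousHashById = new Map<string, string>(
    previous.map((s): [string, string] => [s.id, s.hash]),
  );
  const currentIds = new Set(current.map((s) => s.id));
  const changed: string[] = [];

  for (const section of current) {
    // A missing previous hash means the section is new;
    // a different hash means its content was modified.
    if (previousHashById.get(section.id) !== section.hash) {
      changed.push(section.id);
    }
  }
  for (const section of previous) {
    // Present before, absent now: the section was removed.
    if (!currentIds.has(section.id)) {
      changed.push(section.id);
    }
  }
  return changed;
}
```

The returned IDs map directly onto diff_metadata.sections_changed, and their count feeds the change-percentage calculation.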

Estimated Change Percentage

estimated_change_percentage =
  (changed_sections / total_sections) * 100

Stored in page_change_events.diff_metadata.estimated_change_percentage.
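
The formula can be written as a small function. The guard for a page with zero sections and the clamp to the 0-100 range are assumptions added for safety. The spec does not say how those cases are handled, nor whether total_sections refers to the previous or the current snapshot.

```typescript
// Sketch of the formula above; the zero guard and clamping are assumptions.
function estimatedChangePercentage(
  changedSections: number,
  totalSections: number,
): number {
  if (totalSections <= 0) {
    // Avoid dividing by zero when no sections were extracted.
    return 0;
  }
  const percentage = (changedSections / totalSections) * 100;
  // Keep the result inside the documented 0-100 range.
  return Math.min(100, Math.max(0, percentage));
}
```

For example, if 2 of 8 sections changed, the result is 25, which is above both the 5% meaningful threshold and the 15% render-escalation threshold.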

diff_metadata Structure

interface DiffMetadata {
  type: 'initial' | 'content_change' | 'structure_change'
  sections_changed: string[]  // IDs of changed sections
  estimated_change_percentage: number  // 0-100
}

First Check Handling

The first check for a URL is always handled as follows:

  • has_change: true (establishes baseline)
  • diff_metadata.type: 'initial'
  • No summary generated (nothing to compare against)
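
A sketch of the record a first check might produce. The has_change and diff_metadata.type values come from the list above. The FirstCheckResult shape, the generate_summary flag, and the empty sections_changed and zero percentage are assumptions made for illustration.

```typescript
interface DiffMetadata {
  type: "initial" | "content_change" | "structure_change";
  sections_changed: string[];          // IDs of changed sections
  estimated_change_percentage: number; // 0-100
}

// Hypothetical result shape; generate_summary is not a documented field.
interface FirstCheckResult {
  has_change: boolean;
  diff_metadata: DiffMetadata;
  generate_summary: boolean;
}

function buildFirstCheckResult(): FirstCheckResult {
  return {
    has_change: true, // establishes the baseline
    diff_metadata: {
      type: "initial",
      // Assumed defaults: there is no earlier snapshot to diff against.
      sections_changed: [],
      estimated_change_percentage: 0,
    },
    generate_summary: false, // nothing to compare against yet
  };
}
```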

User Note Integration

The pages.user_note field provides intent context:

  • Helps determine if a detected change is relevant to user's stated purpose
  • Guides AI summarization to focus on user's concerns
  • Does not affect change detection (detection is objective)

Implementation Files

  • Content normalization: packages/shared/src/content-normalizer.ts
  • Hash computation: packages/shared/src/utils.ts (sha256)
  • Change detection: apps/worker-tracker/src/processor.ts