Meaningful Change Detection Specification
Purpose
Define what constitutes a "meaningful change" that warrants user notification and AI summarization. Eko's core principle is meaningful change gating: no summary is generated unless a meaningful change has been detected.
Core Principle
Delta-first — Change summaries, not page content
Eko detects and reports changes to pages, not the pages themselves. A change is only meaningful if it represents actionable information for the user.
Change Types
| Type | Description | Meaningful? |
|---|---|---|
| initial | First baseline capture | Always (establishes baseline) |
| content_change | Text/content differs from previous hash | Yes, if above threshold |
| structure_change | Section structure reorganized | Yes, if significant |
Detection Flow
```
Fetch page content
  ↓
Normalize content (strip noise)
  ↓
Compute content_hash (SHA-256)
  ↓
Compare with previous check
  ↓
If different → Compute section_hashes
  ↓
Calculate estimated_change_percentage
  ↓
Apply meaningful threshold
  ↓
If meaningful → Create page_change_event → Trigger summarization
```
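The flow above can be sketched as a single function. This is a minimal illustration, not the code in `apps/worker-tracker/src/processor.ts`. The type names (`Section`, `PreviousCheck`, `DetectionOutcome`) and the function `detectChange` are hypothetical. The sketch assumes two things: the content has already been normalized, and `total_sections` is the number of distinct section IDs across both checks.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shapes for illustration; not the actual worker-tracker types.
interface Section {
  id: string
  text: string // normalized section text
}

interface PreviousCheck {
  contentHash: string
  sectionHashes: Record<string, string> // section id -> SHA-256
}

type DetectionOutcome =
  | { kind: "unchanged" }
  | { kind: "below_threshold"; percentage: number }
  | { kind: "meaningful"; percentage: number; sectionsChanged: string[] };

const MEANINGFUL_THRESHOLD_PCT = 5; // from the Thresholds table

function sha256(text: string): string {
  return createHash("sha256").update(text, "utf8").digest("hex");
}

function detectChange(normalized: string, sections: Section[], previous: PreviousCheck): DetectionOutcome {
  // Step 1: cheap page-level comparison on the normalized content.
  if (sha256(normalized) === previous.contentHash) return { kind: "unchanged" };

  // Step 2: the page hash differs, so compute per-section hashes.
  const current: Record<string, string> = {};
  for (const s of sections) current[s.id] = sha256(s.text);

  // Step 3 (assumption): a section counts as changed if it was added, removed,
  // or its hash differs; total_sections is the union of IDs from both checks.
  const ids = Array.from(new Set(Object.keys(previous.sectionHashes).concat(Object.keys(current))));
  const changed = ids.filter((id) => previous.sectionHashes[id] !== current[id]);
  const percentage = ids.length === 0 ? 0 : (changed.length * 100) / ids.length;

  // Step 4: gate on the meaningful threshold; only the "meaningful" outcome
  // would lead to a page_change_event and summarization.
  if (percentage < MEANINGFUL_THRESHOLD_PCT) return { kind: "below_threshold", percentage };
  return { kind: "meaningful", percentage, sectionsChanged: changed };
}
```

Comparing the whole-page hash first means most checks of an unchanged page stop before any per-section work is done.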
Thresholds
| Metric | Threshold | Action |
|---|---|---|
| Hash differs | Any change | Potential change detected |
| Section change % | ≥ 5% | Classified as meaningful |
| Large change % | ≥ 15% | Consider render escalation |
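The table can be read as a small decision function. This is a minimal sketch that assumes the percentage has already been computed. The action names and `thresholdAction` are hypothetical labels. Treating a change of 15% or more as both meaningful and a render-escalation candidate is one reading of the table, not a confirmed rule.

```typescript
// Threshold values from the Thresholds table; the action names are illustrative.
const MEANINGFUL_PCT = 5;
const RENDER_ESCALATION_PCT = 15;

type ThresholdAction = "ignore" | "meaningful" | "meaningful_consider_render";

function thresholdAction(hashDiffers: boolean, changePct: number): ThresholdAction {
  if (!hashDiffers) return "ignore"; // identical hash: nothing changed
  if (changePct >= RENDER_ESCALATION_PCT) return "meaningful_consider_render";
  if (changePct >= MEANINGFUL_PCT) return "meaningful";
  return "ignore"; // hash differs, but the change is below the 5% section threshold
}
```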
What is NOT Meaningful
These changes are filtered out as cosmetic/noise:
- Timestamps/dates: "Last updated: Dec 18" changing daily
- Ad rotation: Banner ads, promotional slots
- Session content: User-specific greetings, cart counts
- Whitespace changes: Formatting without content change
- Tracking parameters: Analytics query strings
- CSRF tokens: Hidden form fields
- Cache busters: Random version strings
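Tracking parameters can be removed from URLs (for example, links found in the fetched content) before hashing, so that rotating analytics values do not change the hash. This is a minimal sketch using the standard `URL`/`URLSearchParams` API. The denylist and the function name `stripTrackingParams` are hypothetical; this document does not give the actual filter list.

```typescript
// Hypothetical denylist of common analytics parameters; not Eko's actual list.
const TRACKING_PARAMS = [
  "utm_source", "utm_medium", "utm_campaign", "utm_term", "utm_content",
  "gclid", "fbclid",
];

// Returns the URL with known tracking parameters removed; other parameters
// keep their original order.
function stripTrackingParams(rawUrl: string): string {
  const url = new URL(rawUrl);
  for (const name of TRACKING_PARAMS) url.searchParams.delete(name);
  return url.toString();
}
```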
Content Normalization
Before hashing, content is normalized to reduce false positives:
- Strip HTML tags (keep text content)
- Normalize whitespace (collapse multiple spaces/newlines)
- Remove common noise patterns (dates, session IDs)
- Extract semantic sections (headings + content blocks)
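The first three normalization steps can be sketched with plain string operations. This is an illustration, not the contents of `packages/shared/src/content-normalizer.ts`. The noise patterns and the name `normalizeContent` are hypothetical. Regex-based tag stripping is used only to keep the sketch dependency-free; a real HTML parser is more robust. Section extraction is not shown.

```typescript
// Illustrative noise patterns; the real list in content-normalizer.ts is not reproduced here.
const NOISE_PATTERNS: RegExp[] = [
  /\blast updated:?\s*[A-Za-z]{3,9}\.?\s+\d{1,2}(?:,\s*\d{4})?/gi, // "Last updated: Dec 18"
  /\b\d{4}-\d{2}-\d{2}(?:T[\d:.]+Z?)?\b/g,                          // ISO dates and timestamps
  /\b[0-9a-f]{32,}\b/gi,                                            // long hex tokens (session IDs, cache busters)
];

function normalizeContent(html: string): string {
  let text = html
    .replace(/<(script|style)[^>]*>[\s\S]*?<\/\1>/gi, " ") // drop script/style bodies entirely
    .replace(/<[^>]+>/g, " ");                              // strip remaining tags, keep their text
  for (const pattern of NOISE_PATTERNS) text = text.replace(pattern, " ");
  return text.replace(/\s+/g, " ").trim(); // collapse runs of spaces/newlines
}
```

With these patterns, two fetches that differ only in the "Last updated" line normalize to the same string and therefore produce the same `content_hash`.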
Section-Level Detection
The section_hashes JSONB column stores minimal metadata for granular detection:
```typescript
interface SectionMeta {
  id: string // Unique section identifier
  hash: string // SHA-256 of section content
  heading?: string // Section heading text
}
```
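A `SectionMeta` list could be built from normalized sections as follows. The spec does not say how section IDs are assigned. This sketch assumes IDs are heading slugs with a numeric suffix for duplicate headings, falling back to the section's position when there is no heading. `RawSection` and `buildSectionHashes` are hypothetical names.

```typescript
import { createHash } from "node:crypto";

interface SectionMeta {
  id: string
  hash: string
  heading?: string
}

// Hypothetical input: one section as produced by the normalizer.
interface RawSection {
  heading?: string
  body: string
}

const sha256 = (s: string): string => createHash("sha256").update(s, "utf8").digest("hex");

// Assumption: IDs are heading slugs (position-based when there is no heading),
// with "-1", "-2", ... appended so duplicate headings stay unique.
function buildSectionHashes(sections: RawSection[]): SectionMeta[] {
  const seen = new Map<string, number>();
  return sections.map((s, i) => {
    const slug = (s.heading ?? "").toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-+|-+$/g, "");
    const base = slug || `section-${i}`;
    const n = seen.get(base) ?? 0;
    seen.set(base, n + 1);
    const meta: SectionMeta = {
      id: n === 0 ? base : `${base}-${n}`,
      hash: sha256(`${s.heading ?? ""}\n${s.body}`), // heading is included so renames count as changes
    };
    if (s.heading) meta.heading = s.heading;
    return meta;
  });
}
```

Slug-based IDs keep a section's identity stable when other sections are inserted above it. Purely positional IDs would mark every later section as changed.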
This enables:
- Identifying which sections changed
- Calculating change percentage
- Providing context for summarization
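The changed sections can be found by comparing the stored `section_hashes` of the previous and current checks by ID. This is a minimal sketch; `changedSectionIds` is a hypothetical name. Counting added and removed sections as changed is an assumption the spec does not state.

```typescript
interface SectionMeta {
  id: string
  hash: string
  heading?: string
}

// Returns the IDs of sections whose hash differs, plus (by assumption) any
// section present in only one of the two checks.
function changedSectionIds(previous: SectionMeta[], current: SectionMeta[]): string[] {
  const prev = new Map<string, string>();
  for (const s of previous) prev.set(s.id, s.hash);
  const curr = new Map<string, string>();
  for (const s of current) curr.set(s.id, s.hash);

  // Union of IDs, in order: previous sections first, then newly added ones.
  const ids = Array.from(new Set(previous.map((s) => s.id).concat(current.map((s) => s.id))));
  return ids.filter((id) => prev.get(id) !== curr.get(id));
}
```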
Estimated Change Percentage
```
estimated_change_percentage = (changed_sections / total_sections) * 100
```
Stored in page_change_events.diff_metadata.estimated_change_percentage.
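The formula can be written as a small function. This sketch guards against an empty page and clamps the result to the documented 0-100 range. It computes `(changed * 100) / total`, which is algebraically the same as the formula above but avoids some floating-point rounding. The function name is hypothetical. For example, 3 changed sections out of 40 gives 7.5, which clears the 5% threshold; 1 out of 25 gives 4, which does not.

```typescript
// Hypothetical helper implementing the formula above.
function estimatedChangePercentage(changedSections: number, totalSections: number): number {
  if (totalSections <= 0) return 0; // avoid division by zero on pages with no sections
  const pct = (changedSections * 100) / totalSections;
  return Math.min(100, Math.max(0, pct)); // keep within the 0-100 range of diff_metadata
}
```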
diff_metadata Structure
```typescript
interface DiffMetadata {
  type: 'initial' | 'content_change' | 'structure_change'
  sections_changed: string[] // IDs of changed sections
  estimated_change_percentage: number // 0-100
}
```
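A value read back from a JSONB column has no static type, so code consuming `diff_metadata` may want a runtime check. This is a minimal type-guard sketch against the interface above. The name `isDiffMetadata` is hypothetical.

```typescript
type ChangeType = "initial" | "content_change" | "structure_change";

interface DiffMetadata {
  type: ChangeType
  sections_changed: string[]
  estimated_change_percentage: number
}

const CHANGE_TYPES: readonly string[] = ["initial", "content_change", "structure_change"];

// Runtime check for a diff_metadata value loaded from the database.
function isDiffMetadata(value: unknown): value is DiffMetadata {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.type === "string" && CHANGE_TYPES.indexOf(v.type) !== -1 &&
    Array.isArray(v.sections_changed) &&
    v.sections_changed.every((id) => typeof id === "string") &&
    typeof v.estimated_change_percentage === "number" &&
    v.estimated_change_percentage >= 0 &&
    v.estimated_change_percentage <= 100
  );
}
```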
First Check Handling
The first check for a URL is always marked as:
- has_change: true (establishes baseline)
- diff_metadata.type: 'initial'
- No summary generated (nothing to compare against)
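The baseline record for a first check might look like this. The spec fixes only three things: `has_change`, the `'initial'` type, and the absence of a summary. The empty `sections_changed` list, the 0% value, and the `summarize` flag are assumptions made for illustration. `CheckResult` and `firstCheckResult` are hypothetical names.

```typescript
interface DiffMetadata {
  type: "initial" | "content_change" | "structure_change"
  sections_changed: string[]
  estimated_change_percentage: number
}

// Hypothetical result shape; `summarize` stands in for "trigger summarization".
interface CheckResult {
  has_change: boolean
  diff_metadata: DiffMetadata
  summarize: boolean
}

// Baseline for a URL with no previous check.
function firstCheckResult(): CheckResult {
  return {
    has_change: true, // establishes the baseline
    diff_metadata: {
      type: "initial",
      sections_changed: [],           // assumption: nothing to diff against yet
      estimated_change_percentage: 0, // assumption
    },
    summarize: false, // no summary: there is no previous version to compare with
  };
}
```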
User Note Integration
The pages.user_note field provides intent context:
- Helps determine whether a detected change is relevant to the user's stated purpose
- Guides AI summarization to focus on the user's concerns
- Does not affect change detection (detection is objective)
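One way to keep the user note out of detection while still steering the summary is to use it only when building the summarization prompt. This sketch shows that separation. The prompt wording, `SummaryInput`, and `buildSummaryPrompt` are all hypothetical; the actual summarizer interface is not described in this document.

```typescript
// Hypothetical input for the summarizer; not an actual Eko type.
interface SummaryInput {
  url: string
  changedHeadings: string[]
  estimatedChangePercentage: number
  userNote?: string | null
}

// The user note only shapes the prompt. It is never read by the detection
// step, so whether a change exists stays independent of user intent.
function buildSummaryPrompt(input: SummaryInput): string {
  const lines = [
    `Summarize the meaningful changes detected on ${input.url}.`,
    `Estimated change: ${input.estimatedChangePercentage.toFixed(1)}% of sections.`,
  ];
  if (input.changedHeadings.length > 0) {
    lines.push(`Changed sections: ${input.changedHeadings.join(", ")}.`);
  }
  const note = input.userNote?.trim();
  if (note) {
    lines.push(`The user is tracking this page for: "${note}". Focus on changes relevant to that purpose.`);
  }
  return lines.join("\n");
}
```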
Implementation Files
- Content normalization: packages/shared/src/content-normalizer.ts
- Hash computation: packages/shared/src/utils.ts (sha256)
- Change detection: apps/worker-tracker/src/processor.ts