Global URL Library Contract

Deprecated: This document describes the V1/vNext URL library system which has been fully superseded by the V2 fact engine. The V1/vNext tables described here have been dropped (migrations 0057-0060). See Architecture Overview for the current system and Fact & Taxonomy Schema Map for the V2 schema.

This document defines the invariants and rules governing the vNext Global URL Library system.

Overview

The vNext architecture transforms Eko from per-user URL tracking to a shared global URL library. This enables:

  • Single check per URL across all users (cost efficiency)
  • Brand-centric screens and navigation
  • Trend analysis across URLs and categories
  • Simpler subscription model based on library access

URL Uniqueness Rule

Invariant: Each URL exists exactly once in the urls table, keyed by canonical_url.

Canonical URL Definition

A canonical URL is the normalized form of a submitted URL:

  • Fragment (#...) removed
  • Hostname lowercased
  • Tracking parameters stripped (utm_*, ref, fbclid, gclid, etc.)
  • Query parameters sorted alphabetically
  • Trailing slash removed (except for root path /)

Enforcement

  • urls.canonical_url has a UNIQUE constraint
  • getOrCreateGlobalUrlByCanonical() uses ON CONFLICT DO NOTHING + select pattern
  • Duplicate submissions link to existing global URL rather than creating new entry

User Add Flow

When a user submits a URL to track:

1. User submits URL → url_submissions record created (status: pending)
2. URL normalized → canonical_url computed
3. Policy evaluation → url_policy_decisions record created
4. Decision routing:
   - BLOCK → submission.status = blocked, no global URL created
   - REVIEW → submission.status = needs_review, awaits admin
   - ALLOW → proceed to step 5
5. Global URL check:
   - EXISTS → submission.status = approved_linked_existing, matched_url_id set
   - NEW → urls record created, submission.status = approved_created_global
6. User library link → user_url_library record created
   - history_start_at set based on plan (NULL for paid, now() for free)

Tables Involved

TablePurpose
url_submissionsAudit trail of all submission attempts
url_policy_decisionsPolicy decision audit log
urlsGlobal URL entity (unique by canonical_url)
user_url_libraryUser-URL relationship

Global-Only Checks Invariant

Invariant: There is exactly one check pipeline per URL globally, not per user.

V1 Compatibility

During migration, V1 workers continue writing to url_checks, url_changes, etc. Write-through triggers mirror this data to vNext tables:

V1 TablevNext TableTrigger
url_checksurl_observationssync_check_to_observation
url_changesurl_change_eventssync_change_to_event
summariesurl_change_summariessync_summary_to_vnext
url_rendersurl_render_artifactssync_render_to_artifact

Future State

Workers will write directly to vNext tables (url_observations, etc.). The urls table's next_check_at and check_frequency fields drive scheduling.

History Gating Rule

Invariant: Free users see change history only from when they added the URL. Paid users see full history.

Implementation

  1. user_url_library.history_start_at:

    • NULL = full history access (paid plans)
    • timestamp = only see changes after this time (free plans)
  2. Database trigger (set_library_history_gating):

    • Sets history_start_at on INSERT based on user's current plan
  3. RLS policy (user_url_library_history_gated_select):

    • Filters url_change_events based on history_start_at
  4. Query layer (listUrlUpdateInstances):

    • Accepts gatingSince parameter for explicit filtering
    • Defense in depth alongside RLS

Plan Upgrade Behavior

When a user upgrades from free to paid:

  • Existing history_start_at values remain (conservative approach)
  • Admin can manually reset to NULL if business decides to unlock history retroactively

URL Policy Categories

URLs are evaluated against policy rules before being allowed into the system.

Policy Decisions

DecisionEffect
allowURL proceeds to global library
blockURL rejected, reason recorded
reviewURL queued for admin review

Reason Codes

CodeDescription
adult_contentPornography, explicit material
hate_or_extremismHate speech, extremist content
malware_or_phishingKnown malicious URLs
spam_or_low_qualitySpam, suspicious TLDs
illegal_or_regulatedGambling, drugs, weapons
auth_wall_or_uncheckableLogin required, uncheckable
duplicateAlready exists in system
unknownUnclassified rejection

Evidence Logging

All policy decisions record evidence for audit:

{
  "hostname": "example.com",
  "pattern": "pattern that matched",
  "reason": "human-readable explanation"
}

URL Cap Enforcement

Invariant: Users cannot exceed their plan's max_tracked_urls limit.

Enforcement Points

  1. Pre-check (UX): checkUrlCapEntitlement() in application code
  2. Authoritative (DB): check_library_url_cap() trigger on user_url_library

Trigger Behavior

-- Counts active entries in user_url_library
-- Compares against plan_definitions.max_tracked_urls
-- Raises exception if limit exceeded

Screen-to-Table Mapping

Dashboard Screen

Purpose: User's tracked URLs with latest updates

DataSource
User's URLsuser_url_library JOIN urls
Latest update per URLurl_change_events (with history gating)
Change summariesurl_change_summaries

Brand Screen

Purpose: Brand overview with all trackable URLs

DataSource
Brand infobrand_library_brands (via brands view)
Brand's URLsurls WHERE brand_id = ?
Latest updatesurl_change_events for brand's URLs

Brand URL Screen (Update History)

Purpose: Single URL with change history

DataSource
URL infourls
Brand infobrand_library_brands
Change historyurl_change_events (history gated)
Summariesurl_change_summaries
Gating statususer_url_library.history_start_at

Update Instance Screen

Purpose: Single change event detail

DataSource
Change eventurl_change_events
Summaryurl_change_summaries
Observationurl_observations
URL + Brandurls JOIN brand_library_brands

RLS Policy Summary

Global Tables (service_role writes, user reads via library)

  • urls: SELECT via user_url_library membership
  • url_observations: SELECT via URL membership
  • url_change_events: SELECT via URL membership + history gating
  • url_change_summaries: SELECT via event membership
  • url_render_artifacts: SELECT via URL membership

User Tables (user CRUD on own rows)

  • user_url_library: Full CRUD for user_id = auth.uid()
  • url_submissions: INSERT for self, SELECT own rows

Policy Tables

  • url_policy_decisions: SELECT via own submissions
  • Writes via service_role only
  • trend_definitions: SELECT active trends
  • trend_points: SELECT via trend membership

Billing/Onboarding

  • Own rows only, service_role for billing writes

Migration Notes

Backfill Strategy

The migration (0007_global-url-library-vnext.sql) performs idempotent backfill:

  1. tracked_urlsurls (dedupe by canonical_url, first created wins)
  2. tracked_urlsuser_url_library (with history gating based on plan)
  3. url_checksurl_observations (dedupe by url_id + checked_day)
  4. url_changesurl_change_events
  5. summariesurl_change_summaries
  6. url_rendersurl_render_artifacts

All backfill statements use ON CONFLICT DO NOTHING for safety.

Write-Through Triggers

During transition, V1 tables remain the worker write surface. AFTER INSERT triggers mirror data to vNext tables in real-time. This allows gradual migration of workers without data loss.