Scheduling Runbook

Purpose: Keep checks firing on time; resolve drift, missed windows, and dispatch failures.

First Diagnostic Question

Are checks not happening on time? (missed windows, drift, dispatch failures)

If yes, resolve scheduling issues before investigating tracker or queue problems.

Components

| Component | Location | Purpose |
| --- | --- | --- |
| Cron trigger | Better Stack Heartbeat | Calls the enqueue endpoint hourly |
| Cron endpoint | `apps/web/app/api/enqueue-due-urls/route.ts` | Queries due URLs and enqueues jobs |
| Ingest worker | `apps/worker-ingest/src/index.ts` | Processes news ingestion queue messages |
| Schedule metadata | `tracked_urls.next_check_at` | Determines when a URL is due |

Better Stack Heartbeat Setup

The URL enqueue scheduler is triggered by Better Stack Heartbeats (not Vercel CRON).

Configuration

Log into Better Stack → Heartbeats → Create Heartbeat
Settings:

| Field | Value |
| --- | --- |
| Name | `eko-enqueue-due-urls` |
| URL | `https://eko.day/api/enqueue-due-urls` |
| Method | `GET` |
| Schedule | Every 1 hour |
| Grace period | 5 minutes |

Headers:
```
Authorization: Bearer $CRON_SECRET
```
(Use the same CRON_SECRET value from Vercel environment variables)
Expected response: HTTP 200 with JSON body
Alerting: Configure alerts for missed heartbeats
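
On the receiving side, the endpoint has to compare the incoming header against `CRON_SECRET`. The following is a minimal sketch of such a check, not the code in `route.ts`: `isAuthorizedCronRequest` is a hypothetical helper name, and the full-length comparison is an assumption about how one might avoid leaking the secret through response timing.

```typescript
// Hypothetical helper; not taken from apps/web/app/api/enqueue-due-urls/route.ts.
// Returns true only when the header is exactly "Bearer <secret>".
function isAuthorizedCronRequest(
  authorizationHeader: string | null,
  cronSecret: string | undefined,
): boolean {
  // Fail closed when the secret is not configured or the header is absent.
  if (!cronSecret || !authorizationHeader) return false;

  const expected = `Bearer ${cronSecret}`;
  // Start with the length difference, then fold in every character difference,
  // so the loop never exits early on the first mismatch.
  let mismatch = authorizationHeader.length ^ expected.length;
  for (let i = 0; i < expected.length; i++) {
    mismatch |= authorizationHeader.charCodeAt(i) ^ expected.charCodeAt(i);
  }
  return mismatch === 0;
}
```

A handler built on this sketch would return HTTP 401 whenever the function returns false, which is the status the decision tree in this runbook associates with a missing or wrong header.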

Benefits over Vercel CRON

Failure alerting: Notified if endpoint returns non-2xx
Execution history: Dashboard shows success/failure timeline
Retry logic: Automatic retries on transient failures
Platform-agnostic: Works if you migrate off Vercel

Invariants

Batch dispatch: Due URLs queried in batches (currently 100 at a time)
Idempotency: URL skipped if already checked today (UTC)
Deterministic advancement: next_check_at advances based on check_frequency after job completion
Authorization: Cron endpoint requires bearer token (CRON_SECRET)

Source of truth: STACK.md
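
The idempotency and advancement invariants can be illustrated with two small pure functions. This is a sketch under stated assumptions, not the project's `hasCheckedToday` or `updateNextCheckTime`: the function names, the `"daily"` and `"weekly"` cadence values, and the choice to advance from the completion time are all hypothetical.

```typescript
// Hypothetical cadence values; the real check_frequency column may allow others.
type CheckFrequency = "daily" | "weekly";

const DAY_MS = 24 * 60 * 60 * 1000;

// True when both instants fall on the same UTC calendar day.
function isSameUtcDay(a: Date, b: Date): boolean {
  return (
    a.getUTCFullYear() === b.getUTCFullYear() &&
    a.getUTCMonth() === b.getUTCMonth() &&
    a.getUTCDate() === b.getUTCDate()
  );
}

// Idempotency guard: skip a URL whose latest check already happened today (UTC).
function shouldSkip(lastCheckedAt: Date | null, now: Date): boolean {
  return lastCheckedAt !== null && isSameUtcDay(lastCheckedAt, now);
}

// Deterministic advancement: the result depends only on the completion time
// and the cadence, so the same inputs always give the same next due time.
// Assumption: the interval is measured from completion, not from the old due time.
function computeNextCheckAt(completedAt: Date, frequency: CheckFrequency): Date {
  const days = frequency === "daily" ? 1 : 7;
  return new Date(completedAt.getTime() + days * DAY_MS);
}
```

Comparing UTC date parts rather than local ones keeps the guard stable regardless of the server's time zone.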

Diagnostic Decision Tree

Scheduling issue suspected
    │
    ├─ Checks not starting?
    │   ├─ Cron not triggering → Check cron job config, verify CRON_SECRET
    │   ├─ Endpoint returning 401 → Authorization header missing or wrong
    │   └─ No due URLs found → Check next_check_at values in DB
    │
    ├─ Jobs enqueued but not processed?
    │   └─ See [Queue Runbook](./queue.md)
    │
    ├─ Schedule drift accumulating?
    │   ├─ next_check_at not advancing → Worker not updating after completion
    │   ├─ Jobs completing but late → Processing backlog
    │   └─ Frequency mismatch → check_frequency not matching expectations
    │
    └─ Duplicate checks happening?
        ├─ Idempotency guard failing → hasCheckedToday returning false incorrectly
        └─ Multiple cron triggers → Cron running too frequently

Common Failure Scenarios

Cron Not Triggering

Symptoms:

Queue depth at zero
No recent enqueue-due-urls logs
URLs overdue but not being checked

Actions:

Check Better Stack Heartbeats dashboard for failures
Verify heartbeat is configured and active
Verify CRON_SECRET environment variable matches Better Stack header
Manually trigger: `curl -X GET -H "Authorization: Bearer $CRON_SECRET" https://eko.day/api/enqueue-due-urls`
Check Vercel function logs for 401/500 errors
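
The status code returned by the manual trigger already narrows the cause. The sketch below maps a status to the next step using only the actions listed in this runbook; `nextStepForStatus` is a hypothetical helper and the wording of each message is illustrative.

```typescript
// Hypothetical triage helper; the mapping mirrors the actions in this runbook.
function nextStepForStatus(status: number): string {
  if (status >= 200 && status < 300) {
    // The endpoint works when called by hand, so the trigger side is suspect.
    return "Endpoint healthy: check the Better Stack heartbeat configuration";
  }
  if (status === 401) {
    return "Authorization header missing or wrong: compare CRON_SECRET in Vercel and Better Stack";
  }
  if (status >= 500) {
    return "Endpoint error: review Vercel function logs, then see Dispatch Failures";
  }
  return `Unexpected status ${status}: review Vercel function logs`;
}
```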

Dispatch Failures

Symptoms:

Cron endpoint returning errors
URLs due but not being enqueued
Queue connection errors in logs

Actions:

Check Upstash Redis connection
Verify UPSTASH_REDIS_REST_URL and UPSTASH_REDIS_REST_TOKEN
Check for rate limiting on queue service
Review error logs from enqueue-due-urls endpoint

Schedule Drift

Symptoms:

URLs checked later than expected
next_check_at values in the past
Growing gap between expected and actual check times

Actions:

Check if worker is updating next_check_at after completion
Verify updateNextCheckTime function is being called
Check for processing backlogs (queue depth growing)
Consider reducing batch size if system overloaded
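
To tell one-off lateness from accumulating drift, it can help to compare lag over time. The following is a rough sketch, assuming pairs of expected and actual check times are available oldest first; the `CheckTiming` shape and both function names are hypothetical, and comparing the two halves of the sample is just one simple heuristic.

```typescript
interface CheckTiming {
  expectedAt: Date; // the next_check_at value when the URL became due
  checkedAt: Date; // when the check actually ran
}

// Lag in whole minutes for each check; early checks count as zero lag.
function lagMinutes(timings: CheckTiming[]): number[] {
  return timings.map((t) =>
    Math.max(0, Math.floor((t.checkedAt.getTime() - t.expectedAt.getTime()) / 60000)),
  );
}

// Drift counts as accumulating when the mean lag of the newer half of the
// sample exceeds the mean lag of the older half. Input must be oldest first.
function isDriftAccumulating(timings: CheckTiming[]): boolean {
  const lags = lagMinutes(timings);
  if (lags.length < 2) return false;
  const mid = Math.floor(lags.length / 2);
  const mean = (xs: number[]): number => xs.reduce((sum, x) => sum + x, 0) / xs.length;
  return mean(lags.slice(mid)) > mean(lags.slice(0, mid));
}
```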

Stop Conditions

Hard Stop

Trigger immediately if any are true:

Queue depth growing faster than drain rate for extended period
Duplicate checks causing data corruption

Action: Pause cron trigger and investigate.
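
The first hard-stop condition can be made concrete with periodic queue depth samples. This is a sketch only: the runbook does not define "extended period", so `SUSTAINED_SAMPLES` and the function name are hypothetical, and the samples are assumed to be taken at a fixed interval, oldest first.

```typescript
// Hypothetical threshold: how many consecutive rising intervals count as "extended".
const SUSTAINED_SAMPLES = 6;

// Returns true when queue depth rose across each of the last `sustained`
// intervals, meaning enqueue rate exceeded drain rate for that whole window.
function shouldHardStop(depths: number[], sustained: number = SUSTAINED_SAMPLES): boolean {
  // `sustained` intervals need `sustained + 1` samples.
  if (depths.length < sustained + 1) return false;
  const recent = depths.slice(-(sustained + 1));
  let previous = Number.NEGATIVE_INFINITY;
  for (const depth of recent) {
    if (depth <= previous) return false; // flat or draining interval breaks the streak
    previous = depth;
  }
  return true;
}
```

Requiring an unbroken streak avoids pausing the cron trigger on a single burst that the worker drains by the next sample.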

Degrade Mode

Reduce batch size (from 100 to smaller number)
Throttle low-priority cadences (weekly before daily)
Extend cron interval temporarily

Resume once backlog stabilizes.
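
The first two degrade-mode levers can be combined in the batch selection step. The following is a minimal sketch, assuming due URLs are already loaded in memory; the `DueUrl` shape, the `"daily"` and `"weekly"` cadence values, and `selectBatch` are hypothetical names, not the endpoint's actual code.

```typescript
interface DueUrl {
  id: string;
  checkFrequency: "daily" | "weekly"; // hypothetical cadence values
  nextCheckAt: Date;
}

// Lower number is dispatched first, so weekly URLs are throttled before daily ones.
const CADENCE_PRIORITY: Record<DueUrl["checkFrequency"], number> = { daily: 0, weekly: 1 };

// Picks the URLs to enqueue this run: higher-priority cadence first,
// then most overdue first, capped at batchSize. The input is not mutated.
function selectBatch(due: DueUrl[], batchSize: number): DueUrl[] {
  return [...due]
    .sort(
      (a, b) =>
        CADENCE_PRIORITY[a.checkFrequency] - CADENCE_PRIORITY[b.checkFrequency] ||
        a.nextCheckAt.getTime() - b.nextCheckAt.getTime(),
    )
    .slice(0, Math.max(0, batchSize));
}
```

With the normal batch size of 100 this changes only the ordering; passing a smaller value in degrade mode drops weekly URLs from the run first, and they stay due for a later run.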

Monitoring Queries

Check overdue subscriptions

SELECT id, canonical_url, check_frequency, next_check_at,
       NOW() - next_check_at as overdue_by
FROM tracked_urls
WHERE next_check_at < NOW()
  AND is_active = true
ORDER BY next_check_at
LIMIT 20;

Find URLs that haven't been checked recently

SELECT tu.id, tu.canonical_url, tu.check_frequency,
       MAX(uc.checked_at) as last_checked
FROM tracked_urls tu
LEFT JOIN url_checks uc ON tu.id = uc.tracked_url_id
WHERE tu.is_active = true
GROUP BY tu.id
HAVING MAX(uc.checked_at) < NOW() - INTERVAL '2 days'
   OR MAX(uc.checked_at) IS NULL
ORDER BY last_checked NULLS FIRST
LIMIT 20;

Check dispatch lag (time between due and enqueue)

Review logs for the `api:enqueue-due-urls` component:

Found due URLs: count of URLs due in this run
Enqueue complete: enqueued vs. skipped counts

Related Runbooks

Incident Playbook: Master triage
[Queue Runbook](./queue.md): If jobs aren't being processed
