Scheduling Runbook
Purpose: Keep checks firing on time; resolve drift, missed windows, and dispatch failures.
First Diagnostic Question
Are checks not happening on time? (missed windows, drift, dispatch failures)
If yes, resolve scheduling issues before investigating tracker or queue problems.
Components
| Component | Location | Purpose |
|---|---|---|
| Cron trigger | Better Stack Heartbeat | Calls enqueue endpoint hourly |
| Cron endpoint | apps/web/app/api/enqueue-due-urls/route.ts | Queries due URLs and enqueues jobs |
| Ingest worker | apps/worker-ingest/src/index.ts | Processes news ingestion queue messages |
| Schedule metadata | tracked_urls.next_check_at | Determines when URL is due |
Better Stack Heartbeat Setup
The URL enqueue scheduler is triggered by Better Stack Heartbeats (not Vercel CRON).
Configuration
- Log into Better Stack → Heartbeats → Create Heartbeat
- Settings:

| Field | Value |
|---|---|
| Name | eko-enqueue-due-urls |
| URL | https://eko.day/api/enqueue-due-urls |
| Method | GET |
| Schedule | Every 1 hour |
| Grace period | 5 minutes |

- Headers: Authorization: Bearer $CRON_SECRET (use the same CRON_SECRET value from Vercel environment variables)
- Expected response: HTTP 200 with JSON body
- Alerting: Configure alerts for missed heartbeats
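The header requirement above can be sketched as a small check on the endpoint side. This is a minimal illustration, not the code in route.ts: the function name `authorizeCronRequest`, the 500 response for a missing secret, and the character-by-character comparison are all assumptions.

```typescript
// Hypothetical helper, not taken from the actual route.ts.
// Returns the HTTP status the cron endpoint would answer with,
// judging only by the Authorization header.
function authorizeCronRequest(
  authorizationHeader: string | null,
  cronSecret: string | undefined,
): 200 | 401 | 500 {
  // Assumption: a missing secret is treated as a server misconfiguration.
  if (!cronSecret) return 500;
  if (authorizationHeader === null) return 401;

  const expected = `Bearer ${cronSecret}`;
  if (authorizationHeader.length !== expected.length) return 401;

  // Compare every character so the loop does not exit at the first mismatch.
  let diff = 0;
  for (let i = 0; i < expected.length; i++) {
    diff |= authorizationHeader.charCodeAt(i) ^ expected.charCodeAt(i);
  }
  return diff === 0 ? 200 : 401;
}
```

If Better Stack reports 401s while the manual curl succeeds, the header value stored in the heartbeat is the first thing to compare against the Vercel environment variable.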
Benefits over Vercel CRON
- Failure alerting: Notified if endpoint returns non-2xx
- Execution history: Dashboard shows success/failure timeline
- Retry logic: Automatic retries on transient failures
- Platform-agnostic: Works if you migrate off Vercel
Invariants
- Batch dispatch: Due URLs queried in batches (currently 100 at a time)
- Idempotency: URL skipped if already checked today (UTC)
- Deterministic advancement: next_check_at advances based on check_frequency after job completion
- Authorization: Cron endpoint requires bearer token (CRON_SECRET)
Source of truth: STACK.md
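The idempotency and advancement invariants can be sketched as two pure functions. This is a sketch under stated assumptions: `hasCheckedToday` is named in this runbook but its signature is not, `computeNextCheckAt` is a hypothetical name, and the `check_frequency` values `daily` and `weekly` are assumed from the cadences mentioned under Degrade Mode.

```typescript
// Assumed check_frequency values; the real column may allow others.
type CheckFrequency = "daily" | "weekly";

const DAY_MS = 24 * 60 * 60 * 1000;

// Idempotency guard: true when the last check falls on the same
// UTC calendar day as `now`. toISOString() is always UTC.
function hasCheckedToday(lastCheckedAt: Date | null, now: Date): boolean {
  if (lastCheckedAt === null) return false;
  return lastCheckedAt.toISOString().slice(0, 10) === now.toISOString().slice(0, 10);
}

// Deterministic advancement: the same base time and frequency
// always produce the same next_check_at.
function computeNextCheckAt(base: Date, frequency: CheckFrequency): Date {
  const days = frequency === "daily" ? 1 : 7;
  return new Date(base.getTime() + days * DAY_MS);
}
```

Which base the worker uses is not stated here. Advancing from the completion time carries any lateness forward, while advancing from the previous next_check_at keeps the cadence anchored.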
Diagnostic Decision Tree
Scheduling issue suspected
│
├─ Checks not starting?
│ ├─ Cron not triggering → Check cron job config, verify CRON_SECRET
│ ├─ Endpoint returning 401 → Authorization header missing or wrong
│ └─ No due URLs found → Check next_check_at values in DB
│
├─ Jobs enqueued but not processed?
│ └─ See [Queue Runbook](./queue.md)
│
├─ Schedule drift accumulating?
│ ├─ next_check_at not advancing → Worker not updating after completion
│ ├─ Jobs completing but late → Processing backlog
│ └─ Frequency mismatch → check_frequency not matching expectations
│
└─ Duplicate checks happening?
  ├─ Idempotency guard failing → hasCheckedToday returning false incorrectly
  └─ Multiple cron triggers → Cron running too frequently
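The first two branches of the tree can be expressed as an ordered set of checks. The sketch below is illustrative only: the `Observation` shape, the diagnosis labels, and the function name are hypothetical, and the inputs would be gathered by hand from the Better Stack dashboard and the endpoint logs.

```typescript
// Hypothetical labels mirroring the branches of the decision tree.
type Diagnosis =
  | "cron-not-triggering"
  | "authorization-header"
  | "dispatch-failure"
  | "no-due-urls"
  | "see-queue-runbook";

// Hypothetical observation gathered manually during triage.
interface Observation {
  lastStatus: number | null; // null: no request reached the endpoint
  dueUrlsFound: number;      // count reported by the enqueue run
}

function diagnoseChecksNotStarting(obs: Observation): Diagnosis {
  if (obs.lastStatus === null) return "cron-not-triggering";
  if (obs.lastStatus === 401) return "authorization-header";
  // Any other non-2xx status points at the Dispatch Failures scenario.
  if (obs.lastStatus < 200 || obs.lastStatus >= 300) return "dispatch-failure";
  if (obs.dueUrlsFound === 0) return "no-due-urls";
  // Endpoint healthy and URLs found: the problem is downstream.
  return "see-queue-runbook";
}
```

The order matters: trigger and authorization problems are ruled out before looking at next_check_at values, because a failing endpoint never reaches the database query.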
Common Failure Scenarios
Cron Not Triggering
Symptoms:
- Queue depth at zero
- No recent enqueue-due-urls logs
- URLs overdue but not being checked
Actions:
- Check Better Stack Heartbeats dashboard for failures
- Verify heartbeat is configured and active
- Verify CRON_SECRET environment variable matches Better Stack header
- Manually trigger: curl -X GET -H "Authorization: Bearer $CRON_SECRET" https://eko.day/api/enqueue-due-urls
- Check Vercel function logs for 401/500 errors
Dispatch Failures
Symptoms:
- Cron endpoint returning errors
- URLs due but not being enqueued
- Queue connection errors in logs
Actions:
- Check Upstash Redis connection
- Verify UPSTASH_REDIS_REST_URL and UPSTASH_REDIS_REST_TOKEN
- Check for rate limiting on queue service
- Review error logs from enqueue-due-urls endpoint
Schedule Drift
Symptoms:
- URLs checked later than expected
- next_check_at values in the past
- Growing gap between expected and actual check times
Actions:
- Check if worker is updating next_check_at after completion
- Verify updateNextCheckTime function is being called
- Check for processing backlogs (queue depth growing)
- Consider reducing batch size if system overloaded
Stop Conditions
Hard Stop
Trigger immediately if any are true:
- Queue depth growing faster than drain rate for extended period
- Duplicate checks causing data corruption
Action: Pause cron trigger and investigate.
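One way to make "growing faster than drain rate for extended period" checkable is to sample queue depth at a fixed interval and look for a sustained run of increases. This is a sketch only: the function name and the threshold are hypothetical, and the runbook does not define how long "extended" is.

```typescript
// Hypothetical detector. `depths` are queue depth samples taken at a
// fixed interval, oldest first. Returns true once depth has risen for
// `minConsecutiveIncreases` samples in a row (assumed to be >= 1).
function isBacklogGrowing(depths: number[], minConsecutiveIncreases: number): boolean {
  let previous: number | null = null;
  let streak = 0;
  for (const depth of depths) {
    // A rise means enqueues outpaced the drain since the last sample.
    streak = previous !== null && depth > previous ? streak + 1 : 0;
    if (streak >= minConsecutiveIncreases) return true;
    previous = depth;
  }
  return false;
}
```

For example, with samples taken every 10 minutes, `isBacklogGrowing(samples, 6)` would flag an hour of uninterrupted growth. A single flat or falling sample resets the streak, which keeps brief spikes from triggering a hard stop.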
Degrade Mode
- Reduce batch size (from 100 to smaller number)
- Throttle low-priority cadences (weekly before daily)
- Extend cron interval temporarily
Resume once backlog stabilizes.
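The first two degrade steps can be combined into one selection rule: take a smaller batch, and fill it with daily URLs before weekly ones. The sketch below assumes a `DueUrl` shape and the cadence values `daily` and `weekly`; neither is confirmed by the schema, and `selectDegradedBatch` is a hypothetical name.

```typescript
// Assumed cadence values, taken from the "weekly before daily" note.
type Cadence = "daily" | "weekly";

// Hypothetical in-memory shape of a due row from tracked_urls.
interface DueUrl {
  id: string;
  cadence: Cadence;
  nextCheckAt: Date;
}

// Picks at most `batchSize` URLs, daily first, oldest due time first
// within each cadence. Weekly URLs are the first to be deferred.
function selectDegradedBatch(due: DueUrl[], batchSize: number): DueUrl[] {
  const rank = (c: Cadence): number => (c === "daily" ? 0 : 1);
  return [...due]
    .sort(
      (a, b) =>
        rank(a.cadence) - rank(b.cadence) ||
        a.nextCheckAt.getTime() - b.nextCheckAt.getTime(),
    )
    .slice(0, batchSize);
}
```

Deferred URLs keep their overdue next_check_at, so they are picked up automatically once the batch size is restored.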
Monitoring Queries
Check overdue URLs
SELECT id, canonical_url, check_frequency, next_check_at,
NOW() - next_check_at as overdue_by
FROM tracked_urls
WHERE next_check_at < NOW()
AND is_active = true
ORDER BY next_check_at
LIMIT 20;
Find URLs that haven't been checked recently
SELECT tu.id, tu.canonical_url, tu.check_frequency,
MAX(uc.checked_at) as last_checked
FROM tracked_urls tu
LEFT JOIN url_checks uc ON tu.id = uc.tracked_url_id
WHERE tu.is_active = true
GROUP BY tu.id
HAVING MAX(uc.checked_at) < NOW() - INTERVAL '2 days'
OR MAX(uc.checked_at) IS NULL
ORDER BY last_checked NULLS FIRST
LIMIT 20;
Check dispatch lag (time between due and enqueue)
Review logs for api:enqueue-due-urls component:
- Found due URLs: count
- Enqueue complete: enqueued vs skipped
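Dispatch lag for a single URL is the enqueue time minus its next_check_at. Because the trigger fires hourly, a URL that becomes due just after a run waits for the next one, so lag of up to roughly an hour is expected by design. The sketch below assumes the two timestamps have already been collected by hand; the `DispatchRecord` shape and function name are hypothetical, and the log format is not assumed.

```typescript
// Hypothetical record pairing a URL's due time with its enqueue time.
interface DispatchRecord {
  nextCheckAt: Date;
  enqueuedAt: Date;
}

// Returns the worst dispatch lag in minutes, or 0 for an empty list.
function maxDispatchLagMinutes(records: DispatchRecord[]): number {
  let maxMs = 0;
  for (const record of records) {
    const lagMs = record.enqueuedAt.getTime() - record.nextCheckAt.getTime();
    maxMs = Math.max(maxMs, lagMs);
  }
  return maxMs / 60_000;
}
```

A maximum well above 60 minutes suggests missed triggers or more due URLs than one run dispatches, rather than ordinary hourly granularity.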
Related Runbooks
- Incident Playbook - Master triage
- Queue - If jobs aren't being processed