Service Level Expectations

Purpose

Defines internal service level expectations for Eko operations. These are engineering targets, not customer-facing SLAs. Use them for capacity planning, monitoring alerts, and incident severity classification.

Page Observation Timeliness

Cadence	Target	Tolerance	Escalation
Daily	Within 24h of `next_check_at`	+6h jitter acceptable	P2 if >30h
Weekly	Within 7d of `next_check_at`	+12h jitter acceptable	P2 if >8d
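The escalation thresholds can be expressed as a small classifier. This is a sketch only. It assumes lag is measured from a page's last successful check. The names `observation_status` and `TIMELINESS` are hypothetical and are not taken from the Eko codebase.

```python
from datetime import datetime, timedelta, timezone

# Per-cadence thresholds from the table above:
# (target interval, jitter tolerance, P2 escalation threshold)
TIMELINESS = {
    "daily": (timedelta(hours=24), timedelta(hours=6), timedelta(hours=30)),
    "weekly": (timedelta(days=7), timedelta(hours=12), timedelta(days=8)),
}


def observation_status(cadence: str, last_checked_at: datetime, now: datetime) -> str:
    """Classify observation lag as "on_time", "late", or "P2".

    Assumption: lag is measured from the page's last successful check.
    """
    target, tolerance, escalate_after = TIMELINESS[cadence]
    elapsed = now - last_checked_at
    if elapsed > escalate_after:
        return "P2"  # escalation threshold exceeded
    if elapsed > target + tolerance:
        return "late"  # outside tolerance but not yet escalated
    return "on_time"


if __name__ == "__main__":
    now = datetime(2024, 1, 8, 12, 0, tzinfo=timezone.utc)
    print(observation_status("daily", now - timedelta(hours=31), now))
    print(observation_status("weekly", now - timedelta(days=7, hours=18), now))
```

For daily pages, target plus tolerance (24h + 6h) equals the 30h escalation threshold. A daily page therefore moves straight from on time to P2. Only weekly pages have a "late but not yet escalated" window, running from 7d 12h to 8d.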

Implementation Notes

`next_check_at` includes 0-6h of random jitter to prevent a thundering herd
One check per page per day, enforced by UNIQUE (page_id, checked_day)
Workers poll the queue continuously; a growing backlog indicates a capacity issue
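Below is a minimal sketch of these scheduling rules. It assumes a uniform 0-6h jitter for every cadence. It also assumes UTC calendar days for `checked_day`, since the source does not say which timezone defines a day. `schedule_next_check`, `checked_day`, and `CheckLog` are hypothetical names. In the database, the unique constraint itself does the enforcement that `CheckLog` imitates here.

```python
import random
from datetime import date, datetime, timedelta, timezone

# Hypothetical helpers illustrating the scheduling notes above.
CADENCE_INTERVAL = {"daily": timedelta(days=1), "weekly": timedelta(days=7)}
MAX_JITTER = timedelta(hours=6)  # 0-6h jitter; assumed to apply to every cadence


def schedule_next_check(cadence: str, checked_at: datetime, rng: random.Random) -> datetime:
    """next_check_at = checked_at + cadence interval + uniform 0-6h jitter."""
    jitter = timedelta(seconds=rng.uniform(0, MAX_JITTER.total_seconds()))
    return checked_at + CADENCE_INTERVAL[cadence] + jitter


def checked_day(checked_at: datetime) -> date:
    """Day bucket for the UNIQUE (page_id, checked_day) key.

    Assumption: days are UTC calendar days. checked_at must be timezone-aware.
    """
    return checked_at.astimezone(timezone.utc).date()


class CheckLog:
    """In-memory stand-in for the UNIQUE (page_id, checked_day) constraint."""

    def __init__(self) -> None:
        self._seen: set[tuple[str, date]] = set()

    def record(self, page_id: str, checked_at: datetime) -> bool:
        """Return True if recorded, False if this page was already checked that day."""
        key = (page_id, checked_day(checked_at))
        if key in self._seen:
            return False
        self._seen.add(key)
        return True
```

Passing a seeded `random.Random` makes schedules reproducible in tests.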

Monitoring Signals

```sql
-- Pages overdue for check
SELECT COUNT(*) FROM pages
WHERE is_active = TRUE
  AND next_check_at < NOW() - INTERVAL '6 hours';
```

Notification Delivery

Mode	Target	Tolerance	Escalation
Immediate	Within 5 minutes of change detection	15 min	P2 if >30 min
Daily Digest	Before 9:00 AM in the user's timezone	1 hour	P3 if missed
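The two delivery targets can be checked as follows. This sketch makes two assumptions. First, the 15-minute tolerance is read as an absolute bound on latency, not 15 minutes on top of the 5-minute target. Second, the user's timezone is available as a `tzinfo`. The function names are hypothetical.

```python
from datetime import datetime, time, timedelta, timezone, tzinfo

IMMEDIATE_TARGET = timedelta(minutes=5)
IMMEDIATE_TOLERANCE = timedelta(minutes=15)  # assumption: absolute bound, not target + 15
IMMEDIATE_P2_AFTER = timedelta(minutes=30)
DIGEST_LOCAL_TIME = time(9, 0)


def immediate_status(detected_at: datetime, sent_at: datetime) -> str:
    """Classify immediate-notification latency against the table above."""
    latency = sent_at - detected_at
    if latency > IMMEDIATE_P2_AFTER:
        return "P2"
    if latency > IMMEDIATE_TOLERANCE:
        return "late"
    if latency > IMMEDIATE_TARGET:
        return "within_tolerance"
    return "on_target"


def next_digest_deadline(now: datetime, user_tz: tzinfo) -> datetime:
    """Return the next 9:00 AM in the user's timezone, expressed in UTC."""
    local_now = now.astimezone(user_tz)
    deadline = datetime.combine(local_now.date(), DIGEST_LOCAL_TIME, tzinfo=user_tz)
    if local_now >= deadline:
        deadline += timedelta(days=1)  # today's 9:00 AM has already passed
    return deadline.astimezone(timezone.utc)
```

In production the `tzinfo` would typically come from `zoneinfo.ZoneInfo` with the user's IANA timezone name. Adding one day to an aware local datetime is wall-clock arithmetic in Python, so the deadline stays at 9:00 AM local time across DST changes.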

Delivery States

Transition	Target
`queued` → `sending`	< 30 seconds
`sending` → `sent`	< 60 seconds
`sending` → `failed`	Retry 3x, then mark failed
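The transitions above form a small state machine. This sketch reads "Retry 3x" as one initial attempt plus three retries, for four attempts total. It omits backoff and the timing targets. `Delivery` and `deliver` are illustrative names, not the worker's actual API.

```python
from dataclasses import dataclass
from typing import Callable

MAX_RETRIES = 3  # "Retry 3x, then mark failed"


@dataclass
class Delivery:
    status: str = "queued"
    attempts: int = 0


def deliver(delivery: Delivery, send: Callable[[], bool]) -> Delivery:
    """Move a delivery from queued to sending, then to sent or failed.

    Assumption: one initial attempt plus up to MAX_RETRIES retries.
    Backoff between attempts is omitted for brevity.
    """
    if delivery.status != "queued":
        raise ValueError(f"cannot deliver from status {delivery.status!r}")
    delivery.status = "sending"
    while True:
        delivery.attempts += 1
        if send():
            delivery.status = "sent"
            return delivery
        if delivery.attempts > MAX_RETRIES:
            delivery.status = "failed"
            return delivery
```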

Deduplication

A unique constraint on (page_change_event_id, channel) prevents duplicate notifications
If an insert violates the constraint, skip it silently (the notification is already queued or delivered)
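In PostgreSQL the skip can be done in one statement with `INSERT ... ON CONFLICT (page_change_event_id, channel) DO NOTHING`. The sketch below shows the same semantics in memory, using a dict keyed on the constrained columns. `DeliveryLog` and `fan_out` are hypothetical helpers.

```python
from typing import Dict, Iterable, Tuple


class DeliveryLog:
    """Hypothetical in-memory stand-in for notification_delivery_log.

    The dict key mirrors the unique constraint on (page_change_event_id, channel).
    """

    def __init__(self) -> None:
        self.rows: Dict[Tuple[str, str], str] = {}

    def enqueue(self, page_change_event_id: str, channel: str) -> bool:
        """Insert a queued row; return False (skip silently) on a duplicate key."""
        key = (page_change_event_id, channel)
        if key in self.rows:
            return False
        self.rows[key] = "queued"
        return True


def fan_out(log: DeliveryLog, page_change_event_id: str, channels: Iterable[str]) -> int:
    """Queue one notification per channel; return how many were newly queued."""
    return sum(log.enqueue(page_change_event_id, ch) for ch in channels)
```

Because duplicates are skipped rather than raised, a retried fan-out for the same change event is idempotent.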

Monitoring Signals

```sql
-- Notifications stuck in queue
SELECT COUNT(*) FROM notification_delivery_log
WHERE status = 'queued'
  AND created_at < NOW() - INTERVAL '5 minutes';
```

Summary Quality

Metric	Target	Verification
Confidence	≥ 0.7 for published summaries	AI self-reported confidence
Fair-use compliance	Non-substitutive	Manual audit (sampling)
No hallucination	0 fabricated facts	User reports, spot checks
Meaningful change only	No summary without change	`change_detected = TRUE` required

Quality Gates

Pre-summary: Meaningful change detection must pass
Post-summary: Confidence must meet the 0.7 threshold
Audit: Sample review for fair-use compliance
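The three gates could be wired up as below. The 0.7 threshold comes from the Summary Quality table. The hash-based audit sampling and its `sample_rate` parameter are assumptions, since the source does not say how audit samples are chosen.

```python
import hashlib

CONFIDENCE_THRESHOLD = 0.7  # from the Summary Quality table


def pre_summary_gate(change_detected: bool) -> bool:
    """Only generate a summary when meaningful change detection passed."""
    return change_detected


def post_summary_gate(confidence: float) -> bool:
    """Publish only if the model's self-reported confidence meets the threshold."""
    return confidence >= CONFIDENCE_THRESHOLD


def selected_for_audit(summary_id: str, sample_rate: float) -> bool:
    """Deterministically pick a fraction of summaries for manual fair-use review.

    Assumption: hash-based sampling; the source only says "sample review".
    """
    digest = hashlib.sha256(summary_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") / 2**32  # uniform in [0, 1)
    return bucket < sample_rate
```

Hashing the ID, rather than drawing a random number, means a given summary is either always or never in the sample. Re-running the audit job therefore selects the same set.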

Non-substitutive Criteria

The summary must not reproduce source content verbatim
The user must still need to visit the source for full context
Focus on what changed (the delta), not on the page content as a whole
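Of these criteria, only the first is easy to check mechanically. One possible heuristic flags a summary that shares any run of n consecutive words with the source. This is an assumption, not a documented Eko check, and n = 8 is an arbitrary illustrative value. The other two criteria still need human review.

```python
import re


def word_ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """All runs of n consecutive lowercase words in text."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def reproduces_verbatim(summary: str, source: str, n: int = 8) -> bool:
    """Hypothetical heuristic: True if summary copies any n-word run from source."""
    return bool(word_ngrams(summary, n) & word_ngrams(source, n))
```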

System Availability

Component	Target Uptime	Degradation Mode
Web app	99.5%	Static error page
Tracker worker	99%	Queue backlog grows
Render worker	95%	Fallback to text-only
Queue (Upstash)	99.9%	Managed service
Database (Supabase)	99.9%	Managed service
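Uptime targets translate directly into downtime budgets. This sketch assumes a 30-day measurement window, which the source does not specify.

```python
from datetime import timedelta

# Uptime targets from the System Availability table.
UPTIME_TARGETS = {
    "Web app": 0.995,
    "Tracker worker": 0.99,
    "Render worker": 0.95,
    "Queue (Upstash)": 0.999,
    "Database (Supabase)": 0.999,
}


def downtime_budget(uptime: float, period: timedelta = timedelta(days=30)) -> timedelta:
    """Maximum downtime allowed in `period` (assumed 30 days) for an uptime fraction."""
    return period * (1 - uptime)


if __name__ == "__main__":
    for component, target in UPTIME_TARGETS.items():
        budget = downtime_budget(target)
        print(f"{component}: {budget.total_seconds() / 3600:.1f} h per 30 days")
```

Over 30 days this gives roughly 3.6h for the web app, 7.2h for the tracker worker, 36h for the render worker, and 43.2 minutes for each managed service.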

Worker Health

Signal	Healthy	Warning	Critical
Queue depth	< 100	100-500	> 500
Check latency p95	< 30s	30-60s	> 60s
Error rate	< 1%	1-5%	> 5%
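The health thresholds can be encoded as a lookup. The signal keys below are hypothetical metric names. The boundary handling, where both 100 and 500 count as warning, is one reading of the "100-500" style ranges. `overall` reports the worst level across all signals.

```python
from typing import Dict, Tuple

# Hypothetical metric names -> (warning threshold, critical threshold).
# Values at or above the warning threshold warn; values strictly above
# the critical threshold are critical.
HEALTH_THRESHOLDS: Dict[str, Tuple[float, float]] = {
    "queue_depth": (100, 500),
    "check_latency_p95_s": (30, 60),
    "error_rate": (0.01, 0.05),
}
LEVELS = ["healthy", "warning", "critical"]


def classify(signal: str, value: float) -> str:
    warning_at, critical_above = HEALTH_THRESHOLDS[signal]
    if value > critical_above:
        return "critical"
    if value >= warning_at:
        return "warning"
    return "healthy"


def overall(readings: Dict[str, float]) -> str:
    """Worst level across all signals; healthy if there are no readings."""
    return max((classify(s, v) for s, v in readings.items()), key=LEVELS.index, default="healthy")
```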

Degradation Modes

Scenario	Behavior
Render worker down	Fall back to text fetch; queue renders for retry
AI provider down	Queue summaries for retry; notify ops
Queue service down	Workers idle; no data loss (DB is source of truth)
Database read-only	Read operations continue; writes fail gracefully
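The render-worker fallback can be sketched as a try/fallback wrapper. `render`, `fetch_text`, and `RenderUnavailable` are hypothetical stand-ins, injected as parameters. A real implementation would push the retry onto the queue rather than into a list.

```python
from typing import Callable, List, Tuple


class RenderUnavailable(Exception):
    """Hypothetical error raised by the render client when the render worker is down."""


def observe_page(
    url: str,
    render: Callable[[str], str],
    fetch_text: Callable[[str], str],
    render_retry_queue: List[str],
) -> Tuple[str, str]:
    """Return (mode, content), falling back to a plain-text fetch if rendering fails."""
    try:
        return "rendered", render(url)
    except RenderUnavailable:
        render_retry_queue.append(url)  # re-render later, once the worker recovers
        return "text", fetch_text(url)
```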

Non-Commitments

These behaviors are explicitly not guaranteed:

Behavior	Reason
Real-time change detection	Cadence-based polling only
Manual check triggers	Prevents abuse; respects cadence
Check on demand for free users	Resource constraints
Guaranteed render success	External sites may block
Summary for all changes	Meaningful change filter may reject

Incident Severity

Severity	Criteria	Response Time
P1	Data loss, security breach, full outage	< 1 hour
P2	Degraded service, >10% users affected	< 4 hours
P3	Minor degradation, workaround exists	< 24 hours
P4	Cosmetic, single-user impact	Next sprint

Reference: Incident Playbook
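A rough encoding of the severity table follows. The criteria are partly qualitative, so the `Incident` fields are hypothetical. Degradation with no workaround is treated as P2 even when 10% or fewer users are affected; this is an assumption, not a rule stated in the table.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Dict, Optional

# Response times from the Incident Severity table. P4 is "next sprint",
# which has no fixed clock, so it is represented as None.
RESPONSE_TIME: Dict[str, Optional[timedelta]] = {
    "P1": timedelta(hours=1),
    "P2": timedelta(hours=4),
    "P3": timedelta(hours=24),
    "P4": None,
}


@dataclass
class Incident:
    # Hypothetical fields approximating the table's criteria.
    data_loss: bool = False
    security_breach: bool = False
    full_outage: bool = False
    degraded: bool = False
    users_affected_pct: float = 0.0
    workaround_exists: bool = False


def classify_severity(incident: Incident) -> str:
    if incident.data_loss or incident.security_breach or incident.full_outage:
        return "P1"
    if incident.degraded and (incident.users_affected_pct > 10 or not incident.workaround_exists):
        # Assumption: degradation without a workaround is P2 even at <= 10% of users.
        return "P2"
    if incident.degraded:
        return "P3"  # minor degradation with a workaround
    return "P4"  # cosmetic or single-user impact
```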

Implementation References

Component	File
Queue configuration	`packages/queue/`
Worker implementation	`apps/worker-tracker/`, `apps/worker-render/`
Notification system	`notification_delivery_log` table
Monitoring	`packages/observability/`
