Error Handling Standards

Standards for consistent, observable, and recoverable error handling across the Eko codebase.


Core Rules

EH-001: Structured Error Types

All custom errors must extend a base error class with structured properties.

// Good: Structured error with code and context
class FetchError extends Error {
  constructor(
    message: string,
    public readonly code: 'FETCH_TIMEOUT' | 'FETCH_NETWORK' | 'FETCH_HTTP',
    public readonly context: { url: string; statusCode?: number }
  ) {
    super(message)
    this.name = 'FetchError'
  }
}

// Bad: Generic error without context
throw new Error('Failed to fetch URL')

Required properties:

  • code: Machine-readable error code (SCREAMING_SNAKE_CASE)
  • message: Human-readable description
  • context: Structured metadata object for debugging
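
One way to meet EH-001 is a shared base class that carries the required properties, so each custom error only declares its own codes and context shape. The sketch below is an illustration under assumptions, not the existing Eko implementation: the names `AppError`, `FetchErrorCode`, `FetchErrorContext`, and `isTimeout` are hypothetical.

```typescript
// Hypothetical base class: `AppError` is an illustrative name, not an existing Eko export.
abstract class AppError<
  Code extends string,
  Context extends Record<string, unknown>
> extends Error {
  readonly code: Code
  readonly context: Context

  constructor(message: string, code: Code, context: Context) {
    super(message)
    // Keeps `instanceof` working even when compiled to older targets.
    Object.setPrototypeOf(this, new.target.prototype)
    // Use the concrete subclass name (e.g. 'FetchError') in logs and stack traces.
    this.name = new.target.name
    this.code = code
    this.context = context
  }
}

type FetchErrorCode = 'FETCH_TIMEOUT' | 'FETCH_NETWORK' | 'FETCH_HTTP'
type FetchErrorContext = { url: string; statusCode?: number }

// The subclass only pins down its allowed codes and context shape.
class FetchError extends AppError<FetchErrorCode, FetchErrorContext> {}

// Callers branch on the machine-readable code instead of parsing messages.
function isTimeout(error: unknown): boolean {
  return error instanceof FetchError && error.code === 'FETCH_TIMEOUT'
}
```

Because `code` is a string-literal union, a typo such as `'FETCH_TIMEOUTS'` is rejected at compile time rather than discovered in logs.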

EH-002: Never Swallow Errors

Catch blocks must handle errors explicitly. Never silently swallow errors.

// Good: Log and continue
try {
  await riskyOperation()
} catch (error) {
  logger.error('Operation failed', { error, context: { userId } })
  // Explicit decision to continue
}

// Good: Rethrow with context
try {
  await riskyOperation()
} catch (error) {
  throw new OperationError('Failed during processing', { cause: error })
}

// Good: Return Result type
try {
  const result = await riskyOperation()
  return { success: true, data: result }
} catch (error) {
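  // Caught values are `unknown` under strict TypeScript, so narrow before reading `.message`
  return { success: false, error: error instanceof Error ? error.message : String(error) }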
}

// Bad: Silent swallow
try {
  await riskyOperation()
} catch (error) {
  // Nothing here - error is lost
}

// Bad: Unstructured console output that discards the error and its context
try {
  await riskyOperation()
} catch {
  console.log('failed')
}

Allowed patterns:

  1. Log and continue: Use logger.error() with context
  2. Rethrow with context: Wrap in a more specific error
  3. Return Result: Use { success, data/error } pattern
  4. Explicit ignore: Comment explaining why (rare)
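
Patterns 3 and 4 can be sketched with a typed Result and a commented ignore. This is a minimal illustration under assumptions: `Result`, `errorMessage`, `toResult`, and `parseOptionalJson` are hypothetical names, not existing Eko helpers.

```typescript
// Discriminated union: checking `success` tells the compiler which fields exist.
type Result<T> =
  | { success: true; data: T }
  | { success: false; error: string }

// Caught values are `unknown`, so narrow before reading `.message`.
function errorMessage(error: unknown): string {
  return error instanceof Error ? error.message : String(error)
}

// Pattern 3: convert a throwing async operation into a Result.
async function toResult<T>(operation: () => Promise<T>): Promise<Result<T>> {
  try {
    return { success: true, data: await operation() }
  } catch (error) {
    return { success: false, error: errorMessage(error) }
  }
}

// Pattern 4: explicit ignore, with the reason stated in the catch block.
function parseOptionalJson(text: string): unknown {
  try {
    return JSON.parse(text)
  } catch {
    // Explicit ignore: this metadata is optional, so malformed input is
    // treated the same as absent input and needs no investigation.
    return undefined
  }
}
```

A caller then handles both branches explicitly, for example `if (!result.success) { ... }`, and the compiler only allows access to `result.data` on the success branch.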

EH-003: Worker Error Boundaries

Workers must wrap message processing in try/catch and never crash the process.

// Good: Worker error boundary
async function processMessage(message: QueueMessage) {
  const startTime = Date.now()

  try {
    await handleMessage(message)
    metrics.increment('worker.message.success')
  } catch (error) {
    logger.error('Worker message processing failed', {
      error,
      messageId: message.id,
      storyId: message.data.story_id,
      durationMs: Date.now() - startTime,
    })
    metrics.increment('worker.message.error')

    // Don't rethrow - worker continues processing other messages
  }
}

// Bad: Unhandled error crashes worker
async function processMessage(message: QueueMessage) {
  await handleMessage(message) // Throws, worker crashes
}

Worker requirements:

  • Wrap all message processing in try/catch
  • Log errors with entity context (story_id, fact_record_id) when available
  • Increment error metrics
  • Never let errors crash the worker process
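
When several workers need the same boundary, the requirements above can be factored into a wrapper. The following is a sketch under assumptions: `withErrorBoundary` and the minimal `Logger` and `Metrics` interfaces are hypothetical stand-ins, and the real clients may expose different signatures.

```typescript
// Hypothetical minimal interfaces; the real logger and metrics clients may differ.
interface Logger {
  error(message: string, fields: Record<string, unknown>): void
}
interface Metrics {
  increment(name: string): void
}

// Wraps a handler so that a failure is logged and counted but never propagates.
function withErrorBoundary<M extends { id: string }>(
  handler: (message: M) => Promise<void>,
  deps: { logger: Logger; metrics: Metrics }
): (message: M) => Promise<void> {
  return async (message) => {
    const startTime = Date.now()
    try {
      await handler(message)
      deps.metrics.increment('worker.message.success')
    } catch (error) {
      deps.logger.error('Worker message processing failed', {
        error,
        messageId: message.id,
        durationMs: Date.now() - startTime,
      })
      deps.metrics.increment('worker.message.error')
      // Deliberately not rethrown: the worker moves on to the next message.
    }
  }
}
```

Entity context such as story_id is left to the caller here, since the wrapper only knows that a message has an `id`.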

EH-004: API Error Responses

HTTP error responses must use a consistent JSON structure.

// Good: Consistent error response
export function errorResponse(
  status: number,
  code: string,
  message: string
): NextResponse {
  return NextResponse.json(
    { error: message, code },
    { status }
  )
}

// Usage
return errorResponse(404, 'URL_NOT_FOUND', 'The requested URL does not exist')
return errorResponse(400, 'INVALID_INPUT', 'URL parameter is required')
return errorResponse(500, 'INTERNAL_ERROR', 'An unexpected error occurred')

Response format:

{
  "error": "Human-readable error message",
  "code": "MACHINE_READABLE_CODE"
}

Status code guidelines:

Code  When
400   Invalid input, validation failure
401   Authentication required
403   Forbidden (auth ok, permission denied)
404   Resource not found
429   Rate limited
500   Internal server error

Security: Never expose internal error details in production responses.
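
One way to enforce this is to treat only explicitly marked errors as safe to show, and to collapse everything else into a generic 500. The sketch below is an assumption-laden illustration: `PublicError` and `toSafeError` are hypothetical names, not part of the existing codebase.

```typescript
// Hypothetical marker class for errors whose message is safe to show to clients.
class PublicError extends Error {
  readonly status: number
  readonly code: string

  constructor(status: number, code: string, message: string) {
    super(message)
    Object.setPrototypeOf(this, new.target.prototype)
    this.name = 'PublicError'
    this.status = status
    this.code = code
  }
}

interface SafeError {
  status: number
  code: string
  message: string
}

// Maps any thrown value to fields that are safe to send in a response.
function toSafeError(error: unknown): SafeError {
  if (error instanceof PublicError) {
    return { status: error.status, code: error.code, message: error.message }
  }
  // Unexpected errors may contain hostnames, SQL, or stack traces.
  // Log the full error server-side and return only a generic message.
  return { status: 500, code: 'INTERNAL_ERROR', message: 'An unexpected error occurred' }
}
```

A route handler could then pass the three fields straight to errorResponse(status, code, message) after logging the original error.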


Decision Framework

Use this table to decide how to handle different error scenarios:

Scenario                    Pattern                     Example
Recoverable external error  Log, metric, continue       Network timeout on optional fetch
Fatal config error          Fail fast at startup        Missing required env var
Transient failure           Retry with backoff          Database connection lost
Data integrity error        Log, skip record, alert     Malformed queue message
User input error            Return 4xx with message     Invalid URL format
Rate limit hit              Return 429, log             Too many API calls
Unexpected error            Log full stack, return 500  Null pointer
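
The "Fatal config error" row is the one case where crashing is the intended behavior. A minimal sketch of failing fast at startup follows; `ConfigError`, `requireEnv`, and the variable names in the usage note are hypothetical.

```typescript
// Hypothetical structured error for missing configuration.
class ConfigError extends Error {
  readonly code = 'CONFIG_MISSING_ENV'
  readonly context: { missing: string[] }

  constructor(missing: string[]) {
    super(`Missing required environment variables: ${missing.join(', ')}`)
    Object.setPrototypeOf(this, new.target.prototype)
    this.name = 'ConfigError'
    this.context = { missing }
  }
}

// Validates all required variables at once so a single startup failure
// reports every missing name instead of one per restart.
function requireEnv(
  env: Record<string, string | undefined>,
  names: string[]
): Record<string, string> {
  const missing = names.filter((name) => !env[name])
  if (missing.length > 0) {
    throw new ConfigError(missing)
  }
  const values: Record<string, string> = {}
  for (const name of names) {
    values[name] = env[name] as string
  }
  return values
}
```

Called once at process start, for example as requireEnv(process.env, ['DATABASE_URL', 'QUEUE_URL']), this surfaces a misconfiguration before any message or request is accepted.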

Retry Patterns

For transient failures, use exponential backoff:

import { retry } from '@eko/shared'

// Good: Retry with exponential backoff
const result = await retry(
  () => fetchUrl(url),
  {
    maxAttempts: 3,
    baseDelayMs: 100,
  }
)

Retry guidelines:

  • Max 3 attempts for network operations
  • Initial delay: 100-500ms
  • Max delay: 5-10 seconds
  • Only retry transient errors (timeouts, 5xx)
  • Never retry 4xx or validation errors
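
To make the guidelines concrete, here is a standalone sketch of exponential backoff with a delay cap and a retryability check. It does not describe how `retry` from '@eko/shared' is implemented: `retryWithBackoff`, `backoffDelayMs`, `RetryOptions`, and the injectable `sleep` are all hypothetical.

```typescript
interface RetryOptions {
  maxAttempts: number
  baseDelayMs: number
  maxDelayMs: number
  // Decides whether an error is transient (timeouts, 5xx) or permanent (4xx, validation).
  isRetryable: (error: unknown) => boolean
  // Injectable for tests; defaults to a real timer.
  sleep?: (ms: number) => Promise<void>
}

// Delay before the next try; `attempt` is 1-based and the delay doubles
// each time until it reaches the cap.
function backoffDelayMs(attempt: number, baseDelayMs: number, maxDelayMs: number): number {
  return Math.min(baseDelayMs * 2 ** (attempt - 1), maxDelayMs)
}

const defaultSleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms))

async function retryWithBackoff<T>(
  operation: () => Promise<T>,
  options: RetryOptions
): Promise<T> {
  const sleep = options.sleep ?? defaultSleep
  for (let attempt = 1; ; attempt++) {
    try {
      return await operation()
    } catch (error) {
      const isLastAttempt = attempt >= options.maxAttempts
      // Give up immediately on permanent errors or when attempts are exhausted.
      if (isLastAttempt || !options.isRetryable(error)) throw error
      await sleep(backoffDelayMs(attempt, options.baseDelayMs, options.maxDelayMs))
    }
  }
}
```

With maxAttempts 3 and baseDelayMs 100, the waits between attempts are 100 ms and then 200 ms; the cap only matters for larger attempt counts or base delays.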

Logging Errors

Use structured logging with appropriate log levels:

import { logger } from '@eko/observability'

// Good: Structured error logging
logger.error('Story clustering failed', {
  error,  // Full error object
  code: error.code,
  storyId: story.id,
  category: story.category,
  attempt: currentAttempt,
  durationMs: Date.now() - startTime,
})

// Bad: Console logging
console.error('Failed:', error.message)

Log levels:

Level  When
error  Unexpected failures requiring investigation
warn   Expected but unusual conditions
info   Normal operation events
debug  Detailed debugging info (dev only)

Anti-Patterns

1. Empty Catch Blocks

// NEVER do this
try {
  await operation()
} catch {
  // Silent failure
}

2. Generic Error Messages

// Bad
throw new Error('Something went wrong')

// Good
throw new FetchError('URL returned HTTP 404', 'FETCH_HTTP', { url, statusCode: 404 })

3. Logging Sensitive Data

// Bad: Logs the whole request object, including headers that may carry an API key
logger.error('API call failed', { request })

// Good: Redact sensitive fields
logger.error('API call failed', {
  url: request.url,
  method: request.method,
  // headers intentionally omitted
})
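
If headers are needed for debugging, an alternative to omitting them is to redact known sensitive names before logging. This is a sketch under assumptions: `redactHeaders` and the `SENSITIVE_HEADERS` list are hypothetical and would need to match the headers actually in use.

```typescript
// Hypothetical deny-list; header names are compared case-insensitively.
const SENSITIVE_HEADERS = new Set(['authorization', 'cookie', 'set-cookie', 'x-api-key'])

// Returns a copy of the headers with sensitive values replaced by a marker.
function redactHeaders(headers: Record<string, string>): Record<string, string> {
  const safe: Record<string, string> = {}
  for (const [name, value] of Object.entries(headers)) {
    safe[name] = SENSITIVE_HEADERS.has(name.toLowerCase()) ? '[REDACTED]' : value
  }
  return safe
}
```

The original object is left untouched, so the redacted copy can be passed to the logger while the request itself is still sent with its real credentials.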

4. Inconsistent Error Responses

// Bad: Different formats in different routes
return { msg: 'Error!' }
return { error: { message: 'Error' } }
return { success: false, reason: 'Error' }

// Good: Use errorResponse() helper everywhere
return errorResponse(400, 'INVALID_INPUT', 'Missing required field')