Operational runbook for diagnosing and resolving issues with [component].

Overview

Brief description of what this runbook covers and when to use it.

Owner: [Agent or team responsible]

Escalation Path: [Who to contact if this runbook doesn't resolve the issue]

Quick Reference

Metric	Healthy	Warning	Critical
Example metric	< 100ms	100-500ms	> 500ms
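
The bands in the table above can be checked mechanically. The following is a minimal sketch, assuming the example thresholds (100 ms and 500 ms) and treating both boundary values as Warning; the names `Status` and `classify_latency` are hypothetical and should be replaced with the real metric and limits for [component].

```python
from enum import Enum


class Status(Enum):
    HEALTHY = "healthy"
    WARNING = "warning"
    CRITICAL = "critical"


def classify_latency(latency_ms: float,
                     warning_at_ms: float = 100.0,
                     critical_above_ms: float = 500.0) -> Status:
    """Map a latency reading onto the Healthy/Warning/Critical bands.

    Defaults mirror the example row: < 100ms, 100-500ms, > 500ms.
    """
    if latency_ms < warning_at_ms:
        return Status.HEALTHY
    if latency_ms <= critical_above_ms:
        return Status.WARNING
    return Status.CRITICAL


if __name__ == "__main__":
    # Illustrative readings only, not real measurements.
    for reading in (42, 100, 500, 750):
        print(reading, classify_latency(reading).value)
```

Keeping the thresholds as parameters lets the same function serve other metrics once their rows are filled in.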

Diagnostic Decision Tree

Is the service responding?
├── No → Check if container is running
│   ├── Container down → Restart container
│   └── Container up → Check logs for errors
└── Yes → Check response times
    ├── Slow → Investigate database/external dependencies
    └── Normal → Check for specific error patterns
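
The tree above can also be written as a small function, which is handy when a triage script needs to print the next step. This is a sketch under the assumption that the three observations (service responding, container running, responses slow) have already been gathered by hand or by other tooling; `next_step` is a hypothetical name.

```python
def next_step(responding: bool,
              container_running: bool = True,
              response_slow: bool = False) -> str:
    """Return the next action from the diagnostic decision tree."""
    if not responding:
        # "No" branch: the container state decides the action.
        if not container_running:
            return "Restart container"
        return "Check logs for errors"
    # "Yes" branch: response time decides the action.
    if response_slow:
        return "Investigate database/external dependencies"
    return "Check for specific error patterns"


if __name__ == "__main__":
    print(next_step(responding=False, container_running=False))
    print(next_step(responding=True, response_slow=True))
```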

Common Issues

Issue 1: [Problem Description]

Symptoms:

Observable symptom 1
Observable symptom 2

Cause: Root cause explanation

Resolution:

Step 1
Step 2
Step 3

Prevention: How to prevent this in the future

Issue 2: [Problem Description]

Symptoms:

Observable symptom 1

Cause: Root cause explanation

Resolution:

Step 1
Step 2

Monitoring & Alerts

Alert	Threshold	Action
Alert name	Condition	What to do
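
One way to keep the alert table and the tooling in sync is to hold each row as data. The sketch below assumes a single numeric reading per alert; `AlertRule`, `triggered_actions`, and the two example rules are hypothetical placeholders, with thresholds borrowed from the example row in Quick Reference rather than from any real alerting configuration.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class AlertRule:
    name: str                           # "Alert" column
    condition: Callable[[float], bool]  # "Threshold" column
    action: str                         # "Action" column


def triggered_actions(rules: List[AlertRule], value: float) -> List[Tuple[str, str]]:
    """Return (alert name, action) for every rule whose condition holds."""
    return [(rule.name, rule.action) for rule in rules if rule.condition(value)]


# Hypothetical rows; replace with the real alerts for [component].
EXAMPLE_RULES = [
    AlertRule("latency-warning", lambda ms: 100 <= ms <= 500,
              "Check response times"),
    AlertRule("latency-critical", lambda ms: ms > 500,
              "Investigate database/external dependencies"),
]

if __name__ == "__main__":
    print(triggered_actions(EXAMPLE_RULES, 750))
```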

Recovery Procedures

Full Service Restart

# Commands to restart the service
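
As an illustration only, a restart step might look like the sketch below. It assumes the service runs as a Docker container (the decision tree mentions containers, but the runtime is not stated) and that the `docker` CLI is on the PATH; the container name `my-service` and the helper names are hypothetical. The function defaults to a dry run so the command can be reviewed before anything is restarted.

```python
import subprocess
from typing import List


def build_restart_command(container: str, grace_seconds: int = 10) -> List[str]:
    """Build a `docker restart` command; -t is the stop grace period in seconds."""
    if not container:
        raise ValueError("container name must not be empty")
    return ["docker", "restart", "-t", str(grace_seconds), container]


def restart_container(container: str,
                      grace_seconds: int = 10,
                      dry_run: bool = True) -> List[str]:
    """Return the restart command, executing it only when dry_run is False."""
    command = build_restart_command(container, grace_seconds)
    if not dry_run:
        # check=True raises CalledProcessError if docker exits non-zero.
        subprocess.run(command, check=True)
    return command


if __name__ == "__main__":
    # Dry run: prints the command without restarting anything.
    print(" ".join(restart_container("my-service")))
```

If [component] is managed by a different orchestrator or init system, replace the command list accordingly; the dry-run pattern still applies.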

Rollback Procedure

# Commands to roll back to the previous version

{{TITLE}}

Overview

Quick Reference

Diagnostic Decision Tree

Common Issues

Issue 1: [Problem Description]

Issue 2: [Problem Description]

Monitoring & Alerts

Recovery Procedures

Full Service Restart

Rollback Procedure

Post-Incident Checklist

Related Documents