Summarization Runbook
Purpose: Produce fair-use, non-substitutive summaries without drift.
First Diagnostic Question
Is the output text wrong (hallucination, tone drift, or over-quoting)?
If yes, halt summarization first, then adjust prompts.
Guardrails
- Summarize delta only (not full page content)
- No long quotes; paraphrase
- Match tone to URL type
- Respect fair-use constraints at all times
Diagnostic Decision Tree
Summarization issue suspected
│
├─ Hallucination? (summary contains invented information)
│ ├─ Content not in delta → Model confabulation, tighten prompt
│ ├─ Misinterpretation → Delta context insufficient
│ └─ Pattern-matching error → Model saw similar content elsewhere
│
├─ Over-quoting? (too much verbatim content)
│ ├─ Exceeds length caps → Enforce truncation
│ ├─ Feels like replacement → Fair-use violation, suppress
│ └─ Quote boundaries unclear → Improve paraphrase instructions
│
├─ Tone issues? (urgency, style mismatch)
│ ├─ Overstated urgency → Calibrate confidence language
│ ├─ Wrong register → URL type detection issue
│ └─ Inconsistent across runs → Model temperature too high
│
├─ Missing key changes?
│ ├─ Delta correct but summary incomplete → Prompt issue
│ ├─ Changes below significance threshold → Intentional suppression
│ └─ Model truncated output → Token limit hit
│
└─ Confidence wrong?
  ├─ High confidence on ambiguous change → Tighten confidence criteria
  ├─ Low confidence on clear change → Confidence logic bug
  └─ Confidence not matching delta quality → Miscalibration
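The tree above can also be kept as a lookup table so that triage notes use consistent wording. The following is a minimal sketch, not an existing tool: the `(issue, observation)` labels, the `triage` function, and the fallback message are all hypothetical names introduced here for illustration.

```python
# Hypothetical lookup table mirroring the decision tree above.
# Keys are (issue, observation) labels invented for this sketch;
# values are the diagnoses exactly as the tree states them.
TRIAGE = {
    ("hallucination", "content_not_in_delta"): "Model confabulation, tighten prompt",
    ("hallucination", "misinterpretation"): "Delta context insufficient",
    ("hallucination", "pattern_matching_error"): "Model saw similar content elsewhere",
    ("over_quoting", "exceeds_length_caps"): "Enforce truncation",
    ("over_quoting", "feels_like_replacement"): "Fair-use violation, suppress",
    ("over_quoting", "quote_boundaries_unclear"): "Improve paraphrase instructions",
    ("tone", "overstated_urgency"): "Calibrate confidence language",
    ("tone", "wrong_register"): "URL type detection issue",
    ("tone", "inconsistent_across_runs"): "Model temperature too high",
    ("missing_changes", "summary_incomplete"): "Prompt issue",
    ("missing_changes", "below_significance_threshold"): "Intentional suppression",
    ("missing_changes", "output_truncated"): "Token limit hit",
    ("confidence", "high_on_ambiguous_change"): "Tighten confidence criteria",
    ("confidence", "low_on_clear_change"): "Confidence logic bug",
    ("confidence", "not_matching_delta_quality"): "Miscalibration",
}


def triage(issue: str, observation: str) -> str:
    """Return the tree's diagnosis for a labeled observation."""
    # The fallback string is an assumption; the tree does not define one.
    return TRIAGE.get((issue, observation), "No matching branch, review manually")


print(triage("tone", "wrong_register"))
```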
Common Failure Scenarios
Hallucination
Symptoms:
- Summary mentions changes not in delta
- Invented statistics or dates
- Confusion with similar pages
Actions:
- Immediately suppress affected summaries
- Review delta to confirm content mismatch
- Tighten prompt to restrict to delta content only
- Add explicit "only summarize provided delta" instruction
- Reduce model temperature if it is set above zero
Priority: Treat as SEV-1 when multiple URLs are affected.
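One cheap way to catch the "invented statistics or dates" symptom is to check that every numeric token in the summary also appears in the delta. The sketch below assumes the delta and summary are available as plain strings; `unsupported_numbers` and its regular expression are illustrative, and the check is deliberately crude (a number that was legitimately reformatted, such as "1,000" versus "1000", would be flagged).

```python
import re

# Matches integers, decimals, dates, times, and percentages such as
# "15%", "11.50", "2024-03-01", or "10:30". Illustrative, not exhaustive.
NUMBER_RE = re.compile(r"\d+(?:[.,:/-]\d+)*%?")


def unsupported_numbers(summary: str, delta: str) -> list[str]:
    """Return numeric tokens in the summary that never appear in the delta."""
    delta_tokens = set(NUMBER_RE.findall(delta))
    unsupported: list[str] = []
    for token in NUMBER_RE.findall(summary):
        if token not in delta_tokens and token not in unsupported:
            unsupported.append(token)
    return unsupported


delta = "Price changed from $10 to $11.50 on 2024-03-01."
summary = "Price rose 15% on 2024-03-01."
print(unsupported_numbers(summary, delta))
```

A non-empty result does not prove hallucination, but it is a reasonable trigger for the manual delta review listed above.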
Fair-Use Violations (Over-Quoting)
Symptoms:
- Summaries contain excessive verbatim quotation
- Output feels like page replacement
- Length exceeds expected bounds
Actions:
- Immediately halt affected summaries
- Review summarization prompts for quoting instructions
- Reduce excerpt limits
- Force paraphrase-only mode
- Add post-processing to detect and truncate
Priority: Always treat as SEV-1.
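A post-processing check for over-quoting can be sketched as the share of summary words that sit inside a verbatim run copied from the source. This is one possible definition, assumed here for illustration; how the stored `quote_ratio` column is actually computed is not stated in this runbook. The function name and the 5-word window are hypothetical choices.

```python
def quote_ratio(summary: str, source: str, n: int = 5) -> float:
    """Fraction of summary words covered by an n-word run found verbatim in source."""
    summary_words = summary.lower().split()
    source_words = source.lower().split()
    if len(summary_words) < n:
        return 0.0
    # Every n-word window of the source, for constant-time membership tests.
    source_ngrams = {
        tuple(source_words[i:i + n]) for i in range(len(source_words) - n + 1)
    }
    covered = [False] * len(summary_words)
    for i in range(len(summary_words) - n + 1):
        if tuple(summary_words[i:i + n]) in source_ngrams:
            for j in range(i, i + n):
                covered[j] = True
    return sum(covered) / len(summary_words)


source = "the quick brown fox jumps over the lazy dog"
print(quote_ratio("the quick brown fox jumps happily", source))
print(quote_ratio("a fast fox leaps across a sleepy dog", source))
```

Matching on whitespace-split, lowercased words keeps the sketch short; punctuation attached to a word will break a match, so a production check would likely normalize tokens first.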
Tone Drift
Symptoms:
- Overstated urgency ("CRITICAL!", "BREAKING!")
- Inconsistent formality
- Emotional language inappropriate for content
Actions:
- Review URL type detection: is the URL being classified correctly?
- Adjust tone guidelines in prompt
- Add examples of appropriate vs inappropriate tone
- Consider lowering temperature for more consistent output
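Overstated urgency is often visible on the surface of the text, so a simple counter can flag candidates for review. The sketch below is an assumption-laden heuristic: the word list (which adds "urgent" to the two examples given above), the all-caps rule, and the `urgency_flags` name are all illustrative, and the all-caps rule will also count legitimate acronyms.

```python
import re

# Word list is an assumption; extend it to match the tone guidelines in use.
URGENCY_WORDS = re.compile(r"\b(?:critical|breaking|urgent)\b", re.IGNORECASE)
# All-caps words of four or more letters. Also matches acronyms like "HTTP".
ALL_CAPS = re.compile(r"\b[A-Z]{4,}\b")


def urgency_flags(summary: str) -> dict[str, int]:
    """Count surface markers of overstated urgency in a summary."""
    return {
        "urgency_words": len(URGENCY_WORDS.findall(summary)),
        "all_caps_words": len(ALL_CAPS.findall(summary)),
        "exclamations": summary.count("!"),
    }


print(urgency_flags("BREAKING! Pricing page updated."))
print(urgency_flags("The pricing page added a new tier."))
```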
Repeated Phrasing
Symptoms:
- Same phrases appearing across different summaries
- Formulaic structure becoming stale
- Model "habits" emerging
Actions:
- Review prompt for unintentional anchoring
- Vary prompt structure slightly
- Add diversity instructions
- Monitor for improvement
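To monitor for improvement, it helps to measure repetition directly. A minimal sketch, assuming summaries are available as a list of strings: count how many distinct summaries each word n-gram appears in and report the ones that recur. The function name, the 4-word window, and the threshold of three summaries are hypothetical defaults.

```python
from collections import Counter


def repeated_phrases(
    summaries: list[str], n: int = 4, min_summaries: int = 3
) -> list[tuple[str, int]]:
    """Return n-word phrases that occur in at least min_summaries summaries."""
    doc_freq: Counter[str] = Counter()
    for text in summaries:
        words = text.lower().split()
        # A set, so a phrase counts once per summary however often it repeats.
        phrases = {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}
        doc_freq.update(phrases)
    hits = [(p, c) for p, c in doc_freq.items() if c >= min_summaries]
    # Most widespread first; ties broken alphabetically for stable output.
    return sorted(hits, key=lambda pc: (-pc[1], pc[0]))


sample = [
    "this update introduces a new pricing tier",
    "this update introduces a revised refund policy",
    "this update introduces a dark mode toggle",
]
print(repeated_phrases(sample))
```

Running this on each day's output and comparing the top phrases before and after a prompt change gives a rough signal of whether the anchoring has eased.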
Missing Key Information
Symptoms:
- Important changes not mentioned in summary
- Summary too brief given delta size
- User reports summary missed something
Actions:
- Verify delta contains the expected changes
- Check if model output was truncated
- Review prompt for explicit inclusion requirements
- Consider multi-pass summarization for complex deltas
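Two of the symptoms above, truncated output and a summary that is too brief for its delta, can be approximated without access to model metadata. The sketch below is heuristic only: the terminator set, the 5% length ratio, and both function names are assumptions, and where the model API reports why generation stopped, that signal is more reliable than checking punctuation.

```python
# Characters that plausibly end a complete summary. An assumption; adjust as needed.
TERMINATORS = (".", "!", "?", '"', "'", ")")


def looks_truncated(summary: str) -> bool:
    """True if a non-empty summary does not end on terminal punctuation."""
    text = summary.rstrip()
    return bool(text) and not text.endswith(TERMINATORS)


def too_brief(summary: str, delta: str, min_ratio: float = 0.05) -> bool:
    """True if the summary is shorter than min_ratio of the delta, by characters."""
    if not delta:
        return False
    return len(summary) / len(delta) < min_ratio


print(looks_truncated("The page added a new pricing tier and remov"))
print(too_brief("Minor edits.", "x" * 1000))
```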
Stop Conditions
Hard Stop
Trigger immediately if any are true:
- Summaries violate fair-use constraints at scale
- Output risks replacing the source page
- Hallucinations affecting multiple URLs
Action: Suppress summaries and halt summarization jobs.
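The three triggers can be evaluated mechanically once the inputs are counted. This is a sketch under stated assumptions: the runbook does not define "at scale", so `fair_use_scale_threshold` is a hypothetical parameter with an arbitrary default, "multiple URLs" is read as two or more, and the function name and argument names are invented for illustration.

```python
def hard_stop_reasons(
    fair_use_violations: int,
    replacement_risk: bool,
    urls_with_hallucinations: int,
    fair_use_scale_threshold: int = 10,  # assumption: "at scale" is not defined here
) -> list[str]:
    """Return the hard-stop triggers that currently hold; empty means no hard stop."""
    reasons: list[str] = []
    if fair_use_violations >= fair_use_scale_threshold:
        reasons.append("fair-use violations at scale")
    if replacement_risk:
        reasons.append("output risks replacing the source page")
    if urls_with_hallucinations >= 2:  # "multiple URLs" read as two or more
        reasons.append("hallucinations affecting multiple URLs")
    return reasons


reasons = hard_stop_reasons(
    fair_use_violations=0, replacement_risk=False, urls_with_hallucinations=2
)
if reasons:
    print("HARD STOP:", "; ".join(reasons))
```

Returning the list of reasons rather than a bare boolean keeps the incident record specific about which trigger fired.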
Degrade Mode
- Emit "no meaningful change" or "delta unavailable" messages
- Force paraphrase-only mode
- Reduce summary length limits
Resume after manual spot-checks confirm safety.
Signals to Watch
| Signal | Indicates |
|---|---|
| Repeated phrasing across days | Model anchoring, needs prompt refresh |
| Overstated urgency | Tone calibration issue |
| Quote ratio increasing | Drift toward fair-use violation |
| User complaints about accuracy | Hallucination or omission |
| Summary length variance | Inconsistent generation |
Quality Checks
Sample Summary Review
For any suspected issue, manually review:
- The delta provided to the model
- The prompt used
- The generated summary
- What a human would write from the same delta, for comparison
Automated Checks
Consider implementing:
- Quote ratio monitoring (verbatim % of output)
- Length consistency checks
- Confidence score distribution
- A/B testing of prompt changes
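As an example of a length consistency check, the sketch below flags summaries whose character length is far from the batch mean, using a plain z-score. The technique, the `length_outliers` name, and the threshold of 3.0 are assumptions chosen for simplicity; a z-score is sensitive to the outliers it is trying to find, so a median-based measure may suit small batches better.

```python
import statistics


def length_outliers(lengths: list[int], z_threshold: float = 3.0) -> list[int]:
    """Return indexes of lengths more than z_threshold sample stdevs from the mean."""
    if len(lengths) < 2:
        return []
    mean = statistics.fmean(lengths)
    stdev = statistics.stdev(lengths)
    if stdev == 0:
        return []  # all lengths identical, nothing to flag
    return [
        i for i, n in enumerate(lengths) if abs(n - mean) / stdev > z_threshold
    ]


# Twenty typical summaries and one that is ten times longer.
lengths = [200] * 20 + [2000]
print(length_outliers(lengths))
```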
Database Queries
Find summaries with high quote ratios
SELECT id, url_id, summary_text, quote_ratio, created_at
FROM summaries
WHERE quote_ratio > 0.3
  AND created_at > NOW() - INTERVAL '24 hours'
ORDER BY quote_ratio DESC
LIMIT 20;
Check summary generation patterns
SELECT DATE(created_at) AS day,
       COUNT(*) AS total,
       AVG(LENGTH(summary_text)) AS avg_length,
       AVG(confidence_score) AS avg_confidence
FROM summaries
WHERE created_at > NOW() - INTERVAL '7 days'
GROUP BY DATE(created_at)
ORDER BY day;
Find URLs with summary issues
SELECT url_id,
       COUNT(*) AS summary_count,
       AVG(confidence_score) AS avg_confidence,
       COUNT(*) FILTER (WHERE confidence_score < 0.5) AS low_confidence_count
FROM summaries
WHERE created_at > NOW() - INTERVAL '7 days'
GROUP BY url_id
HAVING COUNT(*) FILTER (WHERE confidence_score < 0.5) > 2
ORDER BY low_confidence_count DESC;
Related Runbooks
- Incident Playbook - Master triage
- Render - If delta is wrong (upstream issue)
- Fair Use Policy - Compliance requirements