AI Summarization Cost Feasibility Assessment
Assessment Date: 2026-01-08
Assessed By: Claude Code
Baseline: 4,000 I/O tokens per summary (3,700 input / 300 output)
Models Compared: Claude Haiku 4.5, Claude Sonnet 4.5, GPT-4.1-mini
Executive Summary
| Model | Cost/Summary | Daily (100) | Daily (1K) | Monthly (3K) | Monthly (30K) | Feasibility |
|---|---|---|---|---|---|---|
| Claude Haiku 4.5 | $0.0052 | $0.52 | $5.20 | $15.60 | $156.00 | ✅ Recommended |
| Claude Sonnet 4.5 | $0.0156 | $1.56 | $15.60 | $46.80 | $468.00 | ⚠️ High-value only |
| GPT-4.1-mini | $0.00196 | $0.20 | $1.96 | $5.88 | $58.80 | ✅ Budget option |
Recommendation: Continue with Claude Haiku 4.5 as the default. Reserve Sonnet for high-complexity edge cases. GPT-4.1-mini is viable as a cost-saving fallback.
Model Pricing (as of January 2026)
Claude Haiku 4.5 (Current Default)
| Token Type | Rate (per 1M tokens) |
|---|---|
| Input | $1.00 |
| Output | $5.00 |
Claude Sonnet 4.5 (High-Complexity Option)
| Token Type | Rate (per 1M tokens) |
|---|---|
| Input | $3.00 |
| Output | $15.00 |
GPT-4.1-mini (Fallback)
| Token Type | Rate (per 1M tokens) |
|---|---|
| Input | $0.40 |
| Output | $1.60 |
Per-Summary Cost Breakdown
Based on 4,000 I/O tokens (3,700 input / 300 output):
| Model | Input Cost | Output Cost | Total | vs Haiku |
|---|---|---|---|---|
| Claude Haiku 4.5 | $0.0037 | $0.0015 | $0.0052 | baseline |
| Claude Sonnet 4.5 | $0.0111 | $0.0045 | $0.0156 | 3.0× more |
| GPT-4.1-mini | $0.00148 | $0.00048 | $0.00196 | 2.7× less |
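The per-summary figures above follow directly from the pricing tables. A minimal sketch to reproduce them (the model keys are illustrative labels, not API identifiers):

```python
# Rates in USD per 1M tokens, from the pricing tables above.
PRICING = {
    "claude-haiku-4.5":  {"input": 1.00, "output": 5.00},
    "claude-sonnet-4.5": {"input": 3.00, "output": 15.00},
    "gpt-4.1-mini":      {"input": 0.40, "output": 1.60},
}

def summary_cost(model: str, input_tokens: int = 3_700, output_tokens: int = 300) -> float:
    """Cost in USD for one summary at the baseline token counts."""
    rates = PRICING[model]
    return (input_tokens * rates["input"] + output_tokens * rates["output"]) / 1_000_000

for name in PRICING:
    print(f"{name}: ${summary_cost(name):.5f}")
```

The same function, with different token counts, also backs the projection tables below.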
Daily Cost Projections
| Summaries/Day | Haiku 4.5 | Sonnet 4.5 | GPT-4.1-mini |
|---|---|---|---|
| 10 | $0.05 | $0.16 | $0.02 |
| 50 | $0.26 | $0.78 | $0.10 |
| 100 | $0.52 | $1.56 | $0.20 |
| 250 | $1.30 | $3.90 | $0.49 |
| 500 | $2.60 | $7.80 | $0.98 |
| 1,000 | $5.20 | $15.60 | $1.96 |
| 2,500 | $13.00 | $39.00 | $4.90 |
| 5,000 | $26.00 | $78.00 | $9.80 |
Monthly Cost Projections (30 days)
| Summaries/Month | Haiku 4.5 | Sonnet 4.5 | GPT-4.1-mini |
|---|---|---|---|
| 300 | $1.56 | $4.68 | $0.59 |
| 1,000 | $5.20 | $15.60 | $1.96 |
| 3,000 | $15.60 | $46.80 | $5.88 |
| 5,000 | $26.00 | $78.00 | $9.80 |
| 10,000 | $52.00 | $156.00 | $19.60 |
| 30,000 | $156.00 | $468.00 | $58.80 |
| 50,000 | $260.00 | $780.00 | $98.00 |
| 100,000 | $520.00 | $1,560.00 | $196.00 |
Annual Cost Projections
| Usage Tier | Daily Volume | Haiku 4.5/yr | Sonnet 4.5/yr | GPT-4.1-mini/yr |
|---|---|---|---|---|
| Starter | 100/day | $189.80 | $569.40 | $71.54 |
| Growth | 500/day | $949.00 | $2,847.00 | $357.70 |
| Scale | 1,000/day | $1,898.00 | $5,694.00 | $715.40 |
| Enterprise | 5,000/day | $9,490.00 | $28,470.00 | $3,577.00 |
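As a sanity check, each annual tier is just the per-summary cost times daily volume times a 365-day year. A quick sketch (the model keys are shorthand labels):

```python
# Per-summary costs in USD, from the breakdown above.
COST_PER_SUMMARY = {"haiku": 0.0052, "sonnet": 0.0156, "gpt-4.1-mini": 0.00196}

def annual_cost(model: str, summaries_per_day: int, days_per_year: int = 365) -> float:
    """Annual spend in USD at a constant daily volume."""
    return COST_PER_SUMMARY[model] * summaries_per_day * days_per_year

# Starter tier: 100 summaries/day on Haiku
print(f"${annual_cost('haiku', 100):,.2f}")
```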
Feasibility Analysis
Claude Haiku 4.5 ✅ RECOMMENDED
Pros:
- Optimal quality-to-cost ratio for delta-first summaries
- Fast inference latency (~200-400ms typical)
- Strong instruction-following for structured JSON output
- Native Anthropic ecosystem (Vercel AI SDK integration)
Cons:
- ~2.7× more expensive than GPT-4.1-mini
- May occasionally miss nuance in complex technical content
Best For: Default production use, all standard page types
Break-even: Cost-effective at any volume for Eko's use case
Claude Sonnet 4.5 ⚠️ CONDITIONAL
Pros:
- Superior reasoning for complex multi-section diffs
- Better handling of ambiguous or technical content
- Higher confidence scores on edge cases
Cons:
- 3× more expensive than Haiku
- Marginal quality improvement for simple summaries
- Higher latency (~500-800ms typical)
Best For:
- High-complexity page types (legal, financial, scientific)
- When Haiku returns low-confidence scores (<0.6)
- User-requested "detailed analysis" mode
Recommendation: Implement as escalation tier, not default
GPT-4.1-mini ✅ VIABLE FALLBACK
Pros:
- 2.7× cheaper than Haiku
- Fast inference
- Good general-purpose summarization
Cons:
- Less reliable JSON schema adherence
- May require more prompt engineering
- Different output style/tone
- Separate API key management
Best For:
- Fallback when Anthropic is unavailable
- Budget-constrained deployments
- Batch processing non-critical updates
Cost Optimization Strategies
1. Batch Processing Discount (Anthropic)
- 50% discount on async batch API
- Suitable for non-real-time check processing
- Could reduce Haiku costs to ~$0.0026/summary
2. Tiered Model Selection
```python
def select_model(page_type: str, confidence_score: float) -> str:
    if page_type in ('legal', 'scientific', 'financial'):
        return 'sonnet'  # Higher complexity
    if confidence_score < 0.6:
        return 'sonnet'  # Escalation
    return 'haiku'       # Default
```
3. Token Optimization
- Current: 4,000 I/O tokens average
- Target: 3,000 I/O tokens with content trimming
- Potential savings: ~25% fewer tokens (a ~19% cost reduction on Haiku, since output tokens are fixed)
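One caveat: because output stays pinned at max_tokens=300, trimming input saves somewhat less than the raw token reduction suggests. A quick check at Haiku rates, assuming the trim comes entirely from input (3,700 → 2,700 tokens):

```python
# Haiku 4.5 rates, USD per 1M tokens.
INPUT_RATE, OUTPUT_RATE = 1.00, 5.00

def haiku_cost(input_tokens: int, output_tokens: int = 300) -> float:
    return (input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE) / 1_000_000

before, after = haiku_cost(3_700), haiku_cost(2_700)
print(f"${before:.4f} -> ${after:.4f} ({1 - after / before:.0%} saved)")
```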
4. Caching Strategy
- Cache summaries for unchanged content
- Skip AI for minor formatting changes
- Estimated reduction: 20-40% API calls
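The content-hash check behind the caching strategy can be sketched as follows (names are illustrative; a production key would likely also include the URL and prompt version):

```python
import hashlib
from typing import Callable

_summary_cache: dict[str, str] = {}

def summarize_with_cache(content: str, summarize: Callable[[str], str]) -> str:
    """Call the AI summarizer only when this exact content has not been seen."""
    key = hashlib.sha256(content.encode("utf-8")).hexdigest()
    if key not in _summary_cache:
        _summary_cache[key] = summarize(content)
    return _summary_cache[key]
```

Repeated checks of an unchanged page then cost nothing, which is where the estimated 20-40% reduction in API calls would come from.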
Comparison Matrix
| Criteria | Haiku 4.5 | Sonnet 4.5 | GPT-4.1-mini |
|---|---|---|---|
| Cost | ⭐⭐⭐ | ⭐ | ⭐⭐⭐⭐⭐ |
| Quality | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ |
| Latency | ⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐ |
| JSON Reliability | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ |
| Integration | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| Batch Support | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ |
Recommendations
Immediate Actions
- Keep Haiku 4.5 as default — best value for standard summaries
- Document Sonnet escalation criteria — when to use higher-tier model
- Monitor token usage — validate 4,000 token assumption with production data
Future Optimizations
- Implement batch API for scheduled checks (50% savings)
- Add content-length-based model selection
- Cache identical content to skip redundant API calls
- Consider GPT-4.1-mini for non-critical batch jobs
Budget Planning
| Growth Scenario | Monthly Summaries | Monthly Cost (Haiku) |
|---|---|---|
| MVP Launch | 1,000 | $5.20 |
| Early Traction | 5,000 | $26.00 |
| Growth Phase | 20,000 | $104.00 |
| Scale | 100,000 | $520.00 |
Conclusion: AI summarization costs are highly feasible at all projected scales. Haiku 4.5 provides excellent quality at manageable cost, with a clear escalation path to Sonnet for edge cases.
Appendix: Token Assumptions
| Component | Tokens | Notes |
|---|---|---|
| System prompt | ~500 | Fixed overhead |
| User prompt + context | ~200 | URL, title, tracking note |
| Page content (diff) | ~3,000 | Variable, compressed |
| Total Input | ~3,700 | |
| Summary output | ~300 | max_tokens=300 |
| Total I/O | ~4,000 | |
Generated by architect-steward assessment process