AI Summarization Cost Feasibility Assessment

Assessment Date: 2026-01-08
Assessed By: Claude Code
Baseline: 4,000 I/O tokens per summary (3,700 input / 300 output)
Models Compared: Claude Haiku 4.5, Claude Sonnet 4.5, GPT-4.1-mini


Executive Summary

| Model | Cost/Summary | Daily (100) | Daily (1K) | Monthly (3K) | Monthly (30K) | Feasibility |
|---|---|---|---|---|---|---|
| Claude Haiku 4.5 | $0.0052 | $0.52 | $5.20 | $15.60 | $156.00 | ✅ Recommended |
| Claude Sonnet 4.5 | $0.0156 | $1.56 | $15.60 | $46.80 | $468.00 | ⚠️ High-value only |
| GPT-4.1-mini | $0.00196 | $0.20 | $1.96 | $5.88 | $58.80 | ✅ Budget option |

Recommendation: Continue with Claude Haiku 4.5 as the default. Reserve Sonnet for high-complexity edge cases. GPT-4.1-mini remains viable as a cost-saving fallback.


Model Pricing (as of January 2026)

Claude Haiku 4.5 (Current Default)

| Token Type | Rate (per 1M tokens) |
|---|---|
| Input | $1.00 |
| Output | $5.00 |

Claude Sonnet 4.5 (High-Complexity Option)

| Token Type | Rate (per 1M tokens) |
|---|---|
| Input | $3.00 |
| Output | $15.00 |

GPT-4.1-mini (Fallback)

| Token Type | Rate (per 1M tokens) |
|---|---|
| Input | $0.40 |
| Output | $1.60 |

Per-Summary Cost Breakdown

Based on 4,000 I/O tokens (3,700 input / 300 output):

| Model | Input Cost | Output Cost | Total | vs Haiku |
|---|---|---|---|---|
| Claude Haiku 4.5 | $0.0037 | $0.0015 | $0.0052 | baseline |
| Claude Sonnet 4.5 | $0.0111 | $0.0045 | $0.0156 | 3.0× more |
| GPT-4.1-mini | $0.00148 | $0.00048 | $0.00196 | 2.7× less |
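
The per-summary figures above follow directly from the per-million-token rates; a minimal sketch of the arithmetic (the model keys and constant names are illustrative, not production identifiers):

```python
# Per-summary cost from per-million-token rates.
# Rates and token counts mirror the pricing tables above.
PRICING = {  # model -> (input $/1M tokens, output $/1M tokens)
    "haiku-4.5": (1.00, 5.00),
    "sonnet-4.5": (3.00, 15.00),
    "gpt-4.1-mini": (0.40, 1.60),
}

INPUT_TOKENS = 3_700   # baseline input per summary
OUTPUT_TOKENS = 300    # baseline output per summary

def cost_per_summary(model: str) -> float:
    """Dollar cost of one summary at the baseline token counts."""
    in_rate, out_rate = PRICING[model]
    return (INPUT_TOKENS * in_rate + OUTPUT_TOKENS * out_rate) / 1_000_000

def monthly_cost(model: str, summaries_per_month: int) -> float:
    """Projected monthly spend for a given volume."""
    return cost_per_summary(model) * summaries_per_month

print(f"{cost_per_summary('haiku-4.5'):.4f}")     # 0.0052
print(f"{monthly_cost('haiku-4.5', 3_000):.2f}")  # 15.60
```

The same two functions reproduce every projection table in this document by varying the volume argument.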

Daily Cost Projections

| Summaries/Day | Haiku 4.5 | Sonnet 4.5 | GPT-4.1-mini |
|---|---|---|---|
| 10 | $0.05 | $0.16 | $0.02 |
| 50 | $0.26 | $0.78 | $0.10 |
| 100 | $0.52 | $1.56 | $0.20 |
| 250 | $1.30 | $3.90 | $0.49 |
| 500 | $2.60 | $7.80 | $0.98 |
| 1,000 | $5.20 | $15.60 | $1.96 |
| 2,500 | $13.00 | $39.00 | $4.90 |
| 5,000 | $26.00 | $78.00 | $9.80 |

Monthly Cost Projections (30 days)

| Summaries/Month | Haiku 4.5 | Sonnet 4.5 | GPT-4.1-mini |
|---|---|---|---|
| 300 | $1.56 | $4.68 | $0.59 |
| 1,000 | $5.20 | $15.60 | $1.96 |
| 3,000 | $15.60 | $46.80 | $5.88 |
| 5,000 | $26.00 | $78.00 | $9.80 |
| 10,000 | $52.00 | $156.00 | $19.60 |
| 30,000 | $156.00 | $468.00 | $58.80 |
| 50,000 | $260.00 | $780.00 | $98.00 |
| 100,000 | $520.00 | $1,560.00 | $196.00 |

Annual Cost Projections

| Usage Tier | Daily Volume | Haiku 4.5/yr | Sonnet 4.5/yr | GPT-4.1-mini/yr |
|---|---|---|---|---|
| Starter | 100/day | $189.80 | $569.40 | $71.54 |
| Growth | 500/day | $949.00 | $2,847.00 | $357.70 |
| Scale | 1,000/day | $1,898.00 | $5,694.00 | $715.40 |
| Enterprise | 5,000/day | $9,490.00 | $28,470.00 | $3,577.00 |

Feasibility Analysis

Claude Haiku 4.5 ✅ RECOMMENDED

Pros:

  • Optimal quality-to-cost ratio for delta-first summaries
  • Fast inference latency (~200-400ms typical)
  • Strong instruction-following for structured JSON output
  • Native Anthropic ecosystem (Vercel AI SDK integration)

Cons:

  • ~2.7× more expensive than GPT-4.1-mini
  • May occasionally miss nuance in complex technical content

Best For: Default production use, all standard page types

Break-even: Cost-effective at any volume for Eko's use case


Claude Sonnet 4.5 ⚠️ CONDITIONAL

Pros:

  • Superior reasoning for complex multi-section diffs
  • Better handling of ambiguous or technical content
  • Higher confidence scores on edge cases

Cons:

  • 3× more expensive than Haiku
  • Marginal quality improvement for simple summaries
  • Higher latency (~500-800ms typical)

Best For:

  • High-complexity page types (legal, financial, scientific)
  • When Haiku returns low-confidence scores (<0.6)
  • User-requested "detailed analysis" mode

Recommendation: Implement as escalation tier, not default


GPT-4.1-mini ✅ VIABLE FALLBACK

Pros:

  • 2.7× cheaper than Haiku
  • Fast inference
  • Good general-purpose summarization

Cons:

  • Less reliable JSON schema adherence
  • May require more prompt engineering
  • Different output style/tone
  • Separate API key management

Best For:

  • Fallback when Anthropic is unavailable
  • Budget-constrained deployments
  • Batch processing non-critical updates

Cost Optimization Strategies

1. Batch Processing Discount (Anthropic)

  • 50% discount on async batch API
  • Suitable for non-real-time check processing
  • Could reduce Haiku costs to ~$0.0026/summary
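
The discounted rate is a one-line calculation (constants taken from the figures above):

```python
# Applying the 50% async-batch discount to the Haiku per-summary cost.
REALTIME_COST = 0.0052   # Haiku per-summary cost from the breakdown above
BATCH_DISCOUNT = 0.50    # Anthropic batch API discount

batch_cost = REALTIME_COST * (1 - BATCH_DISCOUNT)
print(f"${batch_cost:.4f}")  # $0.0026 per batched summary
```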

2. Tiered Model Selection

HIGH_COMPLEXITY_TYPES = {'legal', 'scientific', 'financial'}

def select_model(page_type: str, confidence_score: float) -> str:
    if page_type in HIGH_COMPLEXITY_TYPES:
        return 'sonnet'   # Higher-complexity content
    if confidence_score < 0.6:
        return 'sonnet'   # Escalate low-confidence results
    return 'haiku'        # Default

3. Token Optimization

  • Current: 4,000 I/O tokens average
  • Target: 3,000 I/O tokens with content trimming
  • Potential savings: 25% cost reduction
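
One way to hit the 3,000-token input target is to trim the diff before sending it. A rough sketch, assuming ~4 characters per token and that the ~700 tokens of prompt overhead stay fixed (both are assumptions, not measured values; a real tokenizer would be more accurate):

```python
# Trim diff content to a token budget before summarization.
# Heuristic: ~4 characters per token (an assumption, not a tokenizer count).
CHARS_PER_TOKEN = 4

def trim_to_budget(content: str, max_tokens: int = 2_300) -> str:
    """Keep the head and tail of the content within ~max_tokens.

    Default budget: 3,000-token input target minus ~700 tokens of
    prompt overhead. Head gets 2/3 of the budget, tail gets 1/3,
    since the start of a diff usually carries the most signal.
    """
    max_chars = max_tokens * CHARS_PER_TOKEN
    if len(content) <= max_chars:
        return content
    head = content[: max_chars * 2 // 3]
    tail = content[-(max_chars // 3):]
    return f"{head}\n[... trimmed ...]\n{tail}"
```

The head/tail split and the 2,300-token default are illustrative choices; production values should be tuned against summary quality.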

4. Caching Strategy

  • Cache summaries for unchanged content
  • Skip AI for minor formatting changes
  • Estimated reduction: 20-40% API calls
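
A minimal sketch of the skip-if-unchanged idea, keyed by a content hash. The in-memory dict and the `summarize` callback are illustrative; production would persist the cache (e.g. Redis or the database):

```python
# Skip the API call when (url, content) has already been summarized.
import hashlib

_summary_cache: dict[str, str] = {}  # in-memory stand-in for a real store

def content_key(url: str, content: str) -> str:
    """Stable cache key for a (page, content) pair."""
    return hashlib.sha256(f"{url}\n{content}".encode()).hexdigest()

def summarize_with_cache(url: str, content: str, summarize) -> str:
    """Invoke `summarize` (the AI call) only on a cache miss."""
    key = content_key(url, content)
    if key not in _summary_cache:
        _summary_cache[key] = summarize(content)
    return _summary_cache[key]
```

Normalizing the content first (e.g. collapsing whitespace) would also let the cache absorb the minor formatting changes mentioned above.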

Comparison Matrix

| Criteria | Haiku 4.5 | Sonnet 4.5 | GPT-4.1-mini |
|---|---|---|---|
| Cost | ⭐⭐⭐ | ⭐ | ⭐⭐⭐⭐ |
| Quality | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ |
| Latency | ⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐ |
| JSON Reliability | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ |
| Integration | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| Batch Support | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ |

Recommendations

Immediate Actions

  1. Keep Haiku 4.5 as default — best value for standard summaries
  2. Document Sonnet escalation criteria — when to use higher-tier model
  3. Monitor token usage — validate 4,000 token assumption with production data

Future Optimizations

  1. Implement batch API for scheduled checks (50% savings)
  2. Add content-length-based model selection
  3. Cache identical content to skip redundant API calls
  4. Consider GPT-4.1-mini for non-critical batch jobs

Budget Planning

| Growth Scenario | Monthly Summaries | Monthly Cost (Haiku) |
|---|---|---|
| MVP Launch | 1,000 | $5.20 |
| Early Traction | 5,000 | $26.00 |
| Growth Phase | 20,000 | $104.00 |
| Scale | 100,000 | $520.00 |

Conclusion: AI summarization costs are highly feasible at all projected scales. Haiku 4.5 provides excellent quality at manageable cost, with a clear escalation path to Sonnet for edge cases.


Appendix: Token Assumptions

| Component | Tokens | Notes |
|---|---|---|
| System prompt | ~500 | Fixed overhead |
| User prompt + context | ~200 | URL, title, tracking note |
| Page content (diff) | ~3,000 | Variable, compressed |
| Total Input | ~3,700 | |
| Summary output | ~300 | max_tokens=300 |
| Total I/O | ~4,000 | |

Generated by architect-steward assessment process