Model & Cost Optimization (WI-10)
Context
This is a retroactive challenge document for the AI model routing and cost optimization work. It covers task-based model selection (Gemini-3-flash promoted to default), token optimization strategies, DeepSeek adapter integration, post-generation patching, and cost tracking. All work is complete; this document exists so that rollout tracking stays consistent.
Current State
- All 5 challenges implemented and verified
- Gemini-3-flash is default model for most extraction tasks
- Token usage reduced ~63% via STYLE_RULES deduplication
- Cost tracking operational with per-model budgets and daily caps
Challenges
Challenge 10.1: Model Routing
Requirement: Task-based tier selection for AI model usage.
Acceptance Criteria:
- Model routing config maps task types to model tiers
- Gemini-3-flash promoted to default for standard extraction
- Higher-tier models reserved for validation and complex reasoning
- Fallback chain when primary model is unavailable
Evaluation: PASS
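The routing and fallback criteria for Challenge 10.1 can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in packages/ai/: the task-type names, the tier names, every model identifier other than Gemini-3-flash, and the `is_available` callback are all hypothetical.

```python
from typing import Callable

# Hypothetical task -> tier mapping; the real task types are not listed in this doc.
TASK_TIERS: dict[str, str] = {
    "extraction": "standard",
    "validation": "premium",
    "complex_reasoning": "premium",
}

# Ordered fallback chain per tier. Only "gemini-3-flash" comes from this doc;
# the other model names are placeholders.
TIER_MODELS: dict[str, list[str]] = {
    "standard": ["gemini-3-flash", "fallback-standard-model"],
    "premium": ["premium-model", "gemini-3-flash"],
}


def select_model(task_type: str, is_available: Callable[[str], bool]) -> str:
    """Return the first available model in the fallback chain for a task."""
    tier = TASK_TIERS.get(task_type, "standard")  # unknown tasks use the default tier
    for model in TIER_MODELS[tier]:
        if is_available(model):
            return model
    raise RuntimeError(f"no model available for task type {task_type!r}")
```

Keeping the chain as an ordered list per tier means that promoting a model to default is a one-line config change rather than a code change.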
Challenge 10.2: Token Optimization
Requirement: Reduce token consumption without sacrificing output quality.
Acceptance Criteria:
- STYLE_RULES deduplication achieves ~63% token reduction
- Micro-batch amortization spreads system prompt cost across multiple facts
- Thinking budgets tiered by task complexity (0/1024/4096 tokens)
- Evergreen title dedup cap prevents redundant regeneration
Evaluation: PASS
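Two of the criteria for Challenge 10.2 reduce to simple arithmetic, sketched below. The budget values 0/1024/4096 come from the criteria above; the tier labels ("simple", "moderate", "complex") and both function names are assumptions made for illustration.

```python
# Thinking budgets from the acceptance criteria; the tier labels are hypothetical.
THINKING_BUDGETS: dict[str, int] = {"simple": 0, "moderate": 1024, "complex": 4096}


def thinking_budget(complexity: str) -> int:
    """Look up the thinking-token budget for a complexity tier."""
    return THINKING_BUDGETS[complexity]


def prompt_tokens_per_fact(system_tokens: int, fact_tokens: int, batch_size: int) -> float:
    """Average input tokens per fact when one system prompt serves a whole batch."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    # The system prompt is paid once per request, so its cost is divided
    # across every fact in the micro-batch.
    return system_tokens / batch_size + fact_tokens
```

With made-up numbers (not measurements from this project), a 2,000-token system prompt and 100-token facts cost 2,100 input tokens per fact unbatched, and 350 per fact at a batch size of 8.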
Challenge 10.3: DeepSeek Adapter
Requirement: Integrate DeepSeek models for specific task types.
Acceptance Criteria:
- Free-text JSON bypass for DeepSeek's non-standard JSON output
- v4 adapter handles DeepSeek-specific response format
- Graceful fallback to primary model on adapter failure
Evaluation: PASS
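One plausible reading of the free-text JSON bypass and the fallback criterion in Challenge 10.3 is sketched below. It assumes the model returns JSON embedded in surrounding prose; `extract_json`, `generate_with_fallback`, and the two call parameters are hypothetical names, and the real v4 adapter may work differently.

```python
import json
from typing import Any, Callable


def extract_json(text: str) -> Any:
    """Parse the first JSON object embedded in free text."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and
            # ignores any trailing text after it.
            value, _end = decoder.raw_decode(text, start)
            return value
        except json.JSONDecodeError:
            start = text.find("{", start + 1)  # try the next candidate brace
    raise ValueError("no JSON object found in response")


def generate_with_fallback(
    prompt: str,
    adapter_call: Callable[[str], str],
    primary_call: Callable[[str], str],
) -> Any:
    """Try the adapter first; on any failure, retry with the primary model."""
    try:
        return extract_json(adapter_call(prompt))
    except Exception:  # broad on purpose in this sketch: any adapter failure falls back
        return extract_json(primary_call(prompt))
```

Scanning from each opening brace tolerates preambles such as "Here is the result:" without needing a separate pattern per response style.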
Challenge 10.4: Post-Generation Patching
Requirement: Automated quality fixes applied after AI generation.
Acceptance Criteria:
- Passive voice detector and rewriter (36 patterns)
- Generic reveal family matcher (21 families)
- Textbook register rewriter for overly formal language
- Patches applied before drift coordinator validation
Evaluation: PASS
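The patching stage in Challenge 10.4 can be pictured as an ordered list of text-to-text functions whose output is then handed to drift validation. The sketch below is a toy version: the three rules are invented for illustration and are not among the 36 passive-voice patterns or 21 reveal families, and `regex_patch` and `apply_patches` are hypothetical names.

```python
import re
from typing import Callable

Patch = Callable[[str], str]


def regex_patch(pattern: str, replacement: str) -> Patch:
    """Build a patch that applies one regex substitution."""
    compiled = re.compile(pattern)
    return lambda text: compiled.sub(replacement, text)


# Hypothetical rules, applied in order.
PATCHES: list[Patch] = [
    # Toy passive-voice rewrite; only handles "The X was Y-ed by Z" at the start.
    regex_patch(r"^The (\w+) was (\w+ed) by (\w+)", r"\3 \2 the \1"),
    # Toy textbook-register rewrites.
    regex_patch(r"\bin order to\b", "to"),
    regex_patch(r"\butilizes\b", "uses"),
]


def apply_patches(text: str, patches: list[Patch]) -> str:
    """Run every patch in sequence and return the patched text."""
    for patch in patches:
        text = patch(text)
    return text
```

For example, `apply_patches("The bridge was designed by Roebling in order to carry trains.", PATCHES)` yields "Roebling designed the bridge to carry trains.", and that patched string is what a validator would then receive.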
Challenge 10.5: Cost Tracking
Requirement: Per-model budget enforcement and spend monitoring.
Acceptance Criteria:
- Per-model budget caps (configurable per task type)
- Daily spend caps with automatic throttling
- Reasoning token tracking (separate from completion tokens)
- Cost attribution by task type for reporting
Evaluation: PASS
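A minimal sketch of how the cost-tracking criteria for Challenge 10.5 might fit together is shown below. The `CostTracker` class, its field names, and the USD amounts are assumptions; for brevity, budgets are keyed by model only (the per-task-type configuration is omitted) and the daily reset is left out.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class CostTracker:
    model_budgets: dict[str, float]  # hypothetical per-model cap, in USD
    daily_cap: float                 # hypothetical cap across all models, in USD
    spend_by_model: dict[str, float] = field(default_factory=lambda: defaultdict(float))
    spend_by_task: dict[str, float] = field(default_factory=lambda: defaultdict(float))
    completion_tokens: int = 0
    reasoning_tokens: int = 0        # tracked separately from completion tokens

    def total_spend(self) -> float:
        return sum(self.spend_by_model.values())

    def allows(self, model: str, cost: float) -> bool:
        """Throttle check: True only if both the model and daily caps hold."""
        within_model = self.spend_by_model[model] + cost <= self.model_budgets.get(model, 0.0)
        within_day = self.total_spend() + cost <= self.daily_cap
        return within_model and within_day

    def record(self, model: str, task_type: str, cost: float,
               completion: int, reasoning: int = 0) -> None:
        """Attribute a finished call to its model and task type."""
        self.spend_by_model[model] += cost
        self.spend_by_task[task_type] += cost
        self.completion_tokens += completion
        self.reasoning_tokens += reasoning
```

A caller would check `allows(...)` before dispatching a request and call `record(...)` afterwards; `spend_by_task` then provides the per-task attribution used for reporting.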
Implementation Notes
- Model routing config in packages/ai/ (task → model tier mapping)
- Token optimization strategies are composable and can be toggled per task
- DeepSeek adapter in packages/ai/ (handles v4 protocol differences)
- Post-generation patching runs as a pipeline stage between generation and drift validation
- Cost tracking integrates with the admin dashboard (Challenge 6.9) for visibility