Model & Cost Optimization (WI-10)

Context

Retroactive challenge document for the AI model routing and cost optimization work. It covers task-based model selection (Gemini-3-flash promoted to default), token optimization strategies, DeepSeek adapter integration, post-generation patching, and cost tracking. All work is complete; this document exists for rollout tracking consistency.

Current State

  • All 5 challenges implemented and verified
  • Gemini-3-flash is default model for most extraction tasks
  • Token usage reduced ~63% via STYLE_RULES deduplication
  • Cost tracking operational with per-model budgets and daily caps

Challenges

Challenge 10.1: Model Routing

Requirement: Task-based tier selection for AI model usage.

Acceptance Criteria:

  • Model routing config maps task types to model tiers
  • Gemini-3-flash promoted to default for standard extraction
  • Higher-tier models reserved for validation and complex reasoning
  • Fallback chain when primary model is unavailable

Evaluation: PASS
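
A minimal sketch of task-based tier selection with a fallback chain. The task names, tier names, and every model identifier other than gemini-3-flash are hypothetical placeholders; the actual config in packages/ai/ may be structured differently.

```python
from typing import Callable

# Hypothetical tier -> ordered model chain; the first entry is the primary.
TIER_MODELS: dict[str, list[str]] = {
    "standard": ["gemini-3-flash", "fallback-standard-model"],
    "high": ["high-tier-model", "gemini-3-flash"],
}

# Hypothetical task type -> tier mapping.
TASK_TIERS: dict[str, str] = {
    "extraction": "standard",
    "validation": "high",
    "complex_reasoning": "high",
}

DEFAULT_TIER = "standard"


def resolve_model(task_type: str, is_available: Callable[[str], bool]) -> str:
    """Return the first available model in the chain for the task's tier."""
    tier = TASK_TIERS.get(task_type, DEFAULT_TIER)
    for model in TIER_MODELS[tier]:
        if is_available(model):
            return model
    raise RuntimeError(f"no available model for task {task_type!r} (tier {tier!r})")
```

Unknown task types fall through to the standard tier in this sketch, so a new task defaults to the cheaper model rather than to a higher-tier one.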

Challenge 10.2: Token Optimization

Requirement: Reduce token consumption without sacrificing output quality.

Acceptance Criteria:

  • STYLE_RULES deduplication achieves ~63% token reduction
  • Micro-batch amortization spreads system prompt cost across multiple facts
  • Thinking budgets tiered by task complexity (0/1024/4096 tokens)
  • Evergreen title dedup cap prevents redundant regeneration

Evaluation: PASS
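
The thinking-budget tiers and the micro-batch amortization idea can be sketched as follows. Only the 0/1024/4096 budget values come from the criteria above; the complexity labels, function names, and the token counts in the usage note are illustrative assumptions, not measurements.

```python
# Budget values are from the acceptance criteria; the complexity labels are hypothetical.
THINKING_BUDGETS: dict[str, int] = {"simple": 0, "moderate": 1024, "complex": 4096}


def thinking_budget(complexity: str) -> int:
    """Look up the thinking-token budget, defaulting to the cheapest tier."""
    return THINKING_BUDGETS.get(complexity, 0)


def prompt_tokens_per_fact(system_tokens: int, fact_tokens: int, batch_size: int) -> float:
    """Input tokens attributable to one fact when a batch shares one system prompt."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return system_tokens / batch_size + fact_tokens
```

With an assumed 2000-token system prompt and 100-token facts, one fact per request costs 2100 input tokens per fact, while a batch of 10 costs 300 per fact, because the system prompt is paid once per batch.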

Challenge 10.3: DeepSeek Adapter

Requirement: Integrate DeepSeek models for specific task types.

Acceptance Criteria:

  • Free-text JSON bypass for DeepSeek's non-standard JSON output
  • v4 adapter handles DeepSeek-specific response format
  • Graceful fallback to primary model on adapter failure

Evaluation: PASS
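
One way a free-text JSON bypass with graceful fallback could work is sketched below. This is an assumption about the approach, not the adapter's actual code: extract_json, generate_with_fallback, and the two callables are hypothetical names, and the real v4 adapter may parse responses differently.

```python
import json
from typing import Any, Callable, Optional


def extract_json(text: str) -> Optional[dict[str, Any]]:
    """Return the first JSON object embedded in free text, or None if there is none."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and ignores trailing text.
            value, _ = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            value = None
        if isinstance(value, dict):
            return value
        start = text.find("{", start + 1)
    return None


def generate_with_fallback(
    prompt: str,
    call_deepseek: Callable[[str], str],
    call_primary: Callable[[str], dict[str, Any]],
) -> dict[str, Any]:
    """Try the adapter first; fall back to the primary model on any adapter failure."""
    try:
        parsed = extract_json(call_deepseek(prompt))
    except Exception:
        parsed = None
    return parsed if parsed is not None else call_primary(prompt)
```

Both a raised exception and an unparseable response count as adapter failure here, so the caller always receives a parsed object from one of the two models.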

Challenge 10.4: Post-Generation Patching

Requirement: Automated quality fixes applied after AI generation.

Acceptance Criteria:

  • Passive voice detector and rewriter (36 patterns)
  • Generic reveal family matcher (21 families)
  • Textbook register rewriter for overly formal language
  • Patches applied before drift coordinator validation

Evaluation: PASS
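
A rough sketch of patching as a stage that runs between generation and validation. The two register rules are invented for illustration and are not among the real 36 passive-voice patterns or 21 reveal families; all function names are hypothetical.

```python
import re
from typing import Callable

Patch = Callable[[str], str]


def make_regex_patch(rules: list[tuple[str, str]]) -> Patch:
    """Build a patch that applies each (pattern, replacement) rule in order."""
    compiled = [(re.compile(pattern), replacement) for pattern, replacement in rules]

    def patch(text: str) -> str:
        for pattern, replacement in compiled:
            text = pattern.sub(replacement, text)
        return text

    return patch


# Hypothetical textbook-register rules (case-sensitive, whole words only).
register_patch = make_regex_patch([
    (r"\butilizes\b", "uses"),
    (r"\bin order to\b", "to"),
])


def apply_patches(text: str, patches: list[Patch]) -> str:
    """Run every patch in sequence over the generated text."""
    for patch in patches:
        text = patch(text)
    return text


def generate_patch_validate(
    generate: Callable[[], str],
    patches: list[Patch],
    validate: Callable[[str], bool],
) -> tuple[str, bool]:
    """Generate, patch, then validate, so the validator only sees patched text."""
    text = apply_patches(generate(), patches)
    return text, validate(text)
```

For example, apply_patches("The plant utilizes sunlight in order to grow.", [register_patch]) returns "The plant uses sunlight to grow."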

Challenge 10.5: Cost Tracking

Requirement: Per-model budget enforcement and spend monitoring.

Acceptance Criteria:

  • Per-model budget caps (configurable per task type)
  • Daily spend caps with automatic throttling
  • Reasoning token tracking (separate from completion tokens)
  • Cost attribution by task type for reporting

Evaluation: PASS
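
The sketch below shows one possible shape for cap enforcement and attribution. The CostTracker class, its field names, and the choice to treat per-model caps as daily budgets are assumptions made for illustration; per-task-type cap configuration is omitted for brevity.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class CostTracker:
    daily_cap_usd: float
    model_caps_usd: dict[str, float]  # assumed here to be daily per-model budgets
    day_spend: dict[str, float] = field(default_factory=lambda: defaultdict(float))
    model_spend: dict[tuple[str, str], float] = field(default_factory=lambda: defaultdict(float))
    task_spend: dict[str, float] = field(default_factory=lambda: defaultdict(float))
    completion_tokens: dict[str, int] = field(default_factory=lambda: defaultdict(int))
    reasoning_tokens: dict[str, int] = field(default_factory=lambda: defaultdict(int))

    def should_throttle(self, day: str, model: str) -> bool:
        """True once the daily cap or this model's cap has been reached for the day."""
        over_daily = self.day_spend.get(day, 0.0) >= self.daily_cap_usd
        model_cap = self.model_caps_usd.get(model, float("inf"))
        over_model = self.model_spend.get((day, model), 0.0) >= model_cap
        return over_daily or over_model

    def record(self, day: str, model: str, task_type: str, cost_usd: float,
               completion_tokens: int, reasoning_tokens: int = 0) -> None:
        """Attribute one call's cost and tokens to its day, model, and task type."""
        self.day_spend[day] += cost_usd
        self.model_spend[(day, model)] += cost_usd
        self.task_spend[task_type] += cost_usd
        # Reasoning tokens are counted separately from completion tokens.
        self.completion_tokens[model] += completion_tokens
        self.reasoning_tokens[model] += reasoning_tokens
```

A caller would check should_throttle before dispatching a request and call record afterwards; models without a configured cap are limited only by the daily cap in this sketch.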

Implementation Notes

  • Model routing config lives in packages/ai/ and maps task types to model tiers
  • Token optimization strategies are composable and can be toggled per task
  • DeepSeek adapter lives in packages/ai/ and handles v4 protocol differences
  • Post-generation patching runs as a pipeline stage between generation and drift validation
  • Cost tracking integrates with the admin dashboard (Challenge 6.9) for visibility