Model & Cost Optimization (WI-10)

Context

This is a retroactive challenge document for the AI model routing and cost optimization work. It covers task-based model selection (Gemini-3-flash promoted to default), token optimization strategies, DeepSeek adapter integration, post-generation patching, and cost tracking. All work is complete; this document exists for rollout tracking consistency.

Current State

All 5 challenges implemented and verified
Gemini-3-flash is the default model for most extraction tasks
Token usage reduced ~63% via STYLE_RULES deduplication
Cost tracking operational with per-model budgets and daily caps

Challenges

Challenge 10.1: Model Routing

Requirement: Task-based tier selection for AI model usage.

Acceptance Criteria:

Model routing config maps task types to model tiers
Gemini-3-flash promoted to default for standard extraction
Higher-tier models reserved for validation and complex reasoning
Fallback chain when primary model is unavailable

Evaluation: PASS
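
The routing criteria above can be illustrated with a minimal sketch. This is not the actual config in packages/ai/: the task type names, the "higher-tier-model" placeholder, and the resolve_model helper are all assumptions made for illustration. Only Gemini-3-flash as the default comes from this document.

```python
from typing import Callable

# Hypothetical routing table: task type -> ordered fallback chain of models.
# "higher-tier-model" is a placeholder, not a real model name.
ROUTING: dict[str, list[str]] = {
    "extraction": ["gemini-3-flash", "higher-tier-model"],
    "validation": ["higher-tier-model", "gemini-3-flash"],
}

# Unknown task types fall back to the default model.
DEFAULT_CHAIN: list[str] = ["gemini-3-flash"]


def resolve_model(task_type: str, is_available: Callable[[str], bool]) -> str:
    """Return the first available model in the task's fallback chain."""
    chain = ROUTING.get(task_type, DEFAULT_CHAIN)
    for model in chain:
        if is_available(model):
            return model
    raise RuntimeError(f"no available model for task type {task_type!r}")
```

Keeping the fallback chain in the same table as the primary choice means an outage changes which entry is picked, not which code path runs.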

Challenge 10.2: Token Optimization

Requirement: Reduce token consumption without sacrificing output quality.

Acceptance Criteria:

STYLE_RULES deduplication achieves ~63% token reduction
Micro-batch amortization spreads system prompt cost across multiple facts
Thinking budgets tiered by task complexity (0/1024/4096 tokens)
Evergreen title dedup cap prevents redundant regeneration

Evaluation: PASS
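
A small sketch of two of these criteria follows. The budget values 0/1024/4096 come from this document, while the tier names ("simple", "standard", "complex"), the function names, and the token counts in the usage note are assumptions. The amortization formula is simply the system prompt cost divided by the batch size plus the per-fact cost.

```python
# Thinking budgets in tokens. The three values are from this document;
# the tier names are hypothetical labels.
THINKING_BUDGETS: dict[str, int] = {"simple": 0, "standard": 1024, "complex": 4096}


def thinking_budget(complexity: str) -> int:
    """Look up the thinking budget for a task complexity tier."""
    return THINKING_BUDGETS[complexity]


def amortized_tokens_per_fact(
    system_prompt_tokens: int, per_fact_tokens: int, batch_size: int
) -> float:
    """Input tokens attributable to each fact when one system prompt
    is shared by a micro-batch of `batch_size` facts."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return system_prompt_tokens / batch_size + per_fact_tokens
```

With an illustrative 1200-token system prompt and 300 tokens per fact, a batch of one costs 1500 tokens per fact and a batch of four costs 600.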

Challenge 10.3: DeepSeek Adapter

Requirement: Integrate DeepSeek models for specific task types.

Acceptance Criteria:

Free-text JSON bypass for DeepSeek's non-standard JSON output
v4 adapter handles DeepSeek-specific response format
Graceful fallback to primary model on adapter failure

Evaluation: PASS
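
One way to picture the free-text bypass and the fallback is sketched below. It assumes that "non-standard JSON output" means a JSON object embedded in surrounding prose or code fences, which this document does not confirm. The extract_json and generate_with_fallback helpers are hypothetical and do not reflect the real adapter interface.

```python
import json
from typing import Any, Callable


def extract_json(text: str) -> Any:
    """Return the first JSON object that parses from free-form model output."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one value starting at `start` and ignores
            # whatever text follows it.
            obj, _end = decoder.raw_decode(text, start)
            return obj
        except json.JSONDecodeError:
            start = text.find("{", start + 1)
    raise ValueError("no JSON object found in model output")


def generate_with_fallback(
    prompt: str,
    adapter_call: Callable[[str], str],
    primary_call: Callable[[str], dict],
) -> dict:
    """Try the adapter first; on any adapter failure, use the primary model."""
    try:
        return extract_json(adapter_call(prompt))
    except Exception:
        # Broad catch is deliberate in this sketch: any adapter failure
        # (transport error or unparseable output) triggers the fallback.
        return primary_call(prompt)
```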

Challenge 10.4: Post-Generation Patching

Requirement: Automated quality fixes applied after AI generation.

Acceptance Criteria:

Passive voice detector and rewriter (36 patterns)
Generic reveal family matcher (21 families)
Textbook register rewriter for overly formal language
Patches applied before drift coordinator validation

Evaluation: PASS
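
The ordering requirement (patch first, validate second) can be sketched as a simple function pipeline. The two rules below are toy stand-ins invented for illustration. They are not taken from the 36 passive-voice patterns or the register rewriter, and the validate callback is only a placeholder for the drift coordinator.

```python
import re
from typing import Callable

Patch = Callable[[str], str]

# Hypothetical single passive-voice rule: "X was <verb> by Proper Name".
_PASSIVE = re.compile(
    r"\b(\w+) was (discovered|invented|built) by ([A-Z]\w+(?: [A-Z]\w+)*)"
)

# Hypothetical formal -> plain replacements for the register rewriter.
_REGISTER = {"utilized": "used", "in order to": "to"}


def fix_passive(text: str) -> str:
    """Rewrite the toy passive pattern into active voice."""
    return _PASSIVE.sub(lambda m: f"{m.group(3)} {m.group(2)} {m.group(1)}", text)


def soften_register(text: str) -> str:
    """Replace overly formal phrases with plainer ones."""
    for formal, plain in _REGISTER.items():
        text = re.sub(rf"\b{re.escape(formal)}\b", plain, text)
    return text


def apply_patches(text: str, patches: list[Patch]) -> str:
    """Run each patch in order, feeding the output of one into the next."""
    for patch in patches:
        text = patch(text)
    return text


def patch_then_validate(
    draft: str, patches: list[Patch], validate: Callable[[str], bool]
) -> tuple[str, bool]:
    """Validation only ever sees the patched text, never the raw draft."""
    patched = apply_patches(draft, patches)
    return patched, validate(patched)
```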

Challenge 10.5: Cost Tracking

Requirement: Per-model budget enforcement and spend monitoring.

Acceptance Criteria:

Per-model budget caps (configurable per task type)
Daily spend caps with automatic throttling
Reasoning token tracking (separate from completion tokens)
Cost attribution by task type for reporting

Evaluation: PASS
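
A minimal in-memory sketch of these four criteria is shown below. The CostTracker class, its field names, and the dollar amounts in any usage are assumptions. Persistence, the daily reset, and the real throttling mechanism are not described in this document and are left out.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class CostTracker:
    """Hypothetical tracker for one day of spend."""

    daily_cap_usd: float
    # Caps keyed by (model, task_type); a missing key means no per-model cap.
    model_caps_usd: dict[tuple[str, str], float]
    spend_by_model_task: dict[tuple[str, str], float] = field(
        default_factory=lambda: defaultdict(float)
    )
    spend_by_task: dict[str, float] = field(
        default_factory=lambda: defaultdict(float)
    )
    completion_tokens: int = 0
    reasoning_tokens: int = 0  # counted separately from completion tokens

    def total_spend(self) -> float:
        return sum(self.spend_by_model_task.values())

    def allow(self, model: str, task_type: str) -> bool:
        """Throttle decision: False once the daily cap or the
        (model, task type) cap has been reached."""
        if self.total_spend() >= self.daily_cap_usd:
            return False
        cap = self.model_caps_usd.get((model, task_type))
        spent = self.spend_by_model_task.get((model, task_type), 0.0)
        return cap is None or spent < cap

    def record(
        self,
        model: str,
        task_type: str,
        cost_usd: float,
        completion_tokens: int,
        reasoning_tokens: int = 0,
    ) -> None:
        """Attribute one call's cost and tokens to its model and task type."""
        self.spend_by_model_task[(model, task_type)] += cost_usd
        self.spend_by_task[task_type] += cost_usd
        self.completion_tokens += completion_tokens
        self.reasoning_tokens += reasoning_tokens
```

Callers would check allow() before each request and call record() after it, so a cap that is hit mid-day blocks further calls without interrupting one already in flight.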

Implementation Notes

Model routing config lives in packages/ai/ and maps task types to model tiers
Token optimization strategies are composable and can be toggled per task
DeepSeek adapter lives in packages/ai/ and handles v4 protocol differences
Post-generation patching runs as a pipeline stage between generation and drift validation
Cost tracking integrates with the admin dashboard (Challenge 6.9) for visibility
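
The note that token optimization strategies are composable and toggled per task could look roughly like the sketch below. Everything in it is assumed: the request shape, the strategy names, the per-task toggle table, and in particular the reading of STYLE_RULES deduplication as dropping repeated prompt lines.

```python
from typing import Callable

Request = dict
Strategy = Callable[[Request], Request]


def dedupe_style_rules(req: Request) -> Request:
    """Drop repeated lines from the system prompt, keeping first occurrences."""
    seen: set[str] = set()
    kept: list[str] = []
    for line in req["system_prompt"].splitlines():
        if line not in seen:
            seen.add(line)
            kept.append(line)
    return {**req, "system_prompt": "\n".join(kept)}


def cap_thinking(req: Request) -> Request:
    """Clamp the thinking budget to the middle tier (1024 tokens)."""
    return {**req, "thinking_budget": min(req.get("thinking_budget", 0), 1024)}


# Hypothetical registry and per-task toggles.
STRATEGIES: dict[str, Strategy] = {
    "dedupe_style_rules": dedupe_style_rules,
    "cap_thinking": cap_thinking,
}
ENABLED: dict[str, list[str]] = {
    "extraction": ["dedupe_style_rules", "cap_thinking"],
    "validation": ["dedupe_style_rules"],
}


def optimize(task_type: str, req: Request) -> Request:
    """Apply, in order, the strategies enabled for this task type."""
    for name in ENABLED.get(task_type, []):
        req = STRATEGIES[name](req)
    return req
```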

© 2026 Eko. All rights reserved.