Seed Pipeline Model Evaluation
Archived (Feb 21 2026): This evaluation led to a temporary xAI Grok integration that has since been removed. The pipeline now uses gpt-5-mini (default) with per-model prompt optimization via the ModelAdapter pattern. Available models:
`gpt-5-mini`, `gemini-2.5-flash`, `gemini-3-flash-preview`, `claude-haiku-4-5`. The content below is preserved as a historical record of the Feb 17 evaluation.
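The archive does not include the ModelAdapter code referenced above. As a purely hypothetical sketch of what per-model prompt optimization could look like (all names and prompt tweaks here are illustrative, not the pipeline's actual implementation):

```typescript
// Hypothetical sketch of a ModelAdapter pattern: each model id maps to
// model-specific prompt adjustments. Names are illustrative only.
interface ModelAdapter {
  modelId: string;
  // Rewrites the base system prompt with model-specific guidance.
  adaptSystemPrompt: (base: string) => string;
}

const adapters: Record<string, ModelAdapter> = {
  "gpt-5-mini": {
    modelId: "gpt-5-mini",
    adaptSystemPrompt: (base) =>
      `${base}\nAvoid vague one-word titles; prefer specific, cinematic phrasing.`,
  },
  "claude-haiku-4-5": {
    modelId: "claude-haiku-4-5",
    adaptSystemPrompt: (base) => `${base}\nKeep titles under 60 characters.`,
  },
};

function systemPromptFor(modelId: string, base: string): string {
  // Fall back to the unmodified prompt for models without an adapter.
  return adapters[modelId]?.adaptSystemPrompt(base) ?? base;
}
```

The useful property of this shape is that adding a model touches one record rather than every call site.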
Comparison of LLM options for fact explosion and challenge title generation. The seed pipeline requires structured output (Zod schema via Vercel AI SDK) with creative, specific, theatrical titles that match Eko's voice.
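A minimal sketch of the structured-output call described above, using the Vercel AI SDK's `generateObject` with a Zod schema. The schema fields and prompt are illustrative, not the pipeline's actual code, and running it requires an OpenAI API key:

```typescript
// Sketch of a structured fact-explosion call via the Vercel AI SDK.
// generateObject enforces the Zod schema on the model's JSON output.
import { generateObject } from "ai";
import { openai } from "@ai-sdk/openai";
import { z } from "zod";

// Illustrative schema: the real pipeline's fields may differ.
const factSchema = z.object({
  facts: z.array(
    z.object({
      title: z.string().describe("Specific, theatrical challenge title"),
      body: z.string().describe("The fact itself, one or two sentences"),
    })
  ),
});

const { object } = await generateObject({
  model: openai("gpt-5-mini"),
  schema: factSchema,
  prompt: "Explode this entry into specific, distinct facts: ...",
});
console.log(object.facts.length);
```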
Current State (Feb 17 2026)
- 144K facts generated across 10 topics from ~6K completed entries
- gpt-5-nano used for initial bulk generation (cheap but generic titles)
- gpt-5-mini used for title improvement pass and gen-2 explosions
- Title quality is still inconsistent: many titles are vague ("Musical Fusion", "Cultural Preservation") rather than specific and cinematic
- OpenAI monthly quota hit at ~$33 of $120 budget after 32.7M tokens
Requirements
| Requirement | Weight |
|---|---|
| Structured output (JSON schema) | Must-have |
| Vercel AI SDK provider support | Must-have |
| Theatrical, specific challenge titles | High |
| Low cost per million output tokens | High |
| Reasoning capability (for specificity) | Medium |
| Large context window | Low |
Model Comparison
| Model | Provider | Input $/1M | Output $/1M | Blended* | Structured Output | SDK Provider |
|---|---|---|---|---|---|---|
| gpt-5-nano | OpenAI | $0.05 | $0.40 | ~$0.30 | Yes | @ai-sdk/openai (native) |
| gpt-5-mini | OpenAI | $0.25 | $2.00 | ~$1.50 | Yes | @ai-sdk/openai (native) |
| Grok 4.1 Fast | xAI | $0.20 | $0.50 | ~$0.40 | Yes | @ai-sdk/xai (first-party) |
| Grok 4.1 Fast Reasoning | xAI | $0.20 | $0.50 | ~$0.40 | Yes | @ai-sdk/xai (first-party) |
| GLM-4.7 | Zhipu/Z.AI | $0.40 | $1.50 | ~$1.10 | Yes | zhipu-ai-provider (community) |
| Grok 4 | xAI | $3.00 | $15.00 | ~$12.00 | Yes | @ai-sdk/xai (first-party) |
| Claude Sonnet 4.5 | Anthropic | $3.00 | $15.00 | ~$12.00 | Yes | @ai-sdk/anthropic (native) |
\* Blended cost assumes a ~3:1 output:input token ratio, typical for fact explosion.
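The blended column can be reproduced from the footnote's assumption (1 part input, 3 parts output); a quick sketch:

```typescript
// Blended $/1M tokens at a 3:1 output:input token ratio,
// matching the table's assumption: 1 part input, 3 parts output.
function blendedCost(inputPerM: number, outputPerM: number): number {
  return (1 * inputPerM + 3 * outputPerM) / 4;
}

console.log(blendedCost(0.25, 2.0)); // gpt-5-mini: ~$1.56, table rounds to ~$1.50
console.log(blendedCost(0.2, 0.5)); // Grok 4.1 Fast: ~$0.43, table rounds to ~$0.40
```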
Analysis
Grok 4.1 Fast Reasoning (Recommended)
Best value for Eko's use case.
- 4x cheaper on output than gpt-5-mini ($0.50 vs $2.00), which matters because output dominates spend (~75% of tokens at the assumed 3:1 ratio, and an even larger share of cost)
- Reasoning variant "thinks before generating" which should produce more specific, theatrical titles
- Scores 64/65 on a quality benchmark (near Grok 4 / o3 level) at 1/15th the Grok 4 price
- 2M context window (not critical for explosion but useful for super-facts)
- `@ai-sdk/xai` is a first-party Vercel AI SDK provider
- xAI offers $25 free credits on signup plus $150/month via the data sharing program
Projected cost for remaining 52K entries: $5-8 (vs $20-25 with gpt-5-mini)
GLM-4.7
Solid alternative, but pricier than Grok.
- Strong at structured output and agent workflows
- 203K context window
- Output pricing ($1.50/M) is 3x Grok 4.1 Fast ($0.50/M)
- Community SDK provider (less battle-tested than first-party)
- Best suited for coding/agent tasks rather than creative fact generation
gpt-5-mini (Current)
Acceptable quality but expensive for bulk.
- Proven to work with the pipeline
- Output at $2.00/M is the most expensive option in the "cheap" tier
- Title quality improved over nano but still produces generic titles
- Shares quota with other OpenAI usage (hit $120/month cap)
Integration Path
Adding xAI Grok requires:
- `bun add @ai-sdk/xai` in `packages/ai`
- Add `XAI_API_KEY` to environment config
- Add `grok-4-1-fast-reasoning` and `grok-4-1-fast-non-reasoning` to the model registry
- Update the `ai_model_tier_config` DB table: `SET model = 'grok-4-1-fast-reasoning' WHERE tier = 'default'`
- Test a batch of 10 entries to validate structured output quality
Cost Projections
For the remaining ~52K pending entries (at ~434 tokens/fact, ~15 facts/entry):
| Model | Est. Total Cost | Quality |
|---|---|---|
| gpt-5-nano | $3-5 | Poor titles, fast |
| Grok 4.1 Fast Reasoning | $5-8 | Strong (reasoning step) |
| GLM-4.7 | $10-15 | Good |
| gpt-5-mini | $20-25 | Acceptable |
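The estimates above scale linearly with token volume and blended rate. A sketch of the arithmetic; the per-entry token figure below is an illustrative assumption, not a measured value from the pipeline:

```typescript
// Projected cost = total tokens x blended $/1M tokens.
// The 52K entry count is from the doc; 500 tokens/entry is illustrative.
function projectedCostUSD(
  entries: number,
  tokensPerEntry: number,
  blendedPerMTok: number
): number {
  return (entries * tokensPerEntry * blendedPerMTok) / 1_000_000;
}

console.log(projectedCostUSD(52_000, 500, 0.4).toFixed(2)); // at Grok's blended rate
console.log(projectedCostUSD(52_000, 500, 1.5).toFixed(2)); // at gpt-5-mini's blended rate
```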
Decision Log
| Date | Decision | Rationale |
|---|---|---|
| 2026-02-17 | Start with gpt-5-nano for bulk | Minimize cost for initial corpus |
| 2026-02-17 | Switch to gpt-5-mini for quality | Nano titles too generic for platform voice |
| 2026-02-17 | Evaluate Grok 4.1 Fast | OpenAI quota hit; need cheaper + better quality |
| 2026-02-17 | Integrate xAI Grok as provider | 4x cheaper output, reasoning for specificity, first-party SDK |
| 2026-02-17 | Set ALL tiers to grok-4-1-fast-reasoning | Single provider simplifies routing, eliminates cross-provider inconsistencies |
| 2026-02-17 | Full corpus cleanup (not just titles) | Context field too sparse; notability scores missing; holistic rewrite needed |