Seed Pipeline Model Evaluation

Archived (Feb 21 2026): This evaluation led to a temporary xAI Grok integration that has since been removed. The pipeline now uses gpt-5-mini (default) with per-model prompt optimization via the ModelAdapter pattern. Available models: gpt-5-mini, gemini-2.5-flash, gemini-3-flash-preview, claude-haiku-4-5. The content below is preserved as a historical record of the Feb 17 evaluation.

Comparison of LLM options for fact explosion and challenge title generation. The seed pipeline requires structured output (Zod schema via Vercel AI SDK) with creative, specific, theatrical titles that match Eko's voice.
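The structured-output requirement can be sketched with the Vercel AI SDK's `generateObject` and a Zod schema. This is an illustrative shape only: the schema fields (`title`, `fact`) and the `explodeEntry` helper are assumptions, not the pipeline's actual code.

```typescript
// Sketch of the structured-output call the pipeline relies on.
// Schema fields are illustrative, not the pipeline's real Zod schema.
import { generateObject } from "ai";
import { openai } from "@ai-sdk/openai";
import { z } from "zod";

const explosionSchema = z.object({
  facts: z.array(
    z.object({
      // Titles should be specific and cinematic, matching Eko's voice.
      title: z.string().describe("Theatrical, specific challenge title"),
      fact: z.string(),
    }),
  ),
});

export async function explodeEntry(entryText: string) {
  const { object } = await generateObject({
    model: openai("gpt-5-mini"),
    schema: explosionSchema,
    prompt: `Explode this entry into specific, cinematic facts:\n${entryText}`,
  });
  return object.facts; // SDK guarantees this matches explosionSchema
}
```

Any candidate model must support this call pattern (JSON-schema-constrained output through an AI SDK provider), which is why the comparison below tracks structured-output and provider support per model.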

Current State (Feb 17 2026)

  • 144K facts generated across 10 topics from ~6K completed entries
  • gpt-5-nano used for initial bulk generation (cheap but generic titles)
  • gpt-5-mini used for title improvement pass and gen-2 explosions
  • Title quality is still inconsistent: many titles are vague ("Musical Fusion", "Cultural Preservation") rather than specific and cinematic
  • OpenAI monthly quota hit after 32.7M tokens (~$33 of the $120 budget)

Requirements

| Requirement | Weight |
| --- | --- |
| Structured output (JSON schema) | Must-have |
| Vercel AI SDK provider support | Must-have |
| Theatrical, specific challenge titles | High |
| Low cost per million output tokens | High |
| Reasoning capability (for specificity) | Medium |
| Large context window | Low |

Model Comparison

| Model | Provider | Input $/1M | Output $/1M | Blended* | Structured Output | SDK Provider |
| --- | --- | --- | --- | --- | --- | --- |
| gpt-5-nano | OpenAI | $0.05 | $0.40 | ~$0.30 | Yes | @ai-sdk/openai (native) |
| gpt-5-mini | OpenAI | $0.25 | $2.00 | ~$1.50 | Yes | @ai-sdk/openai (native) |
| Grok 4.1 Fast | xAI | $0.20 | $0.50 | ~$0.40 | Yes | @ai-sdk/xai (first-party) |
| Grok 4.1 Fast Reasoning | xAI | $0.20 | $0.50 | ~$0.40 | Yes | @ai-sdk/xai (first-party) |
| GLM-4.7 | Zhipu/Z.AI | $0.40 | $1.50 | ~$1.10 | Yes | zhipu-ai-provider (community) |
| Grok 4 | xAI | $3.00 | $15.00 | ~$12.00 | Yes | @ai-sdk/xai (first-party) |
| Claude Sonnet 4.5 | Anthropic | $3.00 | $15.00 | ~$12.00 | Yes | @ai-sdk/anthropic (native) |

*Blended cost assumes a ~3:1 output:input token ratio, typical for fact explosion.

Analysis

Grok 4.1 Fast Reasoning

Best value for Eko's use case.

  • 4x cheaper on output than gpt-5-mini ($0.50/M vs $2.00/M); output is ~75% of token spend at the assumed 3:1 ratio
  • Reasoning variant "thinks before generating" which should produce more specific, theatrical titles
  • Scores 64/65 quality benchmark (near Grok 4 / o3 level) at 1/15th the Grok 4 price
  • 2M context window (not critical for explosion but useful for super-facts)
  • @ai-sdk/xai is a first-party Vercel AI SDK provider
  • xAI offers $25 free credits on signup + $150/month via data sharing program

Projected cost for remaining 52K entries: $5-8 (vs $20-25 with gpt-5-mini)

GLM-4.7

Solid alternative, but pricier than Grok.

  • Strong at structured output and agent workflows
  • 203K context window
  • Output pricing ($1.50/M) is 3x Grok 4.1 Fast ($0.50/M)
  • Community SDK provider (less battle-tested than first-party)
  • Best suited for coding/agent tasks rather than creative fact generation

gpt-5-mini (Current)

Acceptable quality but expensive for bulk.

  • Proven to work with the pipeline
  • Output at $2.00/M is the most expensive option in the "cheap" tier
  • Title quality improved over nano but still produces generic titles
  • Shares quota with other OpenAI usage (monthly quota already hit)

Integration Path

Adding xAI Grok requires:

  1. bun add @ai-sdk/xai in packages/ai
  2. Add XAI_API_KEY to environment config
  3. Add grok-4-1-fast-reasoning and grok-4-1-fast-non-reasoning to model registry
  4. Update the ai_model_tier_config DB table: UPDATE ai_model_tier_config SET model = 'grok-4-1-fast-reasoning' WHERE tier = 'default'
  5. Test batch of 10 entries to validate structured output quality
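Steps 2-5 above can be sketched end to end. The `@ai-sdk/xai` factory (`createXai`) and `generateObject` are real Vercel AI SDK APIs; the smoke-test schema and `smokeTest` helper are illustrative assumptions, not existing pipeline code.

```typescript
// Sketch of wiring the first-party xAI provider and smoke-testing
// structured output. Schema and helper names are illustrative.
import { generateObject } from "ai";
import { createXai } from "@ai-sdk/xai";
import { z } from "zod";

// Step 2: read XAI_API_KEY from environment config.
const xai = createXai({ apiKey: process.env.XAI_API_KEY });

// Step 4 is run against the DB, not in code:
//   UPDATE ai_model_tier_config
//   SET model = 'grok-4-1-fast-reasoning'
//   WHERE tier = 'default';

// Step 5: validate structured output on a batch of 10 entries.
const titleSchema = z.object({ title: z.string() }); // illustrative schema

export async function smokeTest(entries: string[]) {
  for (const entry of entries.slice(0, 10)) {
    const { object } = await generateObject({
      model: xai("grok-4-1-fast-reasoning"),
      schema: titleSchema,
      prompt: `Write one theatrical, specific challenge title for: ${entry}`,
    });
    console.log(object.title); // eyeball specificity and voice
  }
}
```

Keeping the smoke test to 10 entries keeps validation cost negligible at Grok 4.1 Fast pricing before committing the full 52K-entry run.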

Cost Projections

For the remaining ~52K pending entries (at ~434 tokens/fact, ~15 facts/entry):

| Model | Est. Total Cost | Quality |
| --- | --- | --- |
| gpt-5-nano | $3-5 | Poor titles, fast |
| Grok 4.1 Fast Reasoning | $5-8 | Strong (reasoning step) |
| GLM-4.7 | $10-15 | Good |
| gpt-5-mini | $20-25 | Acceptable |

Decision Log

| Date | Decision | Rationale |
| --- | --- | --- |
| 2026-02-17 | Start with gpt-5-nano for bulk | Minimize cost for initial corpus |
| 2026-02-17 | Switch to gpt-5-mini for quality | Nano titles too generic for platform voice |
| 2026-02-17 | Evaluate Grok 4.1 Fast | OpenAI quota hit; need cheaper + better quality |
| 2026-02-17 | Integrate xAI Grok as provider | 4x cheaper output, reasoning for specificity, first-party SDK |
| 2026-02-17 | Set ALL tiers to grok-4-1-fast-reasoning | Single provider simplifies routing, eliminates cross-provider inconsistencies |
| 2026-02-17 | Full corpus cleanup (not just titles) | Context field too sparse; notability scores missing; holistic rewrite needed |
