Model Code Isolation

Per-model prompt optimization via the ModelAdapter strategy pattern. Each AI model gets a dedicated adapter that tailors prompts, generation parameters, and quality review guidance to exploit that model's strengths and mitigate its weaknesses.

Why Per-Model Optimization Matters

Different LLMs have distinct failure modes. GPT-5 Mini tends toward quiz-show register in challenge titles. Gemini 2.5 Flash gravitates to textbook/encyclopedia tone. Claude Haiku 4.5 can be verbose. Rather than tuning prompts to the lowest common denominator, the adapter pattern lets each model receive targeted corrections injected into the system prompt at generation time.

Directory Structure

packages/ai/src/models/
├── types.ts                     # ModelAdapter, PromptCustomization, AdaptableTask
├── registry.ts                  # getModelAdapter(), eligibility tracking (JSONL)
└── adapters/
    ├── _default.ts              # Null-object pass-through (no customization)
    ├── gpt-5-mini.ts            # OpenAI GPT-5 Mini
    ├── gemini-2.5-flash.ts      # Google Gemini 2.5 Flash
    ├── gemini-3-flash-preview.ts # Google Gemini 3 Flash Preview
    └── claude-haiku-4-5.ts      # Anthropic Claude Haiku 4.5

Interface Contract

ModelAdapter

interface ModelAdapter {
  modelId: string
  provider: string
  displayName: string
  getPromptCustomization(task: AdaptableTask): PromptCustomization
  getKnownWeaknesses(): string[]
  getSignoffGuidance(): string
}

PromptCustomization

interface PromptCustomization {
  systemPromptSuffix?: string    // Appended after base prompt (default mode)
  systemPromptPrefix?: string    // Prepended before base prompt
  systemPromptOverride?: string  // Replaces entire system prompt (use sparingly)
  temperature?: number           // Override default temperature
  maxRetries?: number            // Override default retry count
}
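To make the contract concrete, here is a minimal hypothetical adapter satisfying both interfaces. The types are restated inline so the sketch is self-contained; `createExampleAdapter`, its model id, and its guidance strings are illustrative, not a real adapter from the codebase.

```typescript
// AdaptableTask as a string union of the seven task names (assumed shape).
type AdaptableTask =
  | 'fact_extraction'
  | 'evergreen_generation'
  | 'seed_explosion'
  | 'challenge_content_generation'
  | 'notability_scoring'
  | 'fact_validation'
  | 'signoff_review'

interface PromptCustomization {
  systemPromptSuffix?: string
  systemPromptPrefix?: string
  systemPromptOverride?: string
  temperature?: number
  maxRetries?: number
}

interface ModelAdapter {
  modelId: string
  provider: string
  displayName: string
  getPromptCustomization(task: AdaptableTask): PromptCustomization
  getKnownWeaknesses(): string[]
  getSignoffGuidance(): string
}

// Hypothetical adapter that corrects one weakness via suffix mode and
// leaves all other tasks uncustomized.
function createExampleAdapter(): ModelAdapter {
  return {
    modelId: 'example-model',
    provider: 'example',
    displayName: 'Example Model',
    getPromptCustomization(task) {
      if (task === 'challenge_content_generation') {
        return { systemPromptSuffix: 'Avoid quiz-show phrasing in titles.' }
      }
      return {} // no customization for other tasks
    },
    getKnownWeaknesses() {
      return ['quiz-show register in challenge titles']
    },
    getSignoffGuidance() {
      return 'Check challenge titles for quiz-show phrasing.'
    },
  }
}
```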

AdaptableTask (7 tasks)

| Task                         | Consumer                                                  |
|------------------------------|-----------------------------------------------------------|
| fact_extraction              | fact-engine.ts — news story extraction                    |
| evergreen_generation         | fact-engine.ts — timeless knowledge facts                 |
| seed_explosion               | seed-explosion.ts — entity-to-facts expansion             |
| challenge_content_generation | challenge-content.ts — quiz/recall content                |
| notability_scoring           | fact-engine.ts — fact importance scoring                  |
| fact_validation              | fact-engine.ts — AI plausibility check                    |
| signoff_review               | llm-fact-quality-testing.ts, news-ingestion-test.ts — quality signoff (dedicated: gemini-3-flash-preview) |

Three Customization Modes

Priority order (highest wins):

  1. Override — systemPromptOverride replaces the entire system prompt. Use sparingly for models that need fundamentally different prompting.
  2. Prefix — systemPromptPrefix is prepended before the base prompt. Useful for setting context before instructions.
  3. Suffix (default) — systemPromptSuffix is appended after the base prompt. Most adapters use this mode to add model-specific corrections at the end.

Applied by applyPromptCustomization() in packages/ai/src/fact-engine.ts:

function applyPromptCustomization(basePrompt: string, customization: PromptCustomization): string {
  if (customization.systemPromptOverride) return customization.systemPromptOverride
  const prefix = customization.systemPromptPrefix ?? ''
  const suffix = customization.systemPromptSuffix ?? ''
  return `${prefix}${prefix ? '\n\n' : ''}${basePrompt}${suffix ? '\n\n' : ''}${suffix}`
}
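A quick sketch of how the three modes resolve. The function body repeats the logic shown above so the example is self-contained; the base prompt and customization strings are illustrative.

```typescript
interface PromptCustomization {
  systemPromptSuffix?: string
  systemPromptPrefix?: string
  systemPromptOverride?: string
}

// Same resolution logic as applyPromptCustomization in fact-engine.ts.
function applyPromptCustomization(basePrompt: string, customization: PromptCustomization): string {
  if (customization.systemPromptOverride) return customization.systemPromptOverride
  const prefix = customization.systemPromptPrefix ?? ''
  const suffix = customization.systemPromptSuffix ?? ''
  return `${prefix}${prefix ? '\n\n' : ''}${basePrompt}${suffix ? '\n\n' : ''}${suffix}`
}

const base = 'You extract facts.'

applyPromptCustomization(base, { systemPromptSuffix: 'Use active voice.' })
// → 'You extract facts.\n\nUse active voice.'

applyPromptCustomization(base, { systemPromptPrefix: 'Context: daily news.' })
// → 'Context: daily news.\n\nYou extract facts.'

applyPromptCustomization(base, { systemPromptOverride: 'Entirely new prompt.' })
// → 'Entirely new prompt.'
```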

Adapter Registry

getModelAdapter(modelId, provider?) in registry.ts:

  • Looks up the model ID in ADAPTER_FACTORIES (a Record<string, () => ModelAdapter>)
  • If found, calls the factory to instantiate the adapter
  • If not found, returns a default pass-through adapter (null-object pattern)
  • All adapters are cached in a Map<string, ModelAdapter> after first instantiation

Helper functions:

  • hasModelAdapter(modelId) — checks if a dedicated adapter exists
  • listAdaptedModels() — returns all model IDs with dedicated adapters
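The lookup, caching, and null-object fallback described above can be sketched as follows. Function names mirror the registry description, but the bodies are illustrative, not the real registry.ts; the `ModelAdapter` shape is trimmed to the members used here.

```typescript
interface ModelAdapter {
  modelId: string
  getPromptCustomization(task: string): Record<string, unknown>
  getKnownWeaknesses(): string[]
  getSignoffGuidance(): string
}

// Null-object pattern: a pass-through adapter that customizes nothing.
function createDefaultAdapter(modelId: string): ModelAdapter {
  return {
    modelId,
    getPromptCustomization: () => ({}),
    getKnownWeaknesses: () => [],
    getSignoffGuidance: () => '',
  }
}

const ADAPTER_FACTORIES: Record<string, () => ModelAdapter> = {
  // 'gpt-5-mini': createGpt5MiniAdapter, ...
}

const adapterCache = new Map<string, ModelAdapter>()

function getModelAdapter(modelId: string): ModelAdapter {
  const cached = adapterCache.get(modelId)
  if (cached) return cached
  // Dedicated factory if registered, otherwise the pass-through default.
  const factory = ADAPTER_FACTORIES[modelId]
  const adapter = factory ? factory() : createDefaultAdapter(modelId)
  adapterCache.set(modelId, adapter)
  return adapter
}

function hasModelAdapter(modelId: string): boolean {
  return modelId in ADAPTER_FACTORIES
}

function listAdaptedModels(): string[] {
  return Object.keys(ADAPTER_FACTORIES)
}
```

The cache means each adapter is instantiated once per process, and unknown model IDs are always safe to pass in.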

Adding a New Model

  1. Create adapter file at packages/ai/src/models/adapters/{model-id}.ts

    • Export a create{ModelName}Adapter() factory function
    • Implement getPromptCustomization() for each relevant task
    • Document known weaknesses and signoff guidance
  2. Register the factory in registry.ts:

    import { createNewModelAdapter } from './adapters/new-model'
    const ADAPTER_FACTORIES: Record<string, () => ModelAdapter> = {
      // ... existing adapters
      'new-model-id': createNewModelAdapter,
    }
    
  3. Add to model registry in packages/config/src/model-registry.ts (if not already registered)

  4. Run quality tests via bun scripts/seed/llm-fact-quality-testing.ts --models new-model-id

    • The test pipeline automatically picks up the adapter's signoff guidance
    • Review the generated report for dimension scores
  5. Verify eligibility — the model must meet per-dimension thresholds (97%+ structural, 90%+ subjective) before production use

Eligibility Tracking

Models must demonstrate quality before production seeding. Eligibility is tracked in a JSONL file at scripts/seed/.llm-test-data/eligibility.jsonl.

7 Quality Dimensions

| Dimension        | What it measures                                          |
|------------------|-----------------------------------------------------------|
| validation       | Facts pass multi-phase validation pipeline                |
| evidence         | Facts corroborated by external evidence                   |
| challenges       | Challenge content passes CQ rules (CQ-001 through CQ-013) |
| schema_adherence | Output conforms to topic-specific schemas                 |
| voice_adherence  | Matches Eko voice constitution and taxonomy voice         |
| style_adherence  | Follows per-style rules (setup arc, reveal structure)     |
| token_efficiency | Output stays within token budget (no verbose bloat)       |

Thresholds

Dimensions use tiered thresholds (ELIGIBILITY_THRESHOLDS in registry.ts):

| Tier       | Dimensions                             | Threshold | Rationale |
|------------|----------------------------------------|-----------|-----------|
| Structural | validation, evidence, challenges       | 97%       | Deterministic checks — failures are unambiguous |
| Subjective | schema, voice, style, token_efficiency | 90%       | LLM-scored with 5-8pt variance per run — 90% captures genuine quality issues without penalizing reviewer noise |

A model that fails either tier is not production-eligible.
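The tiered check can be sketched as below. The threshold values follow the table above; the actual ELIGIBILITY_THRESHOLDS constant lives in registry.ts and may differ in shape.

```typescript
// Per-dimension thresholds (percentages), one entry per quality dimension.
const ELIGIBILITY_THRESHOLDS: Record<string, number> = {
  // Structural tier: deterministic checks.
  validation: 97,
  evidence: 97,
  challenges: 97,
  // Subjective tier: LLM-scored, tolerant of reviewer noise.
  schema_adherence: 90,
  voice_adherence: 90,
  style_adherence: 90,
  token_efficiency: 90,
}

function computeEligibility(dimensions: Record<string, number>): boolean {
  // Every dimension must meet its threshold; a single miss fails the model.
  return Object.entries(ELIGIBILITY_THRESHOLDS).every(
    ([dimension, threshold]) => (dimensions[dimension] ?? 0) >= threshold,
  )
}
```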

JSONL Format

Each entry is a single JSON line:

{"modelId":"gpt-5-mini","timestamp":"2026-02-21T10:30:00Z","dimensions":{"validation":98,"evidence":97,"challenges":99,"schema_adherence":100,"voice_adherence":98,"style_adherence":97,"token_efficiency":99},"eligible":true}

API

  • loadEligibility() — parse all entries from JSONL
  • saveEligibility(entry) — append a new entry
  • isModelEligible(modelId) — check latest entry for a model
  • getEligibleModels() — list all currently eligible models
  • computeEligibility(dimensions) — check if all dimensions meet threshold
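A sketch of the read-side helpers, operating on the JSONL file contents as a string so file I/O stays out of the example. Field names follow the entry format shown above; signatures here are illustrative and may differ from the real API.

```typescript
interface EligibilityEntry {
  modelId: string
  timestamp: string
  dimensions: Record<string, number>
  eligible: boolean
}

// Parse one JSON object per non-empty line.
function loadEligibility(jsonl: string): EligibilityEntry[] {
  return jsonl
    .split('\n')
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line) as EligibilityEntry)
}

// Entries are append-only, so the latest verdict for a model is the last
// matching line; scan from the end.
function isModelEligible(entries: EligibilityEntry[], modelId: string): boolean {
  for (let i = entries.length - 1; i >= 0; i--) {
    if (entries[i].modelId === modelId) return entries[i].eligible
  }
  return false
}

function getEligibleModels(entries: EligibilityEntry[]): string[] {
  const seen = new Set(entries.map((e) => e.modelId))
  return [...seen].filter((id) => isModelEligible(entries, id))
}
```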

Integration Points

Fact Extraction (fact-engine.ts)

const { resolved, model, adapter } = await selectModelForTask('fact_extraction')
// ... build system prompt ...
const customization = adapter.getPromptCustomization('fact_extraction')
systemPrompt = applyPromptCustomization(systemPrompt, customization)

The same pattern applies to scoreNotability(), generateEvergreenFacts(), and validateFact().

Seed Explosion (seed-explosion.ts)

explodeCategoryEntry() retrieves the adapter for the active model and applies seed_explosion customizations.

Challenge Content (challenge-content.ts)

generateChallengeContent() retrieves the adapter and applies challenge_content_generation customizations.

Test Pipeline (scripts/seed/llm-fact-quality-testing.ts)

The test harness calls adapter.getSignoffGuidance() and feeds it to the quality reviewer model, so the reviewer knows what model-specific weaknesses to look for.
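A minimal sketch of that wiring, assuming the guidance is injected as a labeled section of the reviewer prompt; `buildReviewerPrompt` and the section heading are hypothetical, and the real assembly lives in llm-fact-quality-testing.ts.

```typescript
// Append a model's signoff guidance to the reviewer's base prompt, if any.
function buildReviewerPrompt(
  basePrompt: string,
  adapter: { getSignoffGuidance(): string },
): string {
  const guidance = adapter.getSignoffGuidance()
  return guidance
    ? `${basePrompt}\n\nModel-specific weaknesses to check:\n${guidance}`
    : basePrompt
}
```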

Current Adapters

| Model | Provider | Focus Areas | Known Weaknesses |
|-------|----------|-------------|------------------|
| gpt-5-mini | OpenAI | Active voice, cinematic titles, second-person address | Quiz-show register in titles, passive voice, drops "you/your" |
| gemini-2.5-flash | Google | Conversational tone, title length, narrative arc | Textbook/encyclopedia register, long titles (>80 chars), flat reveals |
| gemini-3-flash-preview | Google | Default model — fact generation, challenges, signoff review. Full adapter with voice, schema, factual accuracy guardrails | May inherit Gemini-family textbook register, titles occasionally short |
| claude-haiku-4-5 | Anthropic | Conciseness, specificity, theatrical energy | Verbose context (>8 sentences), over-explains reveals |

Validation Pipeline Models

The multi-phase validation pipeline (Phases 3 and 4c) uses Gemini 2.5 Flash directly (hardcoded ResolvedModel config, not adapter-driven):

| Phase | Purpose | Model | Cost |
|-------|---------|-------|------|
| Phase 1: Structural | Schema, types, injection detection | None (code-only) | $0 |
| Phase 2: Consistency | Contradictions, taxonomy rules | None (code-only) | $0 |
| Phase 3: Cross-Model | AI adversarial verification | gemini-2.5-flash | ~$0.001 |
| Phase 4: Evidence | APIs + AI reasoner corroboration | gemini-2.5-flash (4c) | ~$0.002-0.005 |

Key Files

| File | Purpose |
|------|---------|
| packages/ai/src/models/types.ts | Interface definitions |
| packages/ai/src/models/registry.ts | Adapter lookup + eligibility tracking |
| packages/ai/src/models/adapters/*.ts | Individual model adapters |
| packages/ai/src/fact-engine.ts | applyPromptCustomization(), selectModelForTask() |
| packages/ai/src/seed-explosion.ts | Seed explosion adapter integration |
| packages/ai/src/challenge-content.ts | Challenge content adapter integration |
| scripts/seed/llm-fact-quality-testing.ts | Quality test pipeline with signoff guidance |
| scripts/seed/.llm-test-data/eligibility.jsonl | Eligibility tracking data |