2. Light Adapter Smoke Test

Purpose: Verify that all 13 registered adapters are discoverable, produce valid prompt customizations for each of the 7 AdaptableTask values, and declare known weaknesses and signoff guidance.

Prerequisites: Bun installed. No API keys or database are needed; everything runs in memory.

Cost / Duration: $0 | <5 seconds

Prompt

Run an inline adapter contract smoke test. Create and execute this script:

```ts
import { getModelAdapter, listAdaptedModels } from './packages/ai/src/models/registry'
import type { AdaptableTask } from './packages/ai/src/models/types'

const EXPECTED_MODELS = [
  // OpenAI
  'gpt-5.4-mini', 'gpt-5.4-nano', 'gpt-5-mini', 'gpt-4o-mini',
  // Anthropic
  'claude-haiku-4-5',
  // xAI
  'grok-4-1-fast-non-reasoning',
  // Google
  'gemini-2.0-flash-lite', 'gemini-2.5-flash', 'gemini-3-flash-preview',
  // DeepSeek
  'deepseek-chat',
  // Mistral
  'mistral-large-latest', 'mistral-medium-latest', 'mistral-small-latest',
]
const ALL_TASKS: AdaptableTask[] = [
  'fact_extraction',
  'evergreen_generation',
  'seed_explosion',
  'challenge_content_generation',
  'notability_scoring',
  'fact_validation',
  'signoff_review',
]

// 1. Verify all expected models are registered
const registered = listAdaptedModels()
for (const m of EXPECTED_MODELS) {
  if (!registered.includes(m)) throw new Error(`Missing adapter: ${m}`)
}
console.log(`✓ ${registered.length} adapters registered (expected ${EXPECTED_MODELS.length})`)

// 2. Verify all model×task combinations produce valid output
let combos = 0
for (const modelId of EXPECTED_MODELS) {
  const adapter = getModelAdapter(modelId)
  for (const task of ALL_TASKS) {
    const custom = adapter.getPromptCustomization(task)
    // Must have at least one non-empty prompt field
    const hasPrompt = custom.systemPromptSuffix || custom.systemPromptPrefix || custom.systemPromptOverride
    if (!hasPrompt) throw new Error(`Empty prompt for ${modelId}/${task}`)
    // Temperature must be 0-2 if set
    if (custom.temperature !== undefined && (custom.temperature < 0 || custom.temperature > 2)) {
      throw new Error(`Invalid temperature ${custom.temperature} for ${modelId}/${task}`)
    }
    combos++
  }
  // Adapter must declare known weaknesses
  const weaknesses = adapter.getKnownWeaknesses()
  if (!weaknesses || weaknesses.length === 0) {
    throw new Error(`No knownWeaknesses for ${modelId}`)
  }
  // Adapter must have signoff guidance
  const guidance = adapter.getSignoffGuidance()
  if (!guidance) {
    throw new Error(`No signoffGuidance for ${modelId}`)
  }
  // Optional method: if getExcludedTopics() is implemented, it must return string[]
  const excluded = adapter.getExcludedTopics?.()
  if (excluded != null && (!Array.isArray(excluded) || !excluded.every((t) => typeof t === 'string'))) {
    throw new Error(`getExcludedTopics must return string[] for ${modelId}`)
  }
}
console.log(`✓ ${combos} task combinations validated`)
console.log('All adapter contract checks passed.')
```

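Save the script in the directory that contains `packages/` (normally the repository root), because the `./packages/ai/src/models/registry` import resolves relative to the script file. Then run it with: `bun run <script-file>.ts`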

Verification

  • 13 adapters discovered (see EXPECTED_MODELS list above)
  • 91 task combinations validated (13 models × 7 tasks)
  • Every combination returns a non-empty prompt customization
  • Temperature values are in range 0-2 (when set)
  • Every adapter has getKnownWeaknesses() returning non-empty array
  • Every adapter has getSignoffGuidance() returning non-empty string
  • getExcludedTopics() returns valid string array when declared (optional method)
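
The script throws on the first failing model×task pair, so one broken adapter hides any others. A minimal sketch of an alternative collects every violation and fails once with the full list. It assumes the customization object carries only the four fields the script reads; the `PromptCustomization` interface and `checkCustomization` helper are hypothetical, not part of packages/ai. It is also slightly stricter than the script: whitespace-only prompts and a NaN temperature are rejected.

```typescript
// Hypothetical shape, inferred from the fields the smoke test reads;
// the real type in packages/ai/src/models/types may differ.
interface PromptCustomization {
  systemPromptPrefix?: string
  systemPromptSuffix?: string
  systemPromptOverride?: string
  temperature?: number
}

// Returns every contract violation for one model×task pair instead of
// throwing on the first failure.
function checkCustomization(label: string, c: PromptCustomization): string[] {
  const errors: string[] = []

  // At least one prompt field must contain non-whitespace text.
  const fields = [c.systemPromptPrefix, c.systemPromptSuffix, c.systemPromptOverride]
  if (!fields.some((f) => typeof f === 'string' && f.trim().length > 0)) {
    errors.push(`Empty prompt for ${label}`)
  }

  // Temperature, when set, must be a finite number in [0, 2].
  if (c.temperature !== undefined) {
    const t = c.temperature
    if (!Number.isFinite(t) || t < 0 || t > 2) {
      errors.push(`Invalid temperature ${t} for ${label}`)
    }
  }
  return errors
}

// Usage with made-up customizations: collect across pairs, then report once.
const violations = [
  ...checkCustomization('example-model/fact_extraction', { systemPromptSuffix: 'Return JSON.' }),
  ...checkCustomization('example-model/fact_validation', { systemPromptPrefix: '  ', temperature: 2.5 }),
]
if (violations.length > 0) {
  console.log(`${violations.length} violation(s):\n${violations.join('\n')}`)
}
```

In the smoke test, the loop would push each pair's result into one array and throw after both loops finish if the array is non-empty.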

Notes

  • The DeepSeek adapter (deepseek-chat) uses a V4 free-text generation mode and does not use json_schema structured output. Instead, a custom fetch wrapper in model-router.ts downgrades json_schema requests to json_object and injects the schema guidance into the system prompt.
  • Mistral adapters (mistral-large-latest, mistral-medium-latest, mistral-small-latest) support native json_schema mode.
  • The _default pass-through adapter is not in EXPECTED_MODELS. It is the fallback for any unregistered model and has no task-specific customizations.
  • Topic exclusion: adapters may implement the optional getExcludedTopics() method to declare topic prefixes the model should not handle. Currently, gpt-5.4-nano excludes sports (it fabricated evidence) and music (a 50% validation failure rate in T2 testing). For excluded topics, the model router falls back to gemini-3-flash-preview.
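
The DeepSeek json_schema → json_object downgrade can be sketched as a pure request transform plus a thin fetch wrapper. This is an illustration, not the code in model-router.ts: the request shape assumes an OpenAI-compatible chat completions body, and `downgradeJsonSchema`, `deepseekFetch`, and the guidance wording are hypothetical. The wrapper is meant to be attached only to the DeepSeek client; models with native json_schema support, such as the Mistral adapters, would not go through it.

```typescript
// Hypothetical request shape modeled on an OpenAI-compatible chat API.
type ChatMessage = { role: 'system' | 'user' | 'assistant'; content: string }

type ResponseFormat =
  | { type: 'text' }
  | { type: 'json_object' }
  | { type: 'json_schema'; json_schema: { name: string; schema: Record<string, unknown> } }

interface ChatRequest {
  model: string
  messages: ChatMessage[]
  response_format?: ResponseFormat
}

// Rewrites a json_schema request into json_object mode and moves the schema
// into the system prompt. Other requests are returned unchanged (same object).
function downgradeJsonSchema(req: ChatRequest): ChatRequest {
  const fmt = req.response_format
  if (!fmt || fmt.type !== 'json_schema') return req

  const guidance =
    'Respond with a single JSON object that conforms to this JSON Schema:\n' +
    JSON.stringify(fmt.json_schema.schema, null, 2)

  // Copy the array so the caller's request is not mutated.
  const messages = [...req.messages]
  const sysIndex = messages.findIndex((m) => m.role === 'system')
  if (sysIndex >= 0) {
    // Append the guidance to the existing system message.
    const sys = messages[sysIndex]
    messages[sysIndex] = { ...sys, content: `${sys.content}\n\n${guidance}` }
  } else {
    // No system message yet: add one at the front.
    messages.unshift({ role: 'system', content: guidance })
  }
  return { ...req, messages, response_format: { type: 'json_object' } }
}

// Thin fetch wrapper (hypothetical): rewrites string bodies, which are assumed
// to be JSON chat requests, and passes every other call through untouched.
async function deepseekFetch(input: string | URL | Request, init?: RequestInit): Promise<Response> {
  const raw = init?.body
  if (typeof raw === 'string') {
    const body = JSON.parse(raw) as ChatRequest
    init = { ...init, body: JSON.stringify(downgradeJsonSchema(body)) }
  }
  return fetch(input, init)
}
```

Keeping the transform separate from the wrapper lets it be tested without network access. DeepSeek's JSON Output mode also expects the word "json" to appear in the prompt, which the injected guidance satisfies.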

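Topic exclusion and the _default fallback can be sketched together. This assumes topics are slash-separated keys such as `sports/football` and that exclusion is plain prefix matching; `Adapter`, `getAdapter`, `routeModel`, `FALLBACK_MODEL`, and the in-memory registry are hypothetical stand-ins for the real registry (which exposes getModelAdapter and listAdaptedModels) and the model router.

```typescript
// Hypothetical, trimmed-down adapter interface; only the optional
// exclusion hook matters for routing.
interface Adapter {
  modelId: string
  getExcludedTopics?: () => string[]
}

const DEFAULT_ADAPTER: Adapter = { modelId: '_default' }
const FALLBACK_MODEL = 'gemini-3-flash-preview'

// Example registry entries; the gpt-5.4-nano exclusions mirror the note above.
const registry = new Map<string, Adapter>([
  ['gpt-5.4-nano', { modelId: 'gpt-5.4-nano', getExcludedTopics: () => ['sports', 'music'] }],
  ['gemini-3-flash-preview', { modelId: 'gemini-3-flash-preview' }],
])

// Unregistered models resolve to the pass-through _default adapter.
function getAdapter(modelId: string): Adapter {
  return registry.get(modelId) ?? DEFAULT_ADAPTER
}

// Returns the model that should handle `topic`: the requested model, unless
// one of its excluded prefixes matches, in which case the fallback model.
function routeModel(modelId: string, topic: string): string {
  const excluded = getAdapter(modelId).getExcludedTopics?.() ?? []
  const isExcluded = excluded.some((prefix) => topic.startsWith(prefix))
  return isExcluded ? FALLBACK_MODEL : modelId
}

console.log(routeModel('gpt-5.4-nano', 'sports/football')) // gemini-3-flash-preview
console.log(routeModel('gpt-5.4-nano', 'science/physics')) // gpt-5.4-nano
console.log(routeModel('unknown-model', 'music/jazz')) // unknown-model
```

Plain startsWith matching would also treat a topic like `sportswear` as excluded under `sports`; if topic keys can collide this way, matching on whole path segments is safer.
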