Enrichment Pipeline (WI-9)

Context

Retroactive challenge document for the enrichment pipeline work that landed post-Wave 2. This covers the enrichment orchestrator (13 API clients), challenge content generation pipeline, challenge image anti-spoiler pipeline, drift coordinators, and subcategory activations. All work is complete — this doc exists for rollout tracking consistency.

Current State

  • All 5 challenges implemented and verified
  • Enrichment orchestrator operational with 13 API clients
  • Challenge content generation producing 8 format types, each in 6+ styles
  • ~95 subcategories activated with domain-specific schemas

Challenges

Challenge 9.1: Enrichment Orchestrator

Requirement: Parallel API resolution with fail-open design for fact enrichment.

Acceptance Criteria:

  • 13 API clients integrated (Wikipedia, Wikidata, OpenLibrary, MusicBrainz, TMDB, GeoNames, etc.)
  • Parallel resolution with independent failure handling
  • Fail-open design — enrichment failure does not block fact publication
  • Cache layer for repeated lookups

Evaluation: PASS
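
The parallel, fail-open behaviour described in the criteria above can be illustrated with a minimal sketch. This is not the orchestrator in packages/ai/; the names `EnrichmentClient`, `EnrichmentOutcome`, `lookupCache`, and `enrichFact` are hypothetical, and the in-memory `Map` only stands in for whatever cache layer is actually used. Each client catches its own error, so a failed lookup is recorded and the remaining results are still returned.

```typescript
// Hypothetical shapes; the real client interface in packages/ai/ may differ.
type EnrichmentData = Record<string, unknown>;

interface EnrichmentClient {
  name: string;
  resolve(query: string): Promise<EnrichmentData>;
}

interface EnrichmentOutcome {
  data: Record<string, EnrichmentData>; // successful results keyed by client name
  failures: string[]; // names of clients that threw or rejected
}

// In-memory cache keyed by "client:query". A production cache would likely add TTLs.
const lookupCache = new Map<string, EnrichmentData>();

async function enrichFact(
  query: string,
  clients: EnrichmentClient[],
): Promise<EnrichmentOutcome> {
  // Every client runs concurrently and catches its own error,
  // so one failure cannot reject the combined Promise.all.
  const settled = await Promise.all(
    clients.map(async (client) => {
      const key = `${client.name}:${query}`;
      const cached = lookupCache.get(key);
      if (cached !== undefined) {
        return { name: client.name, value: cached };
      }
      try {
        const value = await client.resolve(query);
        lookupCache.set(key, value); // only successful lookups are cached
        return { name: client.name, value };
      } catch {
        return { name: client.name, value: null };
      }
    }),
  );

  const outcome: EnrichmentOutcome = { data: {}, failures: [] };
  for (const { name, value } of settled) {
    if (value === null) outcome.failures.push(name); // fail open: record and continue
    else outcome.data[name] = value;
  }
  return outcome;
}
```

In this sketch the caller always receives an outcome object, so fact publication can proceed with whatever enrichment data arrived and can log `failures` separately.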

Challenge 9.2: Challenge Content Generation

Requirement: AI-powered challenge generation across multiple format and style combinations.

Acceptance Criteria:

  • 8 format types supported (multiple-choice, fill-blank, true-false, etc.)
  • 6+ style variations per format
  • Micro-batching for cost amortization
  • Voice constitution enforcement (challenge tone prefix)
  • GENERATE_CHALLENGE_CONTENT queue integration

Evaluation: PASS
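
As a rough illustration of micro-batching with a tone prefix, the sketch below groups generation requests into fixed-size batches and builds one prompt per batch, so the shared prefix is sent once per batch rather than once per request. Everything here is an assumption made for illustration: `ChallengeRequest`, `TONE_PREFIX`, `microBatch`, and `buildBatchPrompt` are hypothetical names, and the actual voice constitution text and batching policy are not described in this document.

```typescript
// Hypothetical request shape for one challenge to generate.
interface ChallengeRequest {
  factId: string;
  format: string; // e.g. "multiple-choice", "fill-blank", "true-false"
  style: string;
}

// Placeholder only; the real voice constitution prefix is not shown in this document.
const TONE_PREFIX = "[challenge tone prefix]";

// Split items into consecutive batches of at most batchSize elements.
function microBatch<T>(items: T[], batchSize: number): T[][] {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError("batchSize must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

// Build one prompt per batch: the tone prefix appears once, followed by one line per request.
function buildBatchPrompt(batch: ChallengeRequest[]): string {
  const lines = batch.map(
    (req, i) => `${i + 1}. fact=${req.factId} format=${req.format} style=${req.style}`,
  );
  return [TONE_PREFIX, ...lines].join("\n");
}
```

With five requests and a batch size of two, `microBatch` yields batches of 2, 2, and 1, so the prefix is sent three times instead of five.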

Challenge 9.3: Challenge Image Pipeline

Requirement: Image resolution for challenges with anti-spoiler safeguards.

Acceptance Criteria:

  • RESOLVE_CHALLENGE_IMAGE queue type operational
  • Anti-spoiler logic prevents answer-revealing images
  • Fallback to generic category images when the specific image is unavailable or poses a spoiler risk

Evaluation: PASS
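
The fallback rule above can be sketched as a small pure function. The spoiler check used here (does the caption or URL contain the answer text?) is an assumed heuristic chosen for illustration; the document does not say how the actual anti-spoiler logic decides. `ImageCandidate`, `isSpoilerRisk`, and `resolveChallengeImage` are hypothetical names.

```typescript
// Hypothetical candidate image with the metadata available for a spoiler check.
interface ImageCandidate {
  url: string;
  caption: string;
}

// Lowercase and collapse punctuation so "Eiffel-Tower" and "eiffel tower" compare equal.
function normalize(text: string): string {
  return text.toLowerCase().replace(/[^a-z0-9]+/g, " ").trim();
}

// Assumed heuristic: an image is a spoiler risk if its caption or URL contains the answer.
function isSpoilerRisk(candidate: ImageCandidate, answer: string): boolean {
  const needle = normalize(answer);
  if (needle === "") return false;
  const haystack = normalize(`${candidate.caption} ${candidate.url}`);
  // Pad with spaces so only whole-word matches count.
  return ` ${haystack} `.indexOf(` ${needle} `) !== -1;
}

// Use the specific image only when it exists and is not a spoiler risk.
function resolveChallengeImage(
  candidate: ImageCandidate | null,
  answer: string,
  genericCategoryImageUrl: string,
): string {
  if (candidate === null || isSpoilerRisk(candidate, answer)) {
    return genericCategoryImageUrl;
  }
  return candidate.url;
}
```

A text-matching check like this one cannot detect an image that reveals the answer visually, so it should be read as the shape of the decision rather than a complete safeguard.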

Challenge 9.4: Drift Coordinators

Requirement: Semantic validators ensuring AI output quality.

Acceptance Criteria:

  • 7 drift coordinators implemented: voice, structure, schema, taxonomy, difficulty, reveal, textbook
  • Post-generation patching pipeline (passive voice, generic reveal, textbook register)
  • Validators run on every generated challenge before publication

Evaluation: PASS
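
One way to picture composable coordinators is as functions that share a single result shape, so any subset can be run and aggregated. The sketch below covers only two of the seven coordinators (voice and textbook), and its regex checks are deliberately crude stand-ins rather than the semantic validation the pipeline performs. `DriftResult`, `DriftCoordinator`, and `runDriftCoordinators` are hypothetical names.

```typescript
interface DriftResult {
  coordinator: string;
  pass: boolean;
  patches: string[]; // patch suggestions; empty when pass is true
}

type DriftCoordinator = (challengeText: string) => DriftResult;

// Crude stand-in for a passive-voice check: "was/were/is/are/been/being" + a word ending in "ed".
const voiceCoordinator: DriftCoordinator = (text) => {
  const passive = /\b(?:was|were|is|are|been|being)\s+\w+ed\b/i.test(text);
  return {
    coordinator: "voice",
    pass: !passive,
    patches: passive ? ["Rewrite passive constructions in active voice"] : [],
  };
};

// Crude stand-in for a textbook-register check based on a few stock phrases.
const textbookCoordinator: DriftCoordinator = (text) => {
  const stiff = /\b(?:it is important to note|in conclusion|as previously mentioned)\b/i.test(text);
  return {
    coordinator: "textbook",
    pass: !stiff,
    patches: stiff ? ["Replace textbook-register phrasing with conversational wording"] : [],
  };
};

// Run every coordinator and aggregate: the challenge passes only if all coordinators pass.
function runDriftCoordinators(
  text: string,
  coordinators: DriftCoordinator[],
): { pass: boolean; results: DriftResult[]; patches: string[] } {
  const results = coordinators.map((check) => check(text));
  return {
    pass: results.every((r) => r.pass),
    results,
    patches: results.reduce<string[]>((all, r) => all.concat(r.patches), []),
  };
}
```

Because each coordinator is independent, adding another one (structure, schema, taxonomy, difficulty, reveal) only means appending a function to the array passed to `runDriftCoordinators`.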

Challenge 9.5: Subcategory Activation

Requirement: Activate subcategories with domain-specific schemas for pipeline routing.

Acceptance Criteria:

  • ~95 subcategories activated
  • Domain-specific schemas per subcategory
  • Auto-inheritance trigger for new subcategories
  • EXPLODE_CATEGORY_ENTRY queue type for category expansion
  • Fact routing respects subcategory schema constraints

Evaluation: PASS
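
The interaction between schema inheritance and schema-constrained routing can be sketched in memory as follows. This is an assumption-heavy illustration: the document does not describe how the auto-inheritance trigger works, so the sketch simply lets a subcategory with no keys of its own fall back to its parent's keys. `FactRecordSchema`, `resolveFactKeys`, and `validateFact` are hypothetical names, and the example subcategories in the usage note are invented.

```typescript
// Hypothetical in-memory view of a schema row; the real storage layout may differ.
interface FactRecordSchema {
  subcategory: string;
  parent: string | null;
  factKeys: string[]; // empty means "inherit from parent" in this sketch
}

// Walk up the parent chain until a schema with its own fact keys is found.
function resolveFactKeys(
  subcategory: string,
  schemas: Map<string, FactRecordSchema>,
): string[] {
  const seen = new Set<string>(); // guards against accidental parent cycles
  let current: string | null = subcategory;
  while (current !== null && !seen.has(current)) {
    seen.add(current);
    const schema: FactRecordSchema | undefined = schemas.get(current);
    if (schema === undefined) return [];
    if (schema.factKeys.length > 0) return schema.factKeys;
    current = schema.parent;
  }
  return [];
}

// Check a fact against the resolved keys: report missing and unexpected keys.
function validateFact(
  fact: Record<string, unknown>,
  factKeys: string[],
): { valid: boolean; missing: string[]; unexpected: string[] } {
  const present = Object.keys(fact);
  const missing = factKeys.filter((k) => present.indexOf(k) === -1);
  const unexpected = present.filter((k) => factKeys.indexOf(k) === -1);
  return { valid: missing.length === 0 && unexpected.length === 0, missing, unexpected };
}
```

For example, if a hypothetical `music/jazz` subcategory has no keys of its own and its parent `music` defines `artist` and `release_year`, a fact with only `artist` would be reported as missing `release_year`.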

Implementation Notes

  • Enrichment orchestrator lives in packages/ai/
  • Challenge content generation uses Vercel AI SDK v6 generateObject with Zod schemas
  • Drift coordinators are composable — each returns pass/fail with patch suggestions
  • Subcategory schemas are stored in fact_record_schemas.fact_keys and validated at extraction time
  • Queue types added: RESOLVE_CHALLENGE_IMAGE, GENERATE_CHALLENGE_CONTENT, EXPLODE_CATEGORY_ENTRY, FIND_SUPER_FACTS
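
For readers wiring up consumers, the four queue types listed above could be modelled as a string-literal union so that the compiler flags a missing handler. This is only a sketch: the handler bodies are placeholders, and `QueueType`, `isQueueType`, `handlers`, and `dispatch` are hypothetical names, not the project's actual queue API.

```typescript
// The four queue type names come from the list above; everything else is illustrative.
const QUEUE_TYPES = [
  "RESOLVE_CHALLENGE_IMAGE",
  "GENERATE_CHALLENGE_CONTENT",
  "EXPLODE_CATEGORY_ENTRY",
  "FIND_SUPER_FACTS",
] as const;

type QueueType = (typeof QUEUE_TYPES)[number];

// Runtime guard for queue type names that arrive as plain strings.
function isQueueType(value: string): value is QueueType {
  return QUEUE_TYPES.some((known) => known === value);
}

type QueueHandler = (payload: unknown) => string;

// Placeholder handler that only reports which queue type it was registered for.
function stubHandler(type: QueueType): QueueHandler {
  return () => `handled ${type}`;
}

// Record<QueueType, ...> makes the compiler reject this object if a queue type is left out.
const handlers: Record<QueueType, QueueHandler> = {
  RESOLVE_CHALLENGE_IMAGE: stubHandler("RESOLVE_CHALLENGE_IMAGE"),
  GENERATE_CHALLENGE_CONTENT: stubHandler("GENERATE_CHALLENGE_CONTENT"),
  EXPLODE_CATEGORY_ENTRY: stubHandler("EXPLODE_CATEGORY_ENTRY"),
  FIND_SUPER_FACTS: stubHandler("FIND_SUPER_FACTS"),
};

// Route a message to its handler, rejecting unknown queue types explicitly.
function dispatch(type: string, payload: unknown): string {
  if (!isQueueType(type)) {
    throw new Error(`Unknown queue type: ${type}`);
  }
  return handlers[type](payload);
}
```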