Challenge Image Architecture

Reference for auto-generating challenge-aligned images in Eko's fact challenge system.


Design Decision: Option C (Hybrid Inheritance + Override)

Most challenges inherit the parent fact's image. Only styles where the fact image would spoil the answer get a dedicated image resolution job.

fact_records.imageUrl ──────────────────────────────────────────┐
                                                                │
GENERATE_CHALLENGE_CONTENT                                      │
  │                                                             │
  ├─ style needs override? ── NO ── image_status='inherited' ──┤
  │                                                             │
  └─ YES ── enqueue RESOLVE_CHALLENGE_IMAGE                     │
                │                                               │
                ▼                                               │
         challenge-aware cascade                                │
                │                                               │
                ├─ found ── image_status='resolved'             │
                │                                               │
                └─ not found ── image_status='skipped' ─────────┤
                                                                │
                                              Display: ─────────┘
                                              challenge.image_url
                                                ?? fact.imageUrl
                                                ?? topicPlaceholder

Style → Image Strategy Matrix

| Challenge Style | Inherits Fact Image? | Override Reason | Override Frequency |
|---|---|---|---|
| multiple_choice | Yes | | Never |
| direct_question | Yes | | Never |
| free_text | Yes | | Never |
| fill_the_gap | Conditional | Gap may be the entity name | ~30% of instances |
| statement_blank | Conditional | Blank may be the entity name | ~30% of instances |
| reverse_lookup | Always override | Entity IS the answer | 100% |
| progressive_image_reveal | Always override | Needs purpose-built visual | 100% |

Override Decision Logic

needsOverride(style, styleData, factTitle):
  if style == 'reverse_lookup':           return true
  if style == 'progressive_image_reveal': return true
  if style == 'fill_the_gap':
    return styleData.complete_text contains factTitle
  if style == 'statement_blank':
    return styleData.statement contains factTitle
  return false
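
The decision logic above could be sketched in TypeScript roughly as follows. The style names come from the matrix; the StyleData shape and the case-insensitive comparison are assumptions (the pseudocode only says "contains"), so treat this as an illustration rather than the actual implementation.

```typescript
// Style names are taken from the strategy matrix.
type ChallengeStyle =
  | 'multiple_choice'
  | 'direct_question'
  | 'free_text'
  | 'fill_the_gap'
  | 'statement_blank'
  | 'reverse_lookup'
  | 'progressive_image_reveal';

// Hypothetical shape: only the two fields the pseudocode reads.
interface StyleData {
  complete_text?: string;
  statement?: string;
}

// Assumption: matching is a case-insensitive substring test.
// An empty title or missing text never matches.
function containsTitle(text: string | undefined, factTitle: string): boolean {
  const needle = factTitle.trim().toLowerCase();
  if (needle === '' || text === undefined) return false;
  return text.toLowerCase().includes(needle);
}

function needsOverride(
  style: ChallengeStyle,
  styleData: StyleData,
  factTitle: string
): boolean {
  switch (style) {
    case 'reverse_lookup':
    case 'progressive_image_reveal':
      return true; // always override
    case 'fill_the_gap':
      return containsTitle(styleData.complete_text, factTitle);
    case 'statement_blank':
      return containsTitle(styleData.statement, factTitle);
    default:
      return false; // inherit the fact image
  }
}
```

A plain substring test will miss paraphrases and partial names of the entity, which is the reliability concern raised under Open Questions.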

Estimated override volume: ~33% of all generated challenges (2 of 6 styles always, 2 conditionally).


Image Source Cascade

Layer 0 — Inherited (no API call)

The fact's existing image, resolved by the RESOLVE_IMAGE handler at extraction time.

  • Source: fact_records.imageUrl (Wikipedia, TheSportsDB, Unsplash, Pexels)
  • Coverage: ~70-80% of all challenges
  • Cost: $0

Layer 1 — Topic-Specific APIs (structured, free, image-included)

APIs that return images as part of their data response. Matched by topic slug.

| Topic Slug | Source | API | Images | Free Tier | Notes |
|---|---|---|---|---|---|
| science, space | NASA | api.nasa.gov | APOD, Mars Rover, Earth | 1K req/hr | Spectacular photography |
| art | Metropolitan Museum | metmuseum.github.io | 470K+ artworks | Unlimited | Public domain |
| art | Art Institute Chicago | api.artic.edu | Hi-res artwork | Unlimited | CC0 licensed |
| entertainment, tv | TMDb | api.themoviedb.org | Posters, backdrops, cast | Non-commercial | 9M+ titles |
| animals | iNaturalist | api.inaturalist.org | Species photos | Unlimited | Citizen science |
| cooking | TheMealDB | themealdb.com/api | Recipe photos | Unlimited | No key (test key: 1) |
| food-beverage | TheCocktailDB | thecocktaildb.com/api | Cocktail photos | Unlimited | No key (test key: 1) |
| games | RAWG | rawg.io/api | Screenshots, covers | 20K req/mo | 500K+ games |
| publishing | Open Library | covers.openlibrary.org | Book covers | Unlimited | By ISBN/OLID |
| sports | TheSportsDB | thesportsdb.com | Badges, logos, banners | Unlimited | Already in fact cascade |

Search strategy: Use fact metadata (topic slug, schema keys, entity identifiers) to query the relevant API. Return the first image that is NOT the same as the fact's entity image.


Layer 2 — Wikimedia Commons (broad coverage, free)

GET https://commons.wikimedia.org/w/api.php
  ?action=query
  &generator=search
  &gsrsearch={context_keywords}
  &prop=imageinfo
  &iiprop=url|extmetadata
  &iiurlwidth=800
  &format=json

  • Coverage: 100M+ freely licensed images
  • Key advantage: Search by topic context, not entity name
  • Search query: Derived from setup_text keywords minus answer-revealing terms
  • Attribution: License metadata in the extmetadata response field
  • No API key required
  • Rate limit: No hard quota is stated here; keep request rates modest and send a descriptive User-Agent header (e.g. User-Agent: Eko/1.0)
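
As a rough sketch, the request above could be assembled and its response unpacked like this. The function names are hypothetical. The gsrnamespace=6 parameter is not part of the request shown above; it is added here as an assumption, to restrict the search to the File namespace. The response types cover only the fields this sketch reads.

```typescript
// Builds the Commons search URL from the parameters listed above.
function buildCommonsSearchUrl(contextKeywords: string, width = 800): string {
  const params: Array<[string, string]> = [
    ['action', 'query'],
    ['generator', 'search'],
    ['gsrsearch', contextKeywords],
    // Assumption (not in the original request): search file pages only.
    ['gsrnamespace', '6'],
    ['prop', 'imageinfo'],
    ['iiprop', 'url|extmetadata'],
    ['iiurlwidth', String(width)],
    ['format', 'json'],
  ];
  const query = params
    .map(([key, value]) => `${key}=${encodeURIComponent(value)}`)
    .join('&');
  return `https://commons.wikimedia.org/w/api.php?${query}`;
}

// Partial response shape: pages are keyed by page id, each with an
// optional imageinfo array.
interface CommonsImageInfo {
  url?: string;
  thumburl?: string; // present when iiurlwidth is set
}
interface CommonsPage {
  title: string;
  imageinfo?: CommonsImageInfo[];
}
interface CommonsResponse {
  query?: { pages?: Record<string, CommonsPage> };
}

// Collects one URL per page, preferring the scaled thumbnail.
function extractImageUrls(response: CommonsResponse): string[] {
  const pages = response.query?.pages ?? {};
  const urls: string[] = [];
  for (const pageId of Object.keys(pages)) {
    const info = pages[pageId]?.imageinfo?.[0];
    const url = info?.thumburl ?? info?.url;
    if (url !== undefined) urls.push(url);
  }
  return urls;
}
```

The caller would still need to send the User-Agent header and read the license fields from extmetadata to build the attribution line.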

Layer 3 — Stock Photo APIs (Unsplash, Pexels)

Uses the same providers as the fact image cascade, but with a challenge-aware, spoiler-filtered search query.

| Provider | Key Required | Free Tier | Rate Limit |
|---|---|---|---|
| Unsplash | UNSPLASH_ACCESS_KEY | 50 req/hr (demo) / 5K req/hr (prod) | Per-hour |
| Pexels | PEXELS_API_KEY | 200 req/hr, 20K req/mo | Per-hour + monthly |

Query construction:

buildChallengeImageQuery(setupText, topicSlug, correctAnswer):
  1. Extract nouns/adjectives from setupText
  2. Remove terms that match correctAnswer (anti-spoiler)
  3. Prepend topicSlug as context anchor
  4. Return: "{topicSlug} {filtered_keywords}"

Example:
  setupText:     "This Parisian iron lattice tower was completed in 1889"
  topicSlug:     "architecture"
  correctAnswer: "Eiffel Tower"
  query:         "architecture Parisian iron lattice 1889"
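
A minimal TypeScript sketch of the query construction above. Step 1 (noun/adjective extraction) would need part-of-speech tagging; this sketch substitutes a small stop-word list, which is an assumption and only an approximation. The tokenizer is ASCII-only for simplicity.

```typescript
// Illustrative stop-word list standing in for real part-of-speech tagging.
const STOP_WORDS = new Set<string>([
  'a', 'an', 'the', 'this', 'that', 'these', 'those', 'is', 'was', 'were',
  'are', 'in', 'on', 'of', 'at', 'by', 'to', 'and', 'or', 'it', 'its',
]);

// Splits on anything that is not an ASCII letter or digit.
function tokenize(text: string): string[] {
  return text.split(/[^a-z0-9]+/i).filter((token) => token.length > 0);
}

function buildChallengeImageQuery(
  setupText: string,
  topicSlug: string,
  correctAnswer: string
): string {
  // Anti-spoiler: every word of the answer is excluded from the query.
  const answerTerms = new Set<string>(
    tokenize(correctAnswer).map((token) => token.toLowerCase())
  );
  const keywords = tokenize(setupText).filter((token) => {
    const lower = token.toLowerCase();
    return !STOP_WORDS.has(lower) && !answerTerms.has(lower);
  });
  // The topic slug is prepended as a context anchor.
  return [topicSlug, ...keywords].join(' ');
}

console.log(
  buildChallengeImageQuery(
    'This Parisian iron lattice tower was completed in 1889',
    'architecture',
    'Eiffel Tower'
  )
);
// prints "architecture Parisian iron lattice completed 1889"
```

Because the stop-word list does not remove verbs, "completed" survives here, whereas the worked example above drops it. Closing that gap would need a real tagger or a larger word list.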

Layer 4 — AI-Generated (future)

Not currently configured. Architecture slot reserved.

| Provider | Env Var | Cost/Image | Best For |
|---|---|---|---|
| Cloudflare Workers AI (SDXL) | CLOUDFLARE_ACCOUNT_ID (exists) | ~$0 (included) | General illustration |
| Flux (via fal.ai) | FAL_API_KEY | ~$0.003 | High-quality scenes |
| DALL-E 3 | OPENAI_API_KEY (exists) | ~$0.04 | Premium visual puzzles |

Primary use case: progressive_image_reveal — generating visual puzzles that can be progressively unmasked.

Cloudflare note: Eko already has CLOUDFLARE_ACCOUNT_ID for Stream. Workers AI includes Stable Diffusion XL on paid plans. Images could store directly to the existing R2 bucket via packages/r2.


Schema Changes

fact_challenge_content — New Columns

ALTER TABLE fact_challenge_content
  ADD COLUMN image_url          TEXT,
  ADD COLUMN image_source       TEXT,
  ADD COLUMN image_attribution  TEXT,
  ADD COLUMN image_status       TEXT DEFAULT 'inherited'
    CHECK (image_status IN ('inherited', 'pending', 'resolved', 'skipped'));

| Column | Type | Default | Purpose |
|---|---|---|---|
| image_url | TEXT | NULL | Override image URL; NULL = use fact's image |
| image_source | TEXT | NULL | Provider name (wikimedia, unsplash, nasa, etc.) |
| image_attribution | TEXT | NULL | Credit line for display |
| image_status | TEXT | 'inherited' | Lifecycle state of image resolution |

image_status Lifecycle

'inherited'  →  Challenge uses fact's image (no override needed)
'pending'    →  Override requested, RESOLVE_CHALLENGE_IMAGE enqueued
'resolved'   →  Override image found and stored
'skipped'    →  Override attempted, no suitable image found; falls back to fact image
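
One way to read the lifecycle above is as a small state machine. The sketch below encodes it as a transition table. It assumes that 'pending' is the only non-terminal state, so a 'skipped' challenge is never re-queued; the document does not say this explicitly. The names NEXT_STATES and canTransition are hypothetical.

```typescript
type ImageStatus = 'inherited' | 'pending' | 'resolved' | 'skipped';

// Assumption: only 'pending' can move on, to 'resolved' or 'skipped'.
const NEXT_STATES: Record<ImageStatus, ImageStatus[]> = {
  inherited: [],
  pending: ['resolved', 'skipped'],
  resolved: [],
  skipped: [],
};

// Returns true when the handler is allowed to move a row from one status to another.
function canTransition(from: ImageStatus, to: ImageStatus): boolean {
  return NEXT_STATES[from].some((next) => next === to);
}
```

A guard like this could run before the handler writes image_status, so that a retried message cannot overwrite a row that has already been resolved.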

Queue Message Type

RESOLVE_CHALLENGE_IMAGE

interface ResolveChallengeImageMessage {
  type: 'RESOLVE_CHALLENGE_IMAGE'
  payload: {
    challenge_content_id: string   // UUID → fact_challenge_content.id
    fact_record_id: string         // UUID → parent fact for context
    challenge_style: ChallengeStyle
    setup_text: string             // Challenge framing (for query construction)
    correct_answer: string         // To EXCLUDE from image search (anti-spoiler)
    topic_slug: string             // For topic-specific API routing
    topic_path: string             // Full path for subcategory matching
    fact_title: string             // Fact title (to avoid duplicate of fact image)
  }
}

Consumer: worker-ingest (co-located with existing RESOLVE_IMAGE handler)

Max attempts: 3 (standard queue retry with exponential backoff)


Pipeline Integration

Where RESOLVE_CHALLENGE_IMAGE Gets Enqueued

In generate-challenge-content.ts, after successful upsert:

for each generated challenge:
  if needsOverride(style, styleData, factTitle):
    set image_status = 'pending'
    enqueue RESOLVE_CHALLENGE_IMAGE
  else:
    set image_status = 'inherited'

Handler Cascade (in worker-ingest)

processResolveChallengeImage(message):
  1. Load challenge content + parent fact
  2. If challenge already has image_url → skip (idempotent)
  3. Build context-aware search query (anti-spoiler)
  4. Run cascade:
     a. Topic API (if topic_slug has mapped source)
     b. Wikimedia Commons (context keyword search)
     c. Unsplash (filtered query)
     d. Pexels (filtered query)
  5. If result AND result.url != fact.imageUrl:
       → update challenge: image_url, image_source, image_attribution, image_status='resolved'
     Else:
       → update challenge: image_status='skipped'
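
Steps 4 and 5 of the handler could be sketched as a loop over provider functions. The ImageProvider and ChallengeImageUpdate types are hypothetical, and real providers would wrap the Topic API, Wikimedia, Unsplash and Pexels clients. One deliberate difference from the pseudocode: a result that matches the fact's image is treated as a miss and the loop moves on to the next provider, instead of marking the challenge 'skipped' straight away.

```typescript
interface ImageResult {
  url: string;
  source: string;
  attribution: string | null;
}

// A provider returns null when it finds nothing for the query.
type ImageProvider = (query: string) => Promise<ImageResult | null>;

interface ChallengeImageUpdate {
  image_url: string | null;
  image_source: string | null;
  image_attribution: string | null;
  image_status: 'resolved' | 'skipped';
}

async function runChallengeImageCascade(
  providers: ImageProvider[],
  query: string,
  factImageUrl: string | null
): Promise<ChallengeImageUpdate> {
  for (const provider of providers) {
    // Errors propagate to the caller so the queue's retry policy applies.
    const result = await provider(query);
    if (result !== null && result.url !== factImageUrl) {
      return {
        image_url: result.url,
        image_source: result.source,
        image_attribution: result.attribution,
        image_status: 'resolved',
      };
    }
  }
  // Nothing usable found: the display layer falls back to the fact image.
  return {
    image_url: null,
    image_source: null,
    image_attribution: null,
    image_status: 'skipped',
  };
}
```

The providers array is expected in cascade order (topic API, then Wikimedia Commons, then Unsplash, then Pexels), with the topic API omitted when the slug has no mapped source.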

Display Logic (UI)

function getChallengeImageUrl(
  challenge: FactChallengeContent,
  fact: FactRecord,
  topic: TopicCategory
): string {
  return challenge.imageUrl       // Challenge-specific override
    ?? fact.imageUrl              // Parent fact image
    ?? getTopicPlaceholder(topic) // Branded topic fallback
}

Cost & Rate Limit Estimates

Per 1,000 Facts Processed

| Step | Volume | API Calls | Cost |
|---|---|---|---|
| Challenges generated | ~6,000 (6 styles × 1K facts) | 0 (already happening) | ~$0.30 AI |
| Inherited (no override) | ~4,000 (67%) | 0 | $0 |
| Overrides needed | ~2,000 (33%) | 2,000 max | $0 |
| Layer 1 hits (topic API) | ~800 (40% of overrides) | 800 | $0 (free APIs) |
| Layer 2 hits (Wikimedia) | ~700 (35% of overrides) | 700 | $0 |
| Layer 3 hits (Unsplash/Pexels) | ~400 (20% of overrides) | 400 | $0 |
| Layer 4 hits (AI-gen, future) | 0 (not active) | 0 | $0 |
| No image found | ~100 (5% of overrides) | | |

  • Total additional API calls per 1K facts: ~1,900
  • Total additional cost: $0 (all free-tier APIs)
  • Unsplash/Pexels budget per 1K facts: ~400 calls (well within hourly limits)
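
The arithmetic behind these totals can be checked with a short calculation. It reads each layer's percentage as a share of the ~2,000 overrides; the 20% Layer 3 share is derived from the table's ~400 hits and is not stated directly. The function and field names are hypothetical.

```typescript
// Shares are whole-number percentages of the override volume.
interface CascadeShares {
  layer1: number;
  layer2: number;
  layer3: number;
  none: number;
}

function estimateCascadeVolumes(overrides: number, shares: CascadeShares) {
  const layer1 = Math.round((overrides * shares.layer1) / 100);
  const layer2 = Math.round((overrides * shares.layer2) / 100);
  const layer3 = Math.round((overrides * shares.layer3) / 100);
  const none = Math.round((overrides * shares.none) / 100);
  // Only lookups that return an image are counted here.
  return { layer1, layer2, layer3, none, successfulLookups: layer1 + layer2 + layer3 };
}

const estimate = estimateCascadeVolumes(2000, { layer1: 40, layer2: 35, layer3: 20, none: 5 });
console.log(estimate);
// { layer1: 800, layer2: 700, layer3: 400, none: 100, successfulLookups: 1900 }
```

This counts one call per successful lookup. Requests that miss at one layer and fall through to the next are not included, so the real request count would sit somewhat above ~1,900.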


Topic → Layer 1 API Routing Table

Quick reference for which topic slugs route to which image API.

| Topic Slug(s) | API | Endpoint Pattern |
|---|---|---|
| science, space, physics-space | NASA APOD | api.nasa.gov/planetary/apod |
| art, architecture | Met Museum | collectionapi.metmuseum.org/public/collection/v1/search |
| art | Art Institute Chicago | api.artic.edu/api/v1/artworks/search |
| entertainment, tv, people | TMDb | api.themoviedb.org/3/search/multi |
| animals | iNaturalist | api.inaturalist.org/v1/taxa |
| cooking | TheMealDB | themealdb.com/api/json/v1/1/search.php |
| food-beverage | TheCocktailDB | thecocktaildb.com/api/json/v1/1/search.php |
| games | RAWG | api.rawg.io/api/games |
| publishing | Open Library | covers.openlibrary.org/b/isbn/{isbn}-L.jpg |
| sports, *-sports-* | TheSportsDB | thesportsdb.com/api/v1/json/3/searchteams.php |
| (all others) | Skip to Layer 2 | Wikimedia Commons search |
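
The routing table could be expressed as a lookup like the one below. The API labels are copied from the table; the data structure, the function name, and the reading of the *-sports-* pattern as "any slug containing -sports-" are assumptions.

```typescript
interface TopicRoute {
  slugs: string[];
  api: string;
}

// One entry per table row, in table order.
const LAYER1_ROUTES: TopicRoute[] = [
  { slugs: ['science', 'space', 'physics-space'], api: 'NASA APOD' },
  { slugs: ['art', 'architecture'], api: 'Met Museum' },
  { slugs: ['art'], api: 'Art Institute Chicago' },
  { slugs: ['entertainment', 'tv', 'people'], api: 'TMDb' },
  { slugs: ['animals'], api: 'iNaturalist' },
  { slugs: ['cooking'], api: 'TheMealDB' },
  { slugs: ['food-beverage'], api: 'TheCocktailDB' },
  { slugs: ['games'], api: 'RAWG' },
  { slugs: ['publishing'], api: 'Open Library' },
  { slugs: ['sports'], api: 'TheSportsDB' },
];

// Returns every Layer 1 API mapped to the slug; an empty array means
// "skip to Layer 2".
function resolveLayer1Apis(topicSlug: string): string[] {
  const apis = LAYER1_ROUTES
    .filter((route) => route.slugs.some((slug) => slug === topicSlug))
    .map((route) => route.api);
  // Assumption: "*-sports-*" matches any slug containing "-sports-".
  if (apis.length === 0 && topicSlug.indexOf('-sports-') !== -1) {
    apis.push('TheSportsDB');
  }
  return apis;
}
```

Returning a list rather than a single API keeps both art sources available, so the handler can try the Met Museum first and fall back to the Art Institute of Chicago before moving to Layer 2.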

Open Questions

  1. fill_the_gap / statement_blank detection — How reliable is string matching factTitle in styleData? May need fuzzy matching or AI tagging at generation time.
  2. Image caching — Should resolved challenge images be copied to R2 to avoid hotlinking? (Unsplash/Pexels TOS require attribution but allow direct linking; Wikimedia prefers local caching.)
  3. progressive_image_reveal implementation — This style needs not just an image but a masking strategy (grid reveal, blur-to-sharp, jigsaw). Separate design doc needed.
  4. AI image generation trigger — Should Cloudflare Workers AI be Layer 2 or Layer 4? If included in the plan, it eliminates Unsplash/Pexels dependency entirely.
  5. Anti-spoiler validation — Should there be a post-check that the resolved image doesn't visually reveal the answer? (e.g., searching "iron lattice" still returns Eiffel Tower photos.)

Files to Create / Modify

| File | Action | Purpose |
|---|---|---|
| supabase/migrations/0XXX_challenge_image_columns.sql | Create | Add image columns to fact_challenge_content |
| packages/queue/src/index.ts | Modify | Add RESOLVE_CHALLENGE_IMAGE message type |
| apps/worker-ingest/src/handlers/resolve-challenge-image.ts | Create | New handler with challenge-aware cascade |
| apps/worker-ingest/src/index.ts | Modify | Register new handler |
| apps/worker-facts/src/handlers/generate-challenge-content.ts | Modify | Enqueue overrides after generation |
| packages/db/src/drizzle/schema.ts | Modify | Add image columns to challenge content table |
| packages/db/src/drizzle/fact-engine-queries.ts | Modify | Add updateChallengeContentImage() |
| packages/ai/src/challenge-content-rules.ts | Modify | Export NEEDS_CUSTOM_IMAGE style set |