Challenge Image Architecture
Reference for auto-generating challenge-aligned images in Eko's fact challenge system.
Design Decision: Option C (Hybrid Inheritance + Override)
Most challenges inherit the parent fact's image. Only styles where the fact image would spoil the answer get a dedicated image resolution job.
```
fact_records.imageUrl ───────────────────────────────────────────┐
                                                                 │
GENERATE_CHALLENGE_CONTENT                                       │
  │                                                              │
  ├─ style needs override? ── NO ── image_status='inherited' ────┤
  │                                                              │
  └─ YES ── enqueue RESOLVE_CHALLENGE_IMAGE                      │
            │                                                    │
            ▼                                                    │
            challenge-aware cascade                              │
            │                                                    │
            ├─ found ── image_status='resolved'                  │
            │                                                    │
            └─ not found ── image_status='skipped' ──────────────┤
                                                                 │
Display: ────────────────────────────────────────────────────────┘
  challenge.image_url
    ?? fact.imageUrl
    ?? topicPlaceholder
```
Style → Image Strategy Matrix
| Challenge Style | Inherits Fact Image? | Override Reason | Override Frequency |
|---|---|---|---|
| multiple_choice | Yes | n/a | never |
| direct_question | Yes | n/a | never |
| free_text | Yes | n/a | never |
| fill_the_gap | Conditional | Gap may be the entity name | ~30% of instances |
| statement_blank | Conditional | Blank may be the entity name | ~30% of instances |
| reverse_lookup | Always override | Entity IS the answer | 100% |
| progressive_image_reveal | Always override | Needs purpose-built visual | 100% |
Override Decision Logic
```
needsOverride(style, styleData, factTitle):
  if style == 'reverse_lookup': return true
  if style == 'progressive_image_reveal': return true
  if style == 'fill_the_gap':
    return styleData.complete_text contains factTitle
  if style == 'statement_blank':
    return styleData.statement contains factTitle
  return false
```
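The decision logic above could be written in TypeScript roughly as follows. This is a minimal sketch rather than the project's actual implementation: the `StyleData` shape, the `containsTitle` helper, and the case-insensitive match are assumptions, since the pseudocode only says "contains".

```typescript
// Challenge styles as listed in the strategy matrix.
type ChallengeStyle =
  | 'multiple_choice'
  | 'direct_question'
  | 'free_text'
  | 'fill_the_gap'
  | 'statement_blank'
  | 'reverse_lookup'
  | 'progressive_image_reveal';

// Hypothetical shape: only the two fields the pseudocode reads.
interface StyleData {
  complete_text?: string;
  statement?: string;
}

// Case-insensitive substring check (an assumption, not a confirmed rule).
function containsTitle(text: string | undefined, factTitle: string): boolean {
  const title = factTitle.trim().toLowerCase();
  if (!text || title.length === 0) return false;
  return text.toLowerCase().includes(title);
}

function needsOverride(
  style: ChallengeStyle,
  styleData: StyleData,
  factTitle: string
): boolean {
  switch (style) {
    case 'reverse_lookup':
    case 'progressive_image_reveal':
      return true; // always override
    case 'fill_the_gap':
      return containsTitle(styleData.complete_text, factTitle);
    case 'statement_blank':
      return containsTitle(styleData.statement, factTitle);
    default:
      return false; // inherit the fact image
  }
}
```

Exact substring matching will miss paraphrased or abbreviated entity names, which is the reliability concern raised under Open Questions.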
Estimated override volume: ~33% of all generated challenges (2 of 6 styles always, 2 conditionally).
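As a sanity check on this estimate, the expected override share can be computed from the per-style frequencies in the matrix. The sketch below assumes every style is generated equally often, which the document does not state. Note that the matrix lists seven styles, so under that assumption the share comes out near 37% rather than 33%; the estimate above is best read as a rough figure.

```typescript
// Override probability per style, copied from the strategy matrix.
const styleRates: Array<{ style: string; rate: number }> = [
  { style: 'multiple_choice', rate: 0 },
  { style: 'direct_question', rate: 0 },
  { style: 'free_text', rate: 0 },
  { style: 'fill_the_gap', rate: 0.3 },
  { style: 'statement_blank', rate: 0.3 },
  { style: 'reverse_lookup', rate: 1 },
  { style: 'progressive_image_reveal', rate: 1 },
];

// Assumption: styles are generated with equal frequency,
// so the expected share is the plain average of the rates.
function expectedOverrideShare(rates: Array<{ style: string; rate: number }>): number {
  const total = rates.reduce((sum, entry) => sum + entry.rate, 0);
  return total / rates.length;
}

const share = expectedOverrideShare(styleRates);
console.log(`${(share * 100).toFixed(1)}%`); // prints "37.1%"
```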
Image Source Cascade
Layer 0 — Inherited (no API call)
The fact's existing image, resolved by the RESOLVE_IMAGE handler at extraction time.
- Source: `fact_records.imageUrl` (Wikipedia, TheSportsDB, Unsplash, Pexels)
- Coverage: ~70-80% of all challenges
- Cost: $0
Layer 1 — Topic-Specific APIs (structured, free, image-included)
APIs that return images as part of their data response. Matched by topic slug.
| Topic Slug | Source | API | Images | Free Tier | Notes |
|---|---|---|---|---|---|
| science, space | NASA | api.nasa.gov | APOD, Mars Rover, Earth | 1K req/hr | Spectacular photography |
| art | Metropolitan Museum | metmuseum.github.io | 470K+ artworks | Unlimited | Public domain |
| art | Art Institute Chicago | api.artic.edu | Hi-res artwork | Unlimited | CC0 licensed |
| entertainment, tv | TMDb | api.themoviedb.org | Posters, backdrops, cast | Non-commercial | 9M+ titles |
| animals | iNaturalist | api.inaturalist.org | Species photos | Unlimited | Citizen science |
| cooking | TheMealDB | themealdb.com/api | Recipe photos | Unlimited | No key (test key: 1) |
| food-beverage | TheCocktailDB | thecocktaildb.com/api | Cocktail photos | Unlimited | No key (test key: 1) |
| games | RAWG | rawg.io/api | Screenshots, covers | 20K req/mo | 500K+ games |
| publishing | Open Library | covers.openlibrary.org | Book covers | Unlimited | By ISBN/OLID |
| sports | TheSportsDB | thesportsdb.com | Badges, logos, banners | Unlimited | Already in fact cascade |
Search strategy: Use fact metadata (topic slug, schema keys, entity identifiers) to query the relevant API. Return the first image that is NOT the same as the fact's entity image.
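The "first image that is not the fact's entity image" rule might look like the sketch below. The `ImageCandidate` shape and the URL normalisation are assumptions made for illustration; real topic API adapters may need a stricter or looser notion of "same image".

```typescript
// Hypothetical candidate shape returned by a topic API adapter.
interface ImageCandidate {
  url: string;
  source: string;
}

// Assumption: two URLs refer to the same image if they match after dropping
// the protocol, any query string or fragment, and trailing slashes.
function normalizeUrl(url: string): string {
  return url
    .trim()
    .replace(/^https?:\/\//i, '')
    .replace(/[?#].*$/, '')
    .replace(/\/+$/, '');
}

// Returns the first candidate that differs from the fact's image, or null.
function pickFirstNonEntityImage(
  candidates: ImageCandidate[],
  factImageUrl: string | null
): ImageCandidate | null {
  const factKey = factImageUrl ? normalizeUrl(factImageUrl) : null;
  for (const candidate of candidates) {
    if (factKey === null || normalizeUrl(candidate.url) !== factKey) {
      return candidate;
    }
  }
  return null;
}
```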
Layer 2 — Wikimedia Commons (broad coverage, free)
```
GET https://commons.wikimedia.org/w/api.php
  ?action=query
  &generator=search
  &gsrsearch={context_keywords}
  &prop=imageinfo
  &iiprop=url|extmetadata
  &iiurlwidth=800
  &format=json
```
- Coverage: 100M+ freely licensed images
- Key advantage: Search by topic context, not entity name
- Search query: Derived from `setup_text` keywords minus answer-revealing terms
- Attribution: License metadata in the `extmetadata` response field
- No API key required
- Rate limit: Be polite; send a `User-Agent: Eko/1.0` header
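A sketch of building that request and reading the response is shown below. The helper names and the `CommonsImage` type are hypothetical. The `gsrnamespace=6` parameter is an addition not present in the request above; it is included on the assumption that the search should be limited to the File namespace, and it can be dropped if that is not wanted.

```typescript
const COMMONS_ENDPOINT = 'https://commons.wikimedia.org/w/api.php';

// Builds the search URL from the parameters listed above.
function buildCommonsSearchUrl(contextKeywords: string, width = 800): string {
  const params = new URLSearchParams({
    action: 'query',
    generator: 'search',
    gsrsearch: contextKeywords,
    gsrnamespace: '6', // assumption: restrict results to File: pages
    prop: 'imageinfo',
    iiprop: 'url|extmetadata',
    iiurlwidth: String(width),
    format: 'json',
  });
  return `${COMMONS_ENDPOINT}?${params.toString()}`;
}

// Subset of the response this sketch reads.
interface CommonsResponse {
  query?: {
    pages?: Record<string, {
      title: string;
      imageinfo?: Array<{
        url: string;
        thumburl?: string;
        extmetadata?: Record<string, { value: string }>;
      }>;
    }>;
  };
}

// Hypothetical normalised result.
interface CommonsImage {
  url: string;
  title: string;
  license: string | null;
}

function extractCommonsImages(response: CommonsResponse): CommonsImage[] {
  const pages = response.query?.pages ?? {};
  const images: CommonsImage[] = [];
  for (const pageId of Object.keys(pages)) {
    const page = pages[pageId];
    const info = page.imageinfo?.[0];
    if (!info) continue;
    images.push({
      url: info.thumburl ?? info.url, // prefer the scaled thumbnail
      title: page.title,
      license: info.extmetadata?.LicenseShortName?.value ?? null,
    });
  }
  return images;
}
```

The actual fetch would pass the `User-Agent` header noted above; it is left out here so the sketch stays free of network calls.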
Layer 3 — Stock Photo APIs (context-aware search)
Same providers as the fact image cascade, but with a different search query.
| Provider | Key Required | Free Tier | Rate Limit |
|---|---|---|---|
| Unsplash | UNSPLASH_ACCESS_KEY | 50 req/hr (demo) / 5K req/hr (prod) | Per-hour |
| Pexels | PEXELS_API_KEY | 200 req/hr, 20K req/mo | Per-hour + monthly |
Query construction:
```
buildChallengeImageQuery(setupText, topicSlug, correctAnswer):
  1. Extract nouns/adjectives from setupText
  2. Remove terms that match correctAnswer (anti-spoiler)
  3. Prepend topicSlug as context anchor
  4. Return: "{topicSlug} {filtered_keywords}"
```
Example:
```
setupText:     "This Parisian iron lattice tower was completed in 1889"
topicSlug:     "architecture"
correctAnswer: "Eiffel Tower"
query:         "architecture Parisian iron lattice 1889"
```
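The query construction steps could be approximated as below. This sketch swaps the noun/adjective extraction for a small stopword filter, since real part-of-speech tagging would need an NLP library; the stopword list and the ASCII-only tokenizer are assumptions. Because of that substitution the verb "completed" survives, so the output differs slightly from the example above.

```typescript
// Small stopword list standing in for part-of-speech tagging (an assumption).
const STOPWORDS = new Set([
  'a', 'an', 'the', 'this', 'that', 'these', 'those', 'is', 'was', 'were',
  'are', 'in', 'on', 'of', 'at', 'by', 'for', 'to', 'and', 'or', 'it', 'its',
  'with', 'as',
]);

// ASCII-only tokenizer, kept simple for illustration.
function tokenize(text: string): string[] {
  return text.split(/[^A-Za-z0-9]+/).filter((token) => token.length > 0);
}

function buildChallengeImageQuery(
  setupText: string,
  topicSlug: string,
  correctAnswer: string
): string {
  // Every word of the answer is treated as a spoiler term.
  const answerTerms = new Set(tokenize(correctAnswer).map((t) => t.toLowerCase()));
  const keywords = tokenize(setupText).filter((token) => {
    const lower = token.toLowerCase();
    return !STOPWORDS.has(lower) && !answerTerms.has(lower);
  });
  // The topic slug goes first as the context anchor.
  return [topicSlug, ...keywords].join(' ');
}

const query = buildChallengeImageQuery(
  'This Parisian iron lattice tower was completed in 1889',
  'architecture',
  'Eiffel Tower'
);
console.log(query); // prints "architecture Parisian iron lattice completed 1889"
```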
Layer 4 — AI-Generated (future)
Not currently configured. Architecture slot reserved.
| Provider | Env Var | Cost/Image | Best For |
|---|---|---|---|
| Cloudflare Workers AI (SDXL) | CLOUDFLARE_ACCOUNT_ID (exists) | ~$0 (included) | General illustration |
| Flux (via fal.ai) | FAL_API_KEY | ~$0.003 | High-quality scenes |
| DALL-E 3 | OPENAI_API_KEY (exists) | ~$0.04 | Premium visual puzzles |
Primary use case: progressive_image_reveal — generating visual puzzles that can be progressively unmasked.
Cloudflare note: Eko already has CLOUDFLARE_ACCOUNT_ID for Stream. Workers AI includes Stable Diffusion XL on paid plans. Images could store directly to the existing R2 bucket via packages/r2.
Schema Changes
fact_challenge_content — New Columns
```sql
ALTER TABLE fact_challenge_content
  ADD COLUMN image_url TEXT,
  ADD COLUMN image_source TEXT,
  ADD COLUMN image_attribution TEXT,
  ADD COLUMN image_status TEXT DEFAULT 'inherited'
    CHECK (image_status IN ('inherited', 'pending', 'resolved', 'skipped'));
```
| Column | Type | Default | Purpose |
|---|---|---|---|
| image_url | TEXT | NULL | Override image URL; NULL = use fact's image |
| image_source | TEXT | NULL | Provider name (wikimedia, unsplash, nasa, etc.) |
| image_attribution | TEXT | NULL | Credit line for display |
| image_status | TEXT | 'inherited' | Lifecycle state of image resolution |
image_status Lifecycle
```
'inherited' → Challenge uses fact's image (no override needed)
'pending'   → Override requested, RESOLVE_CHALLENGE_IMAGE enqueued
'resolved'  → Override image found and stored
'skipped'   → Override attempted, no suitable image found; falls back to fact image
```
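The lifecycle can be guarded in application code with a small transition table. The states come from the list above, but the set of allowed transitions is an assumption inferred from the flow (for example, that `resolved` and `skipped` are terminal); the document does not spell these out.

```typescript
type ImageStatus = 'inherited' | 'pending' | 'resolved' | 'skipped';

// Assumed transitions: a row starts as 'inherited' (column default), may be
// moved to 'pending' when an override is requested, and a pending job ends
// as either 'resolved' or 'skipped'.
const ALLOWED_TRANSITIONS: Record<ImageStatus, ImageStatus[]> = {
  inherited: ['pending'],
  pending: ['resolved', 'skipped'],
  resolved: [],
  skipped: [],
};

function canTransition(from: ImageStatus, to: ImageStatus): boolean {
  return ALLOWED_TRANSITIONS[from].indexOf(to) !== -1;
}
```

If skipped challenges should be retried later (say, once Layer 4 is active), `skipped` would need a path back to `pending`.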
Queue Message Type
RESOLVE_CHALLENGE_IMAGE
```typescript
interface ResolveChallengeImageMessage {
  type: 'RESOLVE_CHALLENGE_IMAGE'
  payload: {
    challenge_content_id: string  // UUID → fact_challenge_content.id
    fact_record_id: string        // UUID → parent fact for context
    challenge_style: ChallengeStyle
    setup_text: string            // Challenge framing (for query construction)
    correct_answer: string        // To EXCLUDE from image search (anti-spoiler)
    topic_slug: string            // For topic-specific API routing
    topic_path: string            // Full path for subcategory matching
    fact_title: string            // Fact title (to avoid duplicate of fact image)
  }
}
```
Consumer: worker-ingest (co-located with existing RESOLVE_IMAGE handler)
Max attempts: 3 (standard queue retry with exponential backoff)
Pipeline Integration
Where RESOLVE_CHALLENGE_IMAGE Gets Enqueued
In generate-challenge-content.ts, after successful upsert:
```
for each generated challenge:
  if needsOverride(style, styleData, factTitle):
    set image_status = 'pending'
    enqueue RESOLVE_CHALLENGE_IMAGE
  else:
    set image_status = 'inherited'
```
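One way to structure this loop so it can be tested without a database or queue is to inject the side effects. Everything named here (`GeneratedChallenge`, `RoutingDeps`, `routeChallengeImages`) is hypothetical, and the payload is trimmed to a single field for brevity; the real message carries the full payload defined under Queue Message Type.

```typescript
// Hypothetical minimal view of a generated challenge.
interface GeneratedChallenge {
  id: string;
  style: string;
  styleData: Record<string, unknown>;
}

// Injected side effects; the real ones would hit the database and the queue.
interface RoutingDeps {
  needsOverride: (style: string, styleData: Record<string, unknown>, factTitle: string) => boolean;
  setImageStatus: (challengeId: string, status: 'pending' | 'inherited') => Promise<void>;
  enqueue: (message: {
    type: 'RESOLVE_CHALLENGE_IMAGE';
    payload: { challenge_content_id: string }; // trimmed for the sketch
  }) => Promise<void>;
}

async function routeChallengeImages(
  challenges: GeneratedChallenge[],
  factTitle: string,
  deps: RoutingDeps
): Promise<{ pending: number; inherited: number }> {
  let pending = 0;
  let inherited = 0;
  for (const challenge of challenges) {
    if (deps.needsOverride(challenge.style, challenge.styleData, factTitle)) {
      // Persist the status before enqueueing so the handler never sees a stale row.
      await deps.setImageStatus(challenge.id, 'pending');
      await deps.enqueue({
        type: 'RESOLVE_CHALLENGE_IMAGE',
        payload: { challenge_content_id: challenge.id },
      });
      pending += 1;
    } else {
      await deps.setImageStatus(challenge.id, 'inherited');
      inherited += 1;
    }
  }
  return { pending, inherited };
}
```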
Handler Cascade (in worker-ingest)
```
processResolveChallengeImage(message):
  1. Load challenge content + parent fact
  2. If challenge already has image_url → skip (idempotent)
  3. Build context-aware search query (anti-spoiler)
  4. Run cascade:
     a. Topic API (if topic_slug has mapped source)
     b. Wikimedia Commons (context keyword search)
     c. Unsplash (filtered query)
     d. Pexels (filtered query)
  5. If result AND result.url != fact.imageUrl:
       → update challenge: image_url, image_source, image_attribution, image_status='resolved'
     Else:
       → update challenge: image_status='skipped'
```
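Steps 4 and 5 can be sketched as a loop over ordered providers. The `ImageProvider` interface and `runCascade` are hypothetical names. The sketch also makes two choices the outline does not specify: a provider that throws is treated as a miss, and a hit that equals the fact image falls through to the next layer instead of ending the cascade.

```typescript
interface ImageResult {
  url: string;
  source: string;
  attribution: string | null;
}

// Hypothetical provider interface; one instance per cascade layer.
interface ImageProvider {
  name: string;
  search: (query: string) => Promise<ImageResult | null>;
}

interface CascadeOutcome {
  status: 'resolved' | 'skipped';
  image: ImageResult | null;
}

async function runCascade(
  providers: ImageProvider[],
  query: string,
  factImageUrl: string | null
): Promise<CascadeOutcome> {
  for (const provider of providers) {
    let result: ImageResult | null = null;
    try {
      result = await provider.search(query);
    } catch {
      // Assumption: a failing provider counts as a miss so later layers still run.
      continue;
    }
    // Accept the first hit that is not the parent fact's own image.
    if (result && result.url !== factImageUrl) {
      return { status: 'resolved', image: result };
    }
  }
  return { status: 'skipped', image: null };
}
```

Swallowing provider errors trades retry behaviour for coverage; if transient failures should trigger the queue's retry instead, the `catch` branch would rethrow.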
Display Logic (UI)
```typescript
function getChallengeImageUrl(
  challenge: FactChallengeContent,
  fact: FactRecord,
  topic: TopicCategory
): string {
  return challenge.imageUrl          // Challenge-specific override
    ?? fact.imageUrl                 // Parent fact image
    ?? getTopicPlaceholder(topic)    // Branded topic fallback
}
```
Cost & Rate Limit Estimates
Per 1,000 Facts Processed
| Step | Volume | API Calls | Cost |
|---|---|---|---|
| Challenges generated | ~6,000 (6 styles × 1K facts) | 0 (already happening) | ~$0.30 AI |
| Inherited (no override) | ~4,000 (67%) | 0 | $0 |
| Overrides needed | ~2,000 (33%) | 2,000 max | $0 |
| Layer 1 hits (topic API) | ~800 (40% of overrides) | 800 | $0 (free APIs) |
| Layer 2 hits (Wikimedia) | ~700 (35% of overrides) | 700 | $0 |
| Layer 3 hits (Unsplash/Pexels) | ~400 (remaining) | 400 | $0 |
| Layer 4 hits (AI-gen, future) | 0 (not active) | 0 | $0 |
| No image found | ~100 (5%) | — | — |
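- Total additional API calls per 1K facts: ~1,900 (counting successful lookups only; each miss at one layer adds a call at the next)
- Total additional cost: $0 (all free-tier APIs)
- Unsplash/Pexels budget per 1K facts: ~400 calls (within the production-tier hourly limits; the Unsplash demo tier of 50 req/hr would need throttling)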
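The table equates API calls with hits, but a cascade also spends a call on every miss before moving to the next layer. The sketch below estimates attempted calls from the hit counts in the table. It assumes one call per override per layer and that every override starts at Layer 1, so it gives a rough upper estimate: overrides whose topic has no mapped API skip Layer 1, while trying both Unsplash and Pexels in Layer 3 would add calls.

```typescript
interface LayerHits {
  name: string;
  hits: number; // overrides this layer is expected to resolve
}

interface CallEstimate {
  perLayer: Array<{ name: string; attempted: number }>;
  totalAttempted: number;
  unresolved: number;
}

// Every override still unresolved when it reaches a layer costs one call there.
function estimateAttemptedCalls(overrides: number, layers: LayerHits[]): CallEstimate {
  let remaining = overrides;
  let totalAttempted = 0;
  const perLayer: Array<{ name: string; attempted: number }> = [];
  for (const layer of layers) {
    perLayer.push({ name: layer.name, attempted: remaining });
    totalAttempted += remaining;
    remaining -= layer.hits;
  }
  return { perLayer, totalAttempted, unresolved: remaining };
}

// Hit counts taken from the table above.
const estimate = estimateAttemptedCalls(2000, [
  { name: 'Layer 1 (topic API)', hits: 800 },
  { name: 'Layer 2 (Wikimedia)', hits: 700 },
  { name: 'Layer 3 (Unsplash/Pexels)', hits: 400 },
]);
console.log(estimate.totalAttempted, estimate.unresolved); // prints "3700 100"
```

Under these assumptions Layer 3 sees about 500 attempts rather than 400, which is the figure to compare against the Unsplash and Pexels rate limits.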
Topic → Layer 1 API Routing Table
Quick reference for which topic slugs route to which image API.
| Topic Slug(s) | API | Endpoint Pattern |
|---|---|---|
| science, space, physics-space | NASA APOD | api.nasa.gov/planetary/apod |
| art, architecture | Met Museum | collectionapi.metmuseum.org/public/collection/v1/search |
| art | Art Institute Chicago | api.artic.edu/api/v1/artworks/search |
| entertainment, tv, people | TMDb | api.themoviedb.org/3/search/multi |
| animals | iNaturalist | api.inaturalist.org/v1/taxa |
| cooking | TheMealDB | themealdb.com/api/json/v1/1/search.php |
| food-beverage | TheCocktailDB | thecocktaildb.com/api/json/v1/1/search.php |
| games | RAWG | api.rawg.io/api/games |
| publishing | Open Library | covers.openlibrary.org/b/isbn/{isbn}-L.jpg |
| sports, *-sports-* | TheSportsDB | thesportsdb.com/api/v1/json/3/searchteams.php |
| (all others) | Skip to Layer 2 | Wikimedia Commons search |
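The routing table translates naturally into a lookup function. The API identifiers and the `routeTopic` name below are hypothetical; the sketch reads the `*-sports-*` pattern as "slug contains `-sports-`" and returns an empty list for unmapped slugs, meaning the handler should skip to Layer 2.

```typescript
type Layer1Api =
  | 'nasa' | 'met' | 'artic' | 'tmdb' | 'inaturalist'
  | 'themealdb' | 'thecocktaildb' | 'rawg' | 'openlibrary' | 'thesportsdb';

// Exact slug matches, in the order the APIs should be tried.
const EXACT_ROUTES = new Map<string, Layer1Api[]>([
  ['science', ['nasa']],
  ['space', ['nasa']],
  ['physics-space', ['nasa']],
  ['art', ['met', 'artic']], // two sources are listed for art
  ['architecture', ['met']],
  ['entertainment', ['tmdb']],
  ['tv', ['tmdb']],
  ['people', ['tmdb']],
  ['animals', ['inaturalist']],
  ['cooking', ['themealdb']],
  ['food-beverage', ['thecocktaildb']],
  ['games', ['rawg']],
  ['publishing', ['openlibrary']],
  ['sports', ['thesportsdb']],
]);

// Returns the Layer 1 APIs for a topic slug; an empty array means "skip to Layer 2".
function routeTopic(topicSlug: string): Layer1Api[] {
  const exact = EXACT_ROUTES.get(topicSlug);
  if (exact) return exact;
  // Assumed reading of the "*-sports-*" wildcard.
  if (topicSlug.indexOf('-sports-') !== -1) return ['thesportsdb'];
  return [];
}
```

For example, `routeTopic('art')` yields both art sources, while `routeTopic('history')` yields an empty list and falls through to Wikimedia Commons.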
Open Questions
- fill_the_gap / statement_blank detection: How reliable is string matching `factTitle` in `styleData`? May need fuzzy matching or AI tagging at generation time.
- Image caching: Should resolved challenge images be copied to R2 to avoid hotlinking? (Unsplash/Pexels TOS require attribution but allow direct linking; Wikimedia prefers local caching.)
- progressive_image_reveal implementation: This style needs not just an image but a masking strategy (grid reveal, blur-to-sharp, jigsaw). Separate design doc needed.
- AI image generation trigger: Should Cloudflare Workers AI be Layer 2 or Layer 4? If included in the plan, it eliminates Unsplash/Pexels dependency entirely.
- Anti-spoiler validation: Should there be a post-check that the resolved image doesn't visually reveal the answer? (e.g., searching "iron lattice" still returns Eiffel Tower photos.)
Files to Create / Modify
| File | Action | Purpose |
|---|---|---|
| supabase/migrations/0XXX_challenge_image_columns.sql | Create | Add image columns to fact_challenge_content |
| packages/queue/src/index.ts | Modify | Add RESOLVE_CHALLENGE_IMAGE message type |
| apps/worker-ingest/src/handlers/resolve-challenge-image.ts | Create | New handler with challenge-aware cascade |
| apps/worker-ingest/src/index.ts | Modify | Register new handler |
| apps/worker-facts/src/handlers/generate-challenge-content.ts | Modify | Enqueue overrides after generation |
| packages/db/src/drizzle/schema.ts | Modify | Add image columns to challenge content table |
| packages/db/src/drizzle/fact-engine-queries.ts | Modify | Add updateChallengeContentImage() |
| packages/ai/src/challenge-content-rules.ts | Modify | Export NEEDS_CUSTOM_IMAGE style set |
Related Documents
- API Source Catalog — Full list of 67 external APIs
- APP-CONTROL.md — Queue types, worker config, cron scheduling
- SEED.md — Seeding pipeline and cost management