Challenge Content Rules

Rules governing pre-generated challenge content for the Eko fact engine. These rules ensure that every published fact has high-quality, on-brand challenge content across multiple interaction styles.

Source tone specification: docs/marketing/CHALLENGE_TONE.md


System Design Rules (CC)

| Rule ID | Rule | Enforcement |
| --- | --- | --- |
| CC-001 | Every published fact must have challenge content for >= 3 styles | --audit phase reports coverage gaps |
| CC-002 | The five-field structure (setup_text, challenge_text, reveal_correct, reveal_wrong, correct_answer) is mandatory for all styles | Zod schema validation at generation time |
| CC-003 | conversational and progressive_image_reveal styles are exempt from pre-generation | Hardcoded skip list in generation script |
| CC-004 | Fallback to algorithmic generation is allowed only when fact_challenge_content rows are absent | Frontend components check for pre-generated content first |
| CC-005 | New facts entering the pipeline via queue must trigger GENERATE_CHALLENGE_CONTENT after IMPORT_FACTS | Worker handler chain |
| CC-006 | Challenge content is regenerated independently of fact_records; changes to one table do not require changes to the other | Separate table architecture |
| CC-007 | Every entity with >= 5 published facts must have a dedicated topic_categories row at depth 2 or 3 | materialize-entity-categories.ts --audit |
| CC-008 | All subcategories defined in CATEGORY_SPECS must be materialized as topic_categories rows with schema entries | Migration + --audit script |
| CC-009 | Queries returning category lists must support depth filtering; feed UI and cron jobs operate at root level only | Code review + integration tests |
| CC-010 | External news API category slugs must resolve through the alias table before fact extraction | resolveTopicCategory() fallback chain |
| CC-011 | Unresolvable category slugs must be logged to unmapped_category_log for periodic audit | resolveTopicCategory() audit logging |

CC-001: Minimum Style Coverage

Every fact with status = 'published' must have pre-generated challenge content for at least 3 of the 6 pre-generated styles. The --audit flag on the generation script reports facts that fall below this threshold.

CC-002: Five-Field Structure

All challenge styles must include:

  • setup_text — freely shared context (the "offer knowledge" layer)
  • challenge_text — the invitation to engage (the "invite them in" layer)
  • reveal_correct — celebration of knowing (the "shared victory" reveal)
  • reveal_wrong — teaching moment (the "discovery" reveal)
  • correct_answer — rich narrative answer for animated streaming display (the "storytelling payoff")

The first four fields map onto the Three Layers from the Challenge Tone Specification, with the two reveal fields sharing the reveal layer. The correct_answer adds a fourth layer: a multi-sentence, engaging narrative designed for animated streaming display on challenge detail pages.
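The five-field contract can be expressed as a plain structural check. This is an illustrative sketch only; the real pipeline enforces CC-002 with a Zod schema, and only the field names below are taken from this document:

```typescript
// Sketch of the mandatory five-field structure from CC-002.
// The real project validates with a Zod schema; this standalone check
// is illustrative.
interface ChallengeContent {
  setup_text: string;      // "offer knowledge" layer
  challenge_text: string;  // "invite them in" layer
  reveal_correct: string;  // "shared victory" reveal
  reveal_wrong: string;    // "discovery" reveal
  correct_answer: string;  // "storytelling payoff" narrative
}

const REQUIRED_FIELDS = [
  "setup_text",
  "challenge_text",
  "reveal_correct",
  "reveal_wrong",
  "correct_answer",
] as const;

// Returns the missing or empty fields (an empty array means valid).
function missingFields(candidate: Record<string, unknown>): string[] {
  return REQUIRED_FIELDS.filter(
    (f) => typeof candidate[f] !== "string" || (candidate[f] as string).length === 0,
  );
}
```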

CC-003: Exempt Styles

The following styles are NOT pre-generated:

  • conversational — generated in real-time during multi-turn dialogue
  • progressive_image_reveal — requires runtime image processing

These are hardcoded in PRE_GENERATED_STYLES in packages/ai/src/challenge-content-rules.ts.
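The split between pre-generated and runtime-only styles could look like the following sketch. The six pre-generated style names are collected from the style_data rules later in this document (CQ-005, CQ-009 through CQ-013); the actual constant shape in challenge-content-rules.ts may differ:

```typescript
// Styles whose content is generated ahead of time (CC-003).
const PRE_GENERATED_STYLES = [
  "multiple_choice",
  "fill_the_gap",
  "direct_question",
  "statement_blank",
  "reverse_lookup",
  "free_text",
] as const;

// Runtime-only styles, hardcoded as a skip list.
const RUNTIME_ONLY_STYLES = ["conversational", "progressive_image_reveal"] as const;

function isPreGenerated(style: string): boolean {
  return (PRE_GENERATED_STYLES as readonly string[]).includes(style);
}
```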

CC-004: Fallback Priority

Frontend components must check fact_challenge_content for pre-generated rows before falling back to algorithmic generation. Pre-generated content is always preferred because it has been validated against quality rules.
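The fallback priority reduces to "pre-generated row wins if present". The Challenge shape and function names below are illustrative, not the project's real API:

```typescript
// CC-004 sketch: prefer pre-generated rows, fall back to algorithmic
// generation only when no row exists.
type Challenge = { source: "pre_generated" | "algorithmic"; text: string };

function pickChallenge(
  preGenerated: Challenge | undefined,
  generateFallback: () => Challenge,
): Challenge {
  // Pre-generated rows have already passed the CQ quality rules,
  // so they are always preferred.
  return preGenerated ?? generateFallback();
}
```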

CC-005: Pipeline Integration

When the worker processes a new fact:

  1. IMPORT_FACTS — fact record is created
  2. GENERATE_CHALLENGE_CONTENT — challenge content is generated for all pre-generated styles

This ensures no fact reaches publication without challenge content.
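The handler chain can be sketched with a hypothetical in-memory queue; the job type name is taken from this document, everything else is a stand-in for the real worker:

```typescript
// CC-005 sketch: the IMPORT_FACTS handler enqueues
// GENERATE_CHALLENGE_CONTENT as the next step in the chain.
type Job = { type: string; factId: string };

function handleImportFacts(factId: string, enqueue: (job: Job) => void): void {
  // ...create the fact record here (elided)...
  // Chaining the next step guarantees no fact is published without content.
  enqueue({ type: "GENERATE_CHALLENGE_CONTENT", factId });
}
```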

CC-006: Table Independence

The fact_challenge_content table is independent of fact_records. Regenerating challenge content does not require modifying the fact record, and vice versa. This allows challenge content to be refreshed without affecting fact integrity.

CC-007: Leaf-Level Entity Categories

Every entity (from seed_entry_queue) with >= 5 published facts should have a dedicated topic_categories row at depth 2 or 3. This enables fine-grained content organization and future subcategory browsing.

  • Entity categories must have a valid parent_id pointing to a mid-level subcategory (depth 1 or 2)
  • Entity category path must follow the hierarchical slug convention: root/subcategory/entity-slug
  • Entities below the 5-fact threshold remain classified under their parent subcategory or root category

Depth-3 impact: With the taxonomy expanded to depth 3, entity categories at depth 3 inherit their parent's challenge format links from the depth-2 subcategory above them. Entities should be placed at depth 3 when their parent depth-2 category exists.

Enforcement: materialize-entity-categories.ts --audit reports entities that meet the threshold but lack a dedicated category row.

CC-008: Mid-Level Subcategory Materialization

All subcategories defined in CATEGORY_SPECS (in scripts/seed/generate-curated-entries.ts) must be materialized as topic_categories rows in the database. This bridges the gap between the code-defined taxonomy and the database-driven category system.

  • Subcategories inherit challenge format links from their parent category via challenge_format_topics
  • Each subcategory must have a general_fact schema entry in fact_record_schemas
  • Subcategory depth values must accurately reflect their position in the hierarchy (1 for direct children of the root, 2 for grandchildren, 3 for leaf categories)

Depth-3 impact: Depth-3 categories inherit challenge format eligibility from their depth-2 parent via the challenge_format_topics table. When a new depth-3 category is added, its parent's format links apply automatically.

Enforcement: Migration creates the rows; materialize-entity-categories.ts --audit verifies completeness.

CC-009: Taxonomy Query Safeguards

Queries returning category lists must support depth filtering via a maxDepth parameter to prevent unintended explosion of results as the taxonomy grows.

  • getActiveTopicCategories() and getActiveTopicCategoriesWithSchemas() accept an optional maxDepth parameter
  • Feed UI category pills must show only depth-0 (root) categories to avoid overwhelming the filter bar
  • Cron and evergreen generation must operate at root level (maxDepth: 0) to prevent per-subcategory quota explosion
  • When maxDepth is omitted, all depths are returned for backwards compatibility

Enforcement: Code review and integration tests verify that existing consumers pass maxDepth: 0 where appropriate.
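The maxDepth behavior described above, including the backwards-compatible omitted case, can be sketched as a simple filter (the TopicCategory shape is illustrative):

```typescript
// CC-009 sketch: depth filtering with an optional maxDepth parameter.
interface TopicCategory { slug: string; depth: number }

function filterByDepth(categories: TopicCategory[], maxDepth?: number): TopicCategory[] {
  // Omitted maxDepth returns every depth, for backwards compatibility.
  if (maxDepth === undefined) return categories;
  return categories.filter((c) => c.depth <= maxDepth);
}
```

Cron and evergreen generation would call this with maxDepth: 0 so quotas stay per-root rather than per-subcategory.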

CC-010: Category Alias Normalization

External news API providers use their own category taxonomies (e.g., NewsAPI uses business, GNews uses nation) that do not match our internal topic_categories slugs. The topic_category_aliases table maps external slugs to internal category IDs.

  • resolveTopicCategory() must be used instead of getTopicCategoryBySlug() in all ingestion paths
  • Resolution order: (1) exact slug match, (2) provider-specific alias, (3) universal alias
  • The alias table must be re-seeded when new root categories are added, so aliases can point at the more specific categories
  • Provider-specific aliases override universal aliases for the same external slug

Enforcement: resolveTopicCategory() query function implements the fallback chain. The extract-facts handler uses it for topic resolution.
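The three-step resolution order can be sketched as follows. The data shapes are illustrative; the real chain lives in resolveTopicCategory() in fact-engine-queries.ts:

```typescript
// CC-010 sketch of the resolution order: exact slug, then
// provider-specific alias, then universal alias.
interface AliasRow { externalSlug: string; provider: string | null; categoryId: string }

function resolveCategoryId(
  slug: string,
  provider: string,
  internalSlugs: Map<string, string>, // internal slug -> category id
  aliases: AliasRow[],
): string | null {
  // (1) exact match against internal topic_categories slugs
  const exact = internalSlugs.get(slug);
  if (exact !== undefined) return exact;
  // (2) provider-specific alias overrides (3) the universal (provider-null) alias
  const providerAlias = aliases.find((a) => a.externalSlug === slug && a.provider === provider);
  if (providerAlias) return providerAlias.categoryId;
  const universal = aliases.find((a) => a.externalSlug === slug && a.provider === null);
  return universal ? universal.categoryId : null;
}
```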

CC-011: Unmapped Category Audit

When no alias mapping exists for an external category slug, the system must log it to unmapped_category_log instead of silently dropping the story. This enables proactive alias creation.

  • Logging is fire-and-forget (does not block or fail fact extraction)
  • Unmapped slugs can be reviewed via database queries on unmapped_category_log
  • High-frequency unmapped slugs should trigger alias creation within one business day

Enforcement: resolveTopicCategory() logs on null resolution. Periodic audit via unmapped_category_log table queries.
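The fire-and-forget requirement means a logging failure must never surface in the extraction path. A minimal sketch, where logUnmapped is a hypothetical stand-in for the unmapped_category_log insert:

```typescript
// CC-011 sketch: logging must never block or fail fact extraction.
function safeLogUnmapped(slug: string, logUnmapped: (slug: string) => void): boolean {
  try {
    logUnmapped(slug);
    return true;
  } catch {
    // Swallow logging failures; fact extraction continues regardless.
    return false;
  }
}
```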


Content Quality Rules (CQ)

| Rule ID | Rule | Enforcement |
| --- | --- | --- |
| CQ-001 | setup_text must be 2-4 sentences, multiline, containing at least one specific detail (name, date, number, place) | Post-generation validation regex + length check |
| CQ-002 | challenge_text must be phrased as an invitation, not a test. Must contain second-person address ("you") | Post-generation validation |
| CQ-003 | reveal_correct must celebrate the user's knowledge. Never just "Correct." Must be 1-3 sentences with an additional teaching detail | Length + pattern check |
| CQ-004 | reveal_wrong must teach, not punish. Never use "Wrong!" or "Incorrect." Must include the correct answer and context | Anti-pattern regex check |
| CQ-005 | multiple_choice distractors must be plausible; no "N/A" or "Unknown" filler options | Validated in Zod schema |
| CQ-006 | Anti-pattern words are banned in all fields: "Trivia", "Quiz", "Algorithm", "Content", "Correct/Incorrect" (as binary labels) | Regex scan at generation + upload time |
| CQ-007 | free_text prompts must be open-ended and thought-provoking, not recall-based ("Name the X" is banned for free_text) | Pattern check in validation |
| CQ-008 | correct_answer must be 3-6 sentences minimum (100+ chars), narrative and engaging. Designed for animated streaming display. Never short, bland, or encyclopedic. Must include at least one detail beyond the raw factual answer | Length check + post-generation validation |
| CQ-009 | fill_the_gap style_data must contain complete_text (string) and answer (string) | validateChallengeContent() style_data check |
| CQ-010 | direct_question style_data must contain expected_answer (string) | validateChallengeContent() style_data check |
| CQ-011 | statement_blank style_data must contain statement, complete_statement, and answer (all strings) | validateChallengeContent() style_data check |
| CQ-012 | reverse_lookup style_data must contain answer (string) | validateChallengeContent() style_data check |
| CQ-013 | free_text style_data must contain key_concepts (non-empty array) and sample_answer (string) | validateChallengeContent() style_data check |

Note: The code in validateChallengeContent() labels these as CQ-007 through CQ-011 internally. The docs use CQ-009 through CQ-013 to avoid conflicts with the content quality rules above. Both reference the same validation logic.

CQ-001: Setup Text Quality

The setup_text field is the "offer knowledge freely" layer. It must:

  • Be 2-4 sentences (minimum 50 characters)
  • Contain at least one specific detail: a name, date, number, or place
  • Follow the Hook -> Story -> Connection structure from the tone spec

Good: "The Berlin Wall didn't fall because of a grand political plan. It fell because of a confused press conference. On November 9, 1989, a spokesperson was asked when the new travel rules would take effect."

Bad: "The Berlin Wall was a barrier. It eventually came down." (no specific details, too vague)
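The CQ-001 enforcement (regex plus length check) could be approximated as below. The heuristic is illustrative only; the real validation regex may differ:

```typescript
// CQ-001 sketch: a length floor plus a rough "specific detail" heuristic.
function looksLikeValidSetup(text: string): boolean {
  // 2-4 sentences, minimum 50 characters.
  if (text.length < 50) return false;
  // Specific detail: any digit (date, number, year) counts...
  if (/\d/.test(text)) return true;
  // ...as does a capitalized word that is not sentence-initial (name/place).
  return /[a-z],?\s+[A-Z][a-z]+/.test(text);
}
```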

CQ-002: Challenge Text Voice

The challenge_text field is the "invite them in" layer. It must:

  • Be phrased as an invitation, not a test
  • Contain second-person address ("you", "your")
  • Minimum 30 characters

Good: "That press conference changed history in hours. But how long had the Wall actually stood?"

Bad: "What year did the Berlin Wall fall?" (test phrasing, no invitation)

CQ-003: Correct Reveal Quality

The reveal_correct field celebrates knowledge. It must:

  • Be 1-3 sentences (minimum 30 characters)
  • Include an additional teaching detail beyond "Correct"
  • Never be just "Correct." or "That's right."

Good: "Twenty-eight years — from August 1961 to November 1989. A generation grew up knowing nothing but a divided city."

Bad: "Correct!" (no teaching detail)

CQ-004: Wrong Reveal Quality

The reveal_wrong field teaches without punishing. It must:

  • Be minimum 30 characters
  • Never use "Wrong!", "Incorrect!", or similar binary labels
  • Include the correct answer and context

Good: "Twenty-eight years. Built in a single night in August 1961, it stood until that accidental press conference in 1989."

Bad: "Wrong! The answer was 28 years." (punishing tone, no context)

CQ-005: Multiple Choice Plausibility

For multiple_choice style, the style_data.options array must:

  • Contain exactly 4 options
  • Have exactly 1 correct option
  • Not include filler like "N/A", "None of the above", "All of the above", or "Unknown"
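The three constraints above can be sketched as one predicate. The { text, isCorrect } option shape is an assumption for illustration, not the real schema:

```typescript
// CQ-005 sketch: plausibility checks for multiple_choice options.
const FILLER = new Set(["n/a", "none of the above", "all of the above", "unknown"]);

function validOptions(options: { text: string; isCorrect: boolean }[]): boolean {
  if (options.length !== 4) return false;                            // exactly 4 options
  if (options.filter((o) => o.isCorrect).length !== 1) return false; // exactly 1 correct
  return options.every((o) => !FILLER.has(o.text.trim().toLowerCase())); // no filler
}
```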

CQ-006: Banned Patterns

The following words and patterns are banned in all challenge content fields:

  • "Trivia", "Quiz" — kills the Eko voice
  • "Algorithm", "Content" — technical jargon
  • "Correct!" / "Incorrect!" — binary labels (see tone spec Anti-Patterns)
  • "Wrong!" — punishing tone
  • Quiz-show patterns like "Question 1 of 10"
  • Condescending patterns like "Easy one", "Even a child"

See BANNED_PATTERNS in packages/ai/src/challenge-content-rules.ts for the full regex list.
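A regex scan over these patterns might look like the following. This is an illustrative subset; the authoritative list is BANNED_PATTERNS in challenge-content-rules.ts and may differ in detail:

```typescript
// Illustrative subset of the CQ-006 banned patterns.
const BANNED: RegExp[] = [
  /\btrivia\b/i,
  /\bquiz\b/i,
  /\balgorithm\b/i,
  /^\s*(correct|incorrect|wrong)\s*[.!]*\s*$/i, // bare binary labels
  /question\s+\d+\s+of\s+\d+/i,                 // quiz-show framing
  /\beasy one\b/i,                              // condescension
];

// Returns the first banned pattern the text trips, or undefined if clean.
function findBannedPattern(text: string): RegExp | undefined {
  return BANNED.find((re) => re.test(text));
}
```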

CQ-007: Free Text Prompt Quality

For free_text style, the challenge_text must:

  • Be open-ended and thought-provoking
  • Push beyond recall into understanding
  • Not use "Name the X" patterns (that is recall, not free-text)

Good: "In your own words, why does food brown when you sear it but not when you boil it?"

Bad: "Name the chemical reaction." (recall-based, not open-ended)

CQ-008: Correct Answer Quality

The correct_answer field is the narrative reveal — the rich, multi-sentence explanation designed for animated streaming display on challenge detail pages. It must:

  • Be 3-6 sentences minimum (100+ characters)
  • Start with the factual answer, then expand into significance and context
  • Include at least one detail that goes beyond the raw factual answer
  • Use second-person address where natural
  • Never be short, bland, or encyclopedic — this is the storytelling payoff
  • Stand on its own as a fascinating mini-story

Good: "Twenty-eight years. The Berlin Wall stood for twenty-eight years, from August 1961 to November 1989 — an entire generation grew up knowing nothing but a divided city. What makes this number even more striking is how it came down: not through a planned demolition, but because a spokesperson at a press conference accidentally announced that new travel rules were 'effective immediately.' Within hours, thousands of East Berliners flooded the checkpoints, and the guards — with no orders — simply stepped aside."

Bad: "The Berlin Wall stood for 28 years." (too short, no narrative, no engagement)

CQ-009: Fill the Gap Style Data

For fill_the_gap style, the style_data must contain:

  • complete_text — the full sentence with the gap filled in
  • answer — the word or phrase that fills the gap

CQ-010: Direct Question Style Data

For direct_question style, the style_data must contain:

  • expected_answer — the expected answer for validation

CQ-011: Statement Blank Style Data

For statement_blank style, the style_data must contain:

  • statement — the statement with a blank
  • complete_statement — the full statement with the blank filled
  • answer — the word or phrase that fills the blank

CQ-012: Reverse Lookup Style Data

For reverse_lookup style, the style_data must contain:

  • answer — the entity or concept being described

CQ-013: Free Text Style Data

For free_text style, the style_data must contain:

  • key_concepts — non-empty array of concepts the answer should touch on
  • sample_answer — an example high-quality answer for reference
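The per-style requirements in CQ-005 and CQ-009 through CQ-013 reduce to a required-keys lookup. The table shape is illustrative; validateChallengeContent() implements the real checks:

```typescript
// Required style_data keys per pre-generated style, collected from
// CQ-005 and CQ-009 through CQ-013. key_concepts is an array;
// every other required key is a string.
const REQUIRED_STYLE_DATA: Record<string, string[]> = {
  multiple_choice: ["options"],
  fill_the_gap: ["complete_text", "answer"],
  direct_question: ["expected_answer"],
  statement_blank: ["statement", "complete_statement", "answer"],
  reverse_lookup: ["answer"],
  free_text: ["key_concepts", "sample_answer"],
};

// Returns the required keys missing from a style_data payload.
function missingStyleDataKeys(style: string, styleData: Record<string, unknown>): string[] {
  return (REQUIRED_STYLE_DATA[style] ?? []).filter((k) => styleData[k] === undefined);
}
```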

Validation

Challenge content is validated at two points:

  1. Generation time — validateChallengeContent() in packages/ai/src/challenge-content-rules.ts
  2. Upload time — Zod schema validation before database insert

Both checks enforce the same rules to prevent invalid content from reaching the database.


| File | Purpose |
| --- | --- |
| packages/ai/src/challenge-content-rules.ts | Constants, validation, banned patterns |
| packages/ai/src/challenge-content.ts | AI generation function |
| scripts/seed/generate-challenge-content.ts | Backfill script |
| docs/marketing/CHALLENGE_TONE.md | Source tone specification |
| scripts/seed/materialize-entity-categories.ts | Entity classification and leaf category insertion |
| unmapped_category_log (DB table) | Unmapped category audit data (queried directly) |
| packages/db/src/drizzle/fact-engine-queries.ts | resolveTopicCategory() alias fallback chain |
| supabase/migrations/0124_seed_missing_root_categories.sql | Seed 25 missing depth-0 root categories |
| supabase/migrations/0126_add_topic_category_aliases.sql | Category alias table and unmapped category log |