Challenge Content Rules
Rules governing pre-generated challenge content for the Eko fact engine. These rules ensure that every published fact has high-quality, on-brand challenge content across multiple interaction styles.
Source tone specification: docs/marketing/CHALLENGE_TONE.md
System Design Rules (CC)
| Rule ID | Rule | Enforcement |
|---|---|---|
| CC-001 | Every published fact must have challenge content for >= 3 styles | --audit phase reports coverage gaps |
| CC-002 | The five-field structure (setup_text, challenge_text, reveal_correct, reveal_wrong, correct_answer) is mandatory for all styles | Zod schema validation at generation time |
| CC-003 | conversational and progressive_image_reveal styles are exempt from pre-generation | Hardcoded skip list in generation script |
| CC-004 | Fallback to algorithmic generation is allowed only when fact_challenge_content rows are absent | Frontend components check for pre-generated content first |
| CC-005 | New facts entering the pipeline via queue must trigger GENERATE_CHALLENGE_CONTENT after IMPORT_FACTS | Worker handler chain |
| CC-006 | Challenge content is regenerated independently of fact_records — changes to one table do not require changes to the other | Separate table architecture |
| CC-007 | Every entity with >= 5 published facts must have a dedicated topic_categories row at depth 2 or 3 | materialize-entity-categories.ts --audit |
| CC-008 | All subcategories defined in CATEGORY_SPECS must be materialized as topic_categories rows with schema entries | Migration + --audit script |
| CC-009 | Queries returning category lists must support depth filtering; feed UI and cron jobs operate at root level only | Code review + integration tests |
| CC-010 | External news API category slugs must resolve through the alias table before fact extraction | resolveTopicCategory() fallback chain |
| CC-011 | Unresolvable category slugs must be logged to unmapped_category_log for periodic audit | resolveTopicCategory() audit logging |
CC-001: Minimum Style Coverage
Every fact with status = 'published' must have pre-generated challenge content for at least 3 of the 6 pre-generated styles. The --audit flag on the generation script reports facts that fall below this threshold.
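The audit logic can be sketched as follows. The row shape and function name are illustrative stand-ins; the real check lives in the generation script's --audit phase.

```typescript
// Sketch of the CC-001 coverage audit: report published facts that have
// challenge content for fewer than 3 styles. ContentRow stands in for a
// row of the fact_challenge_content table.
interface ContentRow {
  fact_id: string;
  style: string;
}

function coverageGaps(
  publishedFactIds: string[],
  rows: ContentRow[],
  minStyles = 3
): string[] {
  // Count distinct styles per fact.
  const styles = new Map<string, Set<string>>();
  for (const r of rows) {
    if (!styles.has(r.fact_id)) styles.set(r.fact_id, new Set());
    styles.get(r.fact_id)!.add(r.style);
  }
  // A fact is a gap if it falls below the threshold (including zero rows).
  return publishedFactIds.filter((id) => (styles.get(id)?.size ?? 0) < minStyles);
}
```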
CC-002: Five-Field Structure
All challenge styles must include:
- `setup_text` — freely shared context (the "offer knowledge" layer)
- `challenge_text` — the invitation to engage (the "invite them in" layer)
- `reveal_correct` — celebration of knowing (the "shared victory" reveal)
- `reveal_wrong` — teaching moment (the "discovery" reveal)
- `correct_answer` — rich narrative answer for animated streaming display (the "storytelling payoff")
The first four fields map directly to the Three Layers from the Challenge Tone Specification (the two reveals share the reveal layer). The correct_answer adds a fourth layer: a multi-sentence, engaging narrative designed for animated streaming display on challenge detail pages.
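A minimal sketch of the five-field shape from CC-002. The interface and helper are illustrative; the production check is a Zod schema, not this hand-rolled validator.

```typescript
// The mandatory five-field structure (CC-002). Field names are from the doc;
// the real project validates this shape with Zod at generation time.
interface ChallengeContent {
  setup_text: string;      // "offer knowledge" layer
  challenge_text: string;  // "invite them in" layer
  reveal_correct: string;  // "shared victory" reveal
  reveal_wrong: string;    // "discovery" reveal
  correct_answer: string;  // narrative payoff for streaming display
}

const REQUIRED_FIELDS = [
  "setup_text",
  "challenge_text",
  "reveal_correct",
  "reveal_wrong",
  "correct_answer",
] as const;

// Returns the missing or empty required fields (empty array = structurally valid).
function missingFields(content: Record<string, unknown>): string[] {
  return REQUIRED_FIELDS.filter(
    (f) => typeof content[f] !== "string" || (content[f] as string).trim() === ""
  );
}
```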
CC-003: Exempt Styles
The following styles are NOT pre-generated:
- `conversational` — generated in real-time during multi-turn dialogue
- `progressive_image_reveal` — requires runtime image processing
These are hardcoded in PRE_GENERATED_STYLES in packages/ai/src/challenge-content-rules.ts.
CC-004: Fallback Priority
Frontend components must check fact_challenge_content for pre-generated rows before falling back to algorithmic generation. Pre-generated content is always preferred because it has been validated against quality rules.
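The CC-004 priority order can be sketched like this. The in-memory map and the fallback generator are hypothetical stand-ins for the fact_challenge_content table and the algorithmic path.

```typescript
type Content = { challenge_text: string };

// Stand-in for pre-generated rows in fact_challenge_content, keyed by
// "factId:style".
const preGenerated = new Map<string, Content>([
  ["fact-1:multiple_choice", { challenge_text: "But how long did the Wall actually stand?" }],
]);

// Hypothetical last-resort generator; only used when no pre-generated row exists.
function algorithmicFallback(factId: string, style: string): Content {
  return { challenge_text: `(auto) challenge for ${factId} in ${style} style` };
}

// CC-004: pre-generated content always wins because it has been validated
// against the quality rules.
function getChallengeContent(factId: string, style: string): Content {
  return preGenerated.get(`${factId}:${style}`) ?? algorithmicFallback(factId, style);
}
```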
CC-005: Pipeline Integration
When the worker processes a new fact:
1. `IMPORT_FACTS` — fact record is created
2. `GENERATE_CHALLENGE_CONTENT` — challenge content is generated for all pre-generated styles
This ensures no fact reaches publication without challenge content.
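The handler chain can be sketched as a simple job-to-next-job mapping. The job names are from the doc; the chaining mechanism itself is a stand-in for the real worker.

```typescript
// CC-005 sketch: after a job completes, the worker enqueues its successor.
type Job = "IMPORT_FACTS" | "GENERATE_CHALLENGE_CONTENT";

const NEXT_JOB: Partial<Record<Job, Job>> = {
  IMPORT_FACTS: "GENERATE_CHALLENGE_CONTENT",
};

// Returns the job the worker must enqueue next, or undefined at chain end.
function nextInChain(completed: Job): Job | undefined {
  return NEXT_JOB[completed];
}
```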
CC-006: Table Independence
The fact_challenge_content table is independent of fact_records. Regenerating challenge content does not require modifying the fact record, and vice versa. This allows challenge content to be refreshed without affecting fact integrity.
CC-007: Leaf-Level Entity Categories
Every entity (from seed_entry_queue) with >= 5 published facts must have a dedicated topic_categories row at depth 2 or 3. This enables fine-grained content organization and future subcategory browsing.
- Entity categories must have a valid `parent_id` pointing to a mid-level subcategory (depth 1 or 2)
- Entity category `path` must follow the hierarchical slug convention: `root/subcategory/entity-slug`
- Entities below the 5-fact threshold remain classified under their parent subcategory or root category
Depth-3 impact: With the taxonomy expanded to depth 3, entity categories at depth 3 inherit their parent's challenge format links from the depth-2 subcategory above them. Entities should be placed at depth 3 when their parent depth-2 category exists.
Enforcement: materialize-entity-categories.ts --audit reports entities that meet the threshold but lack a dedicated category row.
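The slug convention above can be checked mechanically. The path format is from the doc; the validation helper itself is hypothetical.

```typescript
// CC-007 sketch: verify an entity category path matches the hierarchical
// slug convention (root/subcategory/entity-slug) for its depth.
// A category at depth d has d + 1 path segments (depth 0 = root alone).
function isValidEntityPath(path: string, depth: number): boolean {
  const segments = path.split("/");
  return (
    segments.length === depth + 1 &&
    segments.every((s) => /^[a-z0-9-]+$/.test(s))
  );
}
```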
CC-008: Mid-Level Subcategory Materialization
All subcategories defined in CATEGORY_SPECS (in scripts/seed/generate-curated-entries.ts) must be materialized as topic_categories rows in the database. This bridges the gap between the code-defined taxonomy and the database-driven category system.
- Subcategories inherit challenge format links from their parent category via `challenge_format_topics`
- Each subcategory must have a `general_fact` schema entry in `fact_record_schemas`
- Subcategory `depth` values must accurately reflect their position in the hierarchy (1 for direct children of root, 2 for grandchildren, 3 for leaf)
Depth-3 impact: Depth-3 categories inherit challenge format eligibility from their depth-2 parent via the challenge_format_topics table. When a new depth-3 category is added, its parent's format links apply automatically.
Enforcement: Migration creates the rows; materialize-entity-categories.ts --audit verifies completeness.
CC-009: Taxonomy Query Safeguards
Queries returning category lists must support depth filtering via a maxDepth parameter to prevent unintended explosion of results as the taxonomy grows.
- `getActiveTopicCategories()` and `getActiveTopicCategoriesWithSchemas()` accept an optional `maxDepth` parameter
- Feed UI category pills must show only depth-0 (root) categories to avoid overwhelming the filter bar
- Cron and evergreen generation must operate at root level (`maxDepth: 0`) to prevent per-subcategory quota explosion
- When `maxDepth` is omitted, all depths are returned for backwards compatibility
Enforcement: Code review and integration tests verify that existing consumers pass maxDepth: 0 where appropriate.
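The maxDepth contract can be sketched as follows. The category data is illustrative; the real functions query the database rather than filtering in memory.

```typescript
interface TopicCategory {
  slug: string;
  depth: number;
}

// CC-009 sketch: optional maxDepth filter with the backwards-compatible
// default (omitted = all depths returned).
function filterByDepth(
  categories: TopicCategory[],
  maxDepth?: number
): TopicCategory[] {
  if (maxDepth === undefined) return categories; // backwards compatibility
  return categories.filter((c) => c.depth <= maxDepth);
}
```

Feed UI and cron consumers would call this with `maxDepth: 0` to stay at root level.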
CC-010: Category Alias Normalization
External news API providers use their own category taxonomies (e.g., NewsAPI uses business, GNews uses nation) that do not match our internal topic_categories slugs. The topic_category_aliases table maps external slugs to internal category IDs.
- `resolveTopicCategory()` must be used instead of `getTopicCategoryBySlug()` in all ingestion paths
- Resolution order: (1) exact slug match, (2) provider-specific alias, (3) universal alias
- The alias table must be re-seeded when new root categories are added to improve mapping specificity
- Provider-specific aliases override universal aliases for the same external slug
Enforcement: resolveTopicCategory() query function implements the fallback chain. The extract-facts handler uses it for topic resolution.
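The three-step fallback chain can be sketched like this. The in-memory sets and maps stand in for topic_categories and topic_category_aliases; the alias data is illustrative only.

```typescript
// CC-010 sketch of resolveTopicCategory()'s fallback chain.
const internalSlugs = new Set(["business", "world-news"]);
// Provider-specific aliases, keyed "provider:externalSlug".
const providerAliases = new Map([["gnews:nation", "world-news"]]);
// Universal aliases that apply to any provider (illustrative mapping).
const universalAliases = new Map([["nation", "business"]]);

function resolveSlug(provider: string, externalSlug: string): string | null {
  // (1) exact slug match against internal categories
  if (internalSlugs.has(externalSlug)) return externalSlug;
  // (2) provider-specific alias overrides the universal one
  const providerHit = providerAliases.get(`${provider}:${externalSlug}`);
  if (providerHit) return providerHit;
  // (3) universal alias, or null when nothing maps (see CC-011)
  return universalAliases.get(externalSlug) ?? null;
}
```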
CC-011: Unmapped Category Audit
When no alias mapping exists for an external category slug, the system must log it to unmapped_category_log instead of silently dropping the story. This enables proactive alias creation.
- Logging is fire-and-forget (does not block or fail fact extraction)
- Unmapped slugs can be reviewed via database queries on `unmapped_category_log`
- High-frequency unmapped slugs should trigger alias creation within one business day
Enforcement: resolveTopicCategory() logs on null resolution. Periodic audit via unmapped_category_log table queries.
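The fire-and-forget requirement can be sketched as follows. The array is a stand-in for an INSERT into unmapped_category_log; the helper names are hypothetical.

```typescript
const auditLog: string[] = [];

// Stand-in for INSERT INTO unmapped_category_log.
async function logUnmapped(slug: string): Promise<void> {
  auditLog.push(slug);
}

// CC-011: logging must never block or fail fact extraction, so the promise
// is deliberately not awaited and any error is only reported, not thrown.
function recordUnmappedSlug(slug: string): void {
  void logUnmapped(slug).catch((err) =>
    console.error("unmapped-category logging failed:", err)
  );
}
```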
Content Quality Rules (CQ)
| Rule ID | Rule | Enforcement |
|---|---|---|
| CQ-001 | setup_text must be 2-4 sentences, multiline, containing at least one specific detail (name, date, number, place) | Post-generation validation regex + length check |
| CQ-002 | challenge_text must be phrased as an invitation, not a test. Must contain second-person address ("you") | Post-generation validation |
| CQ-003 | reveal_correct must celebrate the user's knowledge. Never just "Correct." Must be 1-3 sentences with an additional teaching detail | Length + pattern check |
| CQ-004 | reveal_wrong must teach, not punish. Never use "Wrong!" or "Incorrect." Must include the correct answer and context | Anti-pattern regex check |
| CQ-005 | multiple_choice distractors must be plausible — no "N/A" or "Unknown" filler options | Validated in Zod schema |
| CQ-006 | Anti-pattern words are banned in all fields: "Trivia", "Quiz", "Algorithm", "Content", "Correct/Incorrect" (as binary labels) | Regex scan at generation + upload time |
| CQ-007 | free_text prompts must be open-ended and thought-provoking, not recall-based ("Name the X" is banned for free_text) | Pattern check in validation |
| CQ-008 | correct_answer must be 3-6 sentences (minimum 100 characters), narrative and engaging. Designed for animated streaming display. Never short, bland, or encyclopedic. Must include at least one detail beyond the raw factual answer | Length check + post-generation validation |
| CQ-009 | fill_the_gap style_data must contain complete_text (string) and answer (string) | validateChallengeContent() style_data check |
| CQ-010 | direct_question style_data must contain expected_answer (string) | validateChallengeContent() style_data check |
| CQ-011 | statement_blank style_data must contain statement, complete_statement, and answer (all strings) | validateChallengeContent() style_data check |
| CQ-012 | reverse_lookup style_data must contain answer (string) | validateChallengeContent() style_data check |
| CQ-013 | free_text style_data must contain key_concepts (non-empty array) and sample_answer (string) | validateChallengeContent() style_data check |
Note: The code in `validateChallengeContent()` labels these as CQ-007 through CQ-011 internally. The docs use CQ-009 through CQ-013 to avoid conflicts with the content quality rules above. Both reference the same validation logic.
CQ-001: Setup Text Quality
The setup_text field is the "offer knowledge freely" layer. It must:
- Be 2-4 sentences (minimum 50 characters)
- Contain at least one specific detail: a name, date, number, or place
- Follow the Hook -> Story -> Connection structure from the tone spec
Good: "The Berlin Wall didn't fall because of a grand political plan. It fell because of a confused press conference. On November 9, 1989, a spokesperson was asked when the new travel rules would take effect."
Bad: "The Berlin Wall was a barrier. It eventually came down." (no specific details, too vague)
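A deliberately rough validator in the spirit of CQ-001. It treats only digits (dates, years, counts) as "specific details"; the production regex in challenge-content-rules.ts is richer (names and places too), so this is a sketch, not the real check.

```typescript
// CQ-001 sketch: 2-4 sentences, minimum 50 characters, and at least one
// digit as a crude proxy for a specific detail.
function setupTextPassesCq001(setup: string): boolean {
  const sentences = (setup.match(/[.!?]+/g) ?? []).length;
  const hasNumericDetail = /\d/.test(setup);
  return setup.length >= 50 && sentences >= 2 && sentences <= 4 && hasNumericDetail;
}
```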
CQ-002: Challenge Text Voice
The challenge_text field is the "invite them in" layer. It must:
- Be phrased as an invitation, not a test
- Contain second-person address ("you", "your")
- Minimum 30 characters
Good: "That press conference changed history in hours. But how long had the Wall actually stood?"
Bad: "What year did the Berlin Wall fall?" (test phrasing, no invitation)
CQ-003: Correct Reveal Quality
The reveal_correct field celebrates knowledge. It must:
- Be 1-3 sentences (minimum 30 characters)
- Include an additional teaching detail beyond "Correct"
- Never be just "Correct." or "That's right."
Good: "Twenty-eight years — from August 1961 to November 1989. A generation grew up knowing nothing but a divided city."
Bad: "Correct!" (no teaching detail)
CQ-004: Wrong Reveal Quality
The reveal_wrong field teaches without punishing. It must:
- Be minimum 30 characters
- Never use "Wrong!", "Incorrect!", or similar binary labels
- Include the correct answer and context
Good: "Twenty-eight years. Built in a single night in August 1961, it stood until that accidental press conference in 1989."
Bad: "Wrong! The answer was 28 years." (punishing tone, no context)
CQ-005: Multiple Choice Plausibility
For multiple_choice style, the style_data.options array must:
- Contain exactly 4 options
- Have exactly 1 correct option
- Not include filler like "N/A", "None of the above", "All of the above", or "Unknown"
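These constraints can be sketched as a validator. The option shape (`text`, `is_correct`) is an assumption for illustration; the production Zod schema may name fields differently.

```typescript
interface Option {
  text: string;
  is_correct: boolean;
}

// CQ-005: filler options that are never acceptable as distractors.
const FILLER = new Set(["n/a", "none of the above", "all of the above", "unknown"]);

function optionsAreValid(options: Option[]): boolean {
  const correctCount = options.filter((o) => o.is_correct).length;
  const hasFiller = options.some((o) => FILLER.has(o.text.trim().toLowerCase()));
  return options.length === 4 && correctCount === 1 && !hasFiller;
}
```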
CQ-006: Banned Patterns
The following words and patterns are banned in all challenge content fields:
- "Trivia", "Quiz" — kills the Eko voice
- "Algorithm", "Content" — technical jargon
- "Correct!" / "Incorrect!" — binary labels (see tone spec Anti-Patterns)
- "Wrong!" — punishing tone
- Quiz-show patterns like "Question 1 of 10"
- Condescending patterns like "Easy one", "Even a child"
See BANNED_PATTERNS in packages/ai/src/challenge-content-rules.ts for the full regex list.
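A scan in the spirit of BANNED_PATTERNS might look like this. These regexes are a representative subset for illustration, not the canonical list in challenge-content-rules.ts.

```typescript
// CQ-006 sketch: patterns that kill the Eko voice.
const BANNED: RegExp[] = [
  /\btrivia\b/i,
  /\bquiz\b/i,
  /\b(correct|incorrect|wrong)!/i, // binary labels / punishing tone
  /question \d+ of \d+/i,          // quiz-show framing
  /\beasy one\b/i,                 // condescension
];

// Returns every banned pattern found in the text (empty array = clean).
function findBannedPatterns(text: string): RegExp[] {
  return BANNED.filter((re) => re.test(text));
}
```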
CQ-007: Free Text Prompt Quality
For free_text style, the challenge_text must:
- Be open-ended and thought-provoking
- Push beyond recall into understanding
- Not use "Name the X" patterns (that is recall, not free-text)
Good: "In your own words, why does food brown when you sear it but not when you boil it?"
Bad: "Name the chemical reaction." (recall-based, not open-ended)
CQ-008: Correct Answer Quality
The correct_answer field is the narrative reveal — the rich, multi-sentence explanation designed for animated streaming display on challenge detail pages. It must:
- Be 3-6 sentences (minimum 100 characters)
- Start with the factual answer, then expand into significance and context
- Include at least one detail that goes beyond the raw factual answer
- Use second-person address where natural
- Never be short, bland, or encyclopedic — this is the storytelling payoff
- Stand on its own as a fascinating mini-story
Good: "Twenty-eight years. The Berlin Wall stood for twenty-eight years, from August 1961 to November 1989 — an entire generation grew up knowing nothing but a divided city. What makes this number even more striking is how it came down: not through a planned demolition, but because a spokesperson at a press conference accidentally announced that new travel rules were 'effective immediately.' Within hours, thousands of East Berliners flooded the checkpoints, and the guards — with no orders — simply stepped aside."
Bad: "The Berlin Wall stood for 28 years." (too short, no narrative, no engagement)
CQ-009: Fill the Gap Style Data
For fill_the_gap style, the style_data must contain:
- `complete_text` — the full sentence with the gap filled in
- `answer` — the word or phrase that fills the gap
CQ-010: Direct Question Style Data
For direct_question style, the style_data must contain:
- `expected_answer` — the expected answer for validation
CQ-011: Statement Blank Style Data
For statement_blank style, the style_data must contain:
- `statement` — the statement with a blank
- `complete_statement` — the full statement with the blank filled
- `answer` — the word or phrase that fills the blank
CQ-012: Reverse Lookup Style Data
For reverse_lookup style, the style_data must contain:
- `answer` — the entity or concept being described
CQ-013: Free Text Style Data
For free_text style, the style_data must contain:
- `key_concepts` — non-empty array of concepts the answer should touch on
- `sample_answer` — an example high-quality answer for reference
Validation
Challenge content is validated at two points:
1. Generation time — `validateChallengeContent()` in `packages/ai/src/challenge-content-rules.ts`
2. Upload time — Zod schema validation before database insert
Both checks enforce the same rules to prevent invalid content from reaching the database.
Related Files
| File | Purpose |
|---|---|
| packages/ai/src/challenge-content-rules.ts | Constants, validation, banned patterns |
| packages/ai/src/challenge-content.ts | AI generation function |
| scripts/seed/generate-challenge-content.ts | Backfill script |
| docs/marketing/CHALLENGE_TONE.md | Source tone specification |
| scripts/seed/materialize-entity-categories.ts | Entity classification and leaf category insertion |
| unmapped_category_log (DB table) | Unmapped category audit data (queried directly) |
| packages/db/src/drizzle/fact-engine-queries.ts | resolveTopicCategory() alias fallback chain |
| supabase/migrations/0124_seed_missing_root_categories.sql | Seed 25 missing depth-0 root categories |
| supabase/migrations/0126_add_topic_category_aliases.sql | Category alias table and unmapped category log |