Challenge Content Rules

Rules governing pre-generated challenge content for the Eko fact engine. These rules ensure that every published fact has high-quality, on-brand challenge content across multiple interaction styles.

Source tone specification: docs/marketing/CHALLENGE_TONE.md


System Design Rules (CC)

| Rule ID | Rule | Enforcement |
| --- | --- | --- |
| CC-001 | Every published fact must have challenge content for >= 3 styles | --audit phase reports coverage gaps |
| CC-002 | The five-field structure (setup_text, challenge_text, reveal_correct, reveal_wrong, correct_answer) is mandatory for all styles | Zod schema validation at generation time |
| CC-003 | conversational and progressive_image_reveal styles are exempt from pre-generation | Hardcoded skip list in generation script |
| CC-004 | Fallback to algorithmic generation is allowed only when fact_challenge_content rows are absent | Frontend components check for pre-generated content first |
| CC-005 | New facts entering the pipeline via queue must trigger GENERATE_CHALLENGE_CONTENT after IMPORT_FACTS | Worker handler chain |
| CC-006 | Challenge content is regenerated independently of fact_records; changes to one table do not require changes to the other | Separate table architecture |
| CC-007 | Every entity with >= 5 published facts must have a dedicated topic_categories row at depth 2 or 3 | materialize-entity-categories.ts --audit |
| CC-008 | All subcategories defined in CATEGORY_SPECS must be materialized as topic_categories rows with schema entries | Migration + --audit script |
| CC-009 | Queries returning category lists must support depth filtering; feed UI and cron jobs operate at root level only | Code review + integration tests |
| CC-010 | External news API category slugs must resolve through the alias table before fact extraction | resolveTopicCategory() fallback chain |
| CC-011 | Unresolvable category slugs must be logged to unmapped_category_log for periodic audit | resolveTopicCategory() audit logging |

CC-001: Minimum Style Coverage

Every fact with status = 'published' must have pre-generated challenge content for at least 3 of the 6 pre-generated styles. The --audit flag on the generation script reports facts that fall below this threshold.

CC-002: Five-Field Structure

All challenge styles must include:

  • setup_text — freely shared context (the "offer knowledge" layer)
  • challenge_text — the invitation to engage (the "invite them in" layer)
  • reveal_correct — celebration of knowing (the "shared victory" reveal)
  • reveal_wrong — teaching moment (the "discovery" reveal)
  • correct_answer — rich narrative answer for animated streaming display (the "storytelling payoff")

The first four fields map onto the Three Layers from the Challenge Tone Specification, with the two reveal fields sharing the reveal layer. The correct_answer adds a fourth layer: a multi-sentence, engaging narrative designed for animated streaming display on challenge detail pages.
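The five-field contract can be expressed as a plain structural check. This is an illustrative sketch only; the real pipeline enforces CC-002 with a Zod schema, and only the field names below are taken from this document:

```typescript
// Sketch of the mandatory five-field structure from CC-002.
// The real project validates with a Zod schema; this standalone check
// is illustrative.
interface ChallengeContent {
  setup_text: string;      // "offer knowledge" layer
  challenge_text: string;  // "invite them in" layer
  reveal_correct: string;  // "shared victory" reveal
  reveal_wrong: string;    // "discovery" reveal
  correct_answer: string;  // "storytelling payoff" narrative
}

const REQUIRED_FIELDS = [
  "setup_text",
  "challenge_text",
  "reveal_correct",
  "reveal_wrong",
  "correct_answer",
] as const;

// Returns the missing or empty fields (an empty array means valid).
function missingFields(candidate: Record<string, unknown>): string[] {
  return REQUIRED_FIELDS.filter(
    (f) => typeof candidate[f] !== "string" || (candidate[f] as string).length === 0,
  );
}
```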

CC-003: Exempt Styles

The following styles are NOT pre-generated:

  • conversational — generated in real-time during multi-turn dialogue
  • progressive_image_reveal — requires runtime image processing

These are hardcoded in PRE_GENERATED_STYLES in packages/ai/src/challenge-content-rules.ts.
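The split between pre-generated and runtime-only styles could look like the following sketch. The six pre-generated style names are collected from the style_data rules later in this document (CQ-005, CQ-009 through CQ-013); the actual constant shape in challenge-content-rules.ts may differ:

```typescript
// Styles whose content is generated ahead of time (CC-003).
const PRE_GENERATED_STYLES = [
  "multiple_choice",
  "fill_the_gap",
  "direct_question",
  "statement_blank",
  "reverse_lookup",
  "free_text",
] as const;

// Runtime-only styles, hardcoded as a skip list.
const RUNTIME_ONLY_STYLES = ["conversational", "progressive_image_reveal"] as const;

function isPreGenerated(style: string): boolean {
  return (PRE_GENERATED_STYLES as readonly string[]).includes(style);
}
```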

CC-004: Fallback Priority

Frontend components must check fact_challenge_content for pre-generated rows before falling back to algorithmic generation. Pre-generated content is always preferred because it has been validated against quality rules.
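The fallback priority reduces to "pre-generated row wins if present". The Challenge shape and function names below are illustrative, not the project's real API:

```typescript
// CC-004 sketch: prefer pre-generated rows, fall back to algorithmic
// generation only when no row exists.
type Challenge = { source: "pre_generated" | "algorithmic"; text: string };

function pickChallenge(
  preGenerated: Challenge | undefined,
  generateFallback: () => Challenge,
): Challenge {
  // Pre-generated rows have already passed the CQ quality rules,
  // so they are always preferred.
  return preGenerated ?? generateFallback();
}
```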

CC-005: Pipeline Integration

When the worker processes a new fact:

  1. IMPORT_FACTS — fact record is created
  2. GENERATE_CHALLENGE_CONTENT — challenge content is generated for all pre-generated styles

This ensures no fact reaches publication without challenge content.
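The handler chain can be sketched with a hypothetical in-memory queue; the job type name is taken from this document, everything else is a stand-in for the real worker:

```typescript
// CC-005 sketch: the IMPORT_FACTS handler enqueues
// GENERATE_CHALLENGE_CONTENT as the next step in the chain.
type Job = { type: string; factId: string };

function handleImportFacts(factId: string, enqueue: (job: Job) => void): void {
  // ...create the fact record here (elided)...
  // Chaining the next step guarantees no fact is published without content.
  enqueue({ type: "GENERATE_CHALLENGE_CONTENT", factId });
}
```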

CC-006: Table Independence

The fact_challenge_content table is independent of fact_records. Regenerating challenge content does not require modifying the fact record, and vice versa. This allows challenge content to be refreshed without affecting fact integrity.

CC-007: Leaf-Level Entity Categories

Every entity (from seed_entry_queue) with >= 5 published facts should have a dedicated topic_categories row at depth 2 or 3. This enables fine-grained content organization and future subcategory browsing.

  • Entity categories must have a valid parent_id pointing to a mid-level subcategory (depth 1 or 2)
  • Entity category path must follow the hierarchical slug convention: root/subcategory/entity-slug
  • Entities below the 5-fact threshold remain classified under their parent subcategory or root category

Depth-3 impact: With the taxonomy expanded to depth 3, entity categories at depth 3 inherit their parent's challenge format links from the depth-2 subcategory above them. Entities should be placed at depth 3 when their parent depth-2 category exists.

Enforcement: materialize-entity-categories.ts --audit reports entities that meet the threshold but lack a dedicated category row.

CC-008: Mid-Level Subcategory Materialization

All subcategories defined in CATEGORY_SPECS (in scripts/seed/generate-curated-entries.ts) must be materialized as topic_categories rows in the database. This bridges the gap between the code-defined taxonomy and the database-driven category system.

  • Subcategories inherit challenge format links from their parent category via challenge_format_topics
  • Each subcategory must have a general_fact schema entry in fact_record_schemas
  • Subcategory depth values must accurately reflect their position in the hierarchy (1 for direct children of the root, 2 for grandchildren, 3 for leaf categories)

Depth-3 impact: Depth-3 categories inherit challenge format eligibility from their depth-2 parent via the challenge_format_topics table. When a new depth-3 category is added, its parent's format links apply automatically.

Enforcement: Migration creates the rows; materialize-entity-categories.ts --audit verifies completeness.

CC-009: Taxonomy Query Safeguards

Queries returning category lists must support depth filtering via a maxDepth parameter to prevent unintended explosion of results as the taxonomy grows.

  • getActiveTopicCategories() and getActiveTopicCategoriesWithSchemas() accept an optional maxDepth parameter
  • Feed UI category pills must show only depth-0 (root) categories to avoid overwhelming the filter bar
  • Cron and evergreen generation must operate at root level (maxDepth: 0) to prevent per-subcategory quota explosion
  • When maxDepth is omitted, all depths are returned for backwards compatibility

Enforcement: Code review and integration tests verify that existing consumers pass maxDepth: 0 where appropriate.
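The maxDepth behavior described above, including the backwards-compatible omitted case, can be sketched as a simple filter (the TopicCategory shape is illustrative):

```typescript
// CC-009 sketch: depth filtering with an optional maxDepth parameter.
interface TopicCategory { slug: string; depth: number }

function filterByDepth(categories: TopicCategory[], maxDepth?: number): TopicCategory[] {
  // Omitted maxDepth returns every depth, for backwards compatibility.
  if (maxDepth === undefined) return categories;
  return categories.filter((c) => c.depth <= maxDepth);
}
```

Cron and evergreen generation would call this with maxDepth: 0 so quotas stay per-root rather than per-subcategory.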

CC-010: Category Alias Normalization

External news API providers use their own category taxonomies (e.g., NewsAPI uses business, GNews uses nation) that do not match our internal topic_categories slugs. The topic_category_aliases table maps external slugs to internal category IDs.

  • resolveTopicCategory() must be used instead of getTopicCategoryBySlug() in all ingestion paths
  • Resolution order: (1) exact slug match, (2) provider-specific alias, (3) universal alias
  • The alias table must be re-seeded when new root categories are added, so aliases can point at the more specific categories
  • Provider-specific aliases override universal aliases for the same external slug

Enforcement: resolveTopicCategory() query function implements the fallback chain. The extract-facts handler uses it for topic resolution.
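The three-step resolution order can be sketched as follows. The data shapes are illustrative; the real chain lives in resolveTopicCategory() in fact-engine-queries.ts:

```typescript
// CC-010 sketch of the resolution order: exact slug, then
// provider-specific alias, then universal alias.
interface AliasRow { externalSlug: string; provider: string | null; categoryId: string }

function resolveCategoryId(
  slug: string,
  provider: string,
  internalSlugs: Map<string, string>, // internal slug -> category id
  aliases: AliasRow[],
): string | null {
  // (1) exact match against internal topic_categories slugs
  const exact = internalSlugs.get(slug);
  if (exact !== undefined) return exact;
  // (2) provider-specific alias overrides (3) the universal (provider-null) alias
  const providerAlias = aliases.find((a) => a.externalSlug === slug && a.provider === provider);
  if (providerAlias) return providerAlias.categoryId;
  const universal = aliases.find((a) => a.externalSlug === slug && a.provider === null);
  return universal ? universal.categoryId : null;
}
```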

CC-011: Unmapped Category Audit

When no alias mapping exists for an external category slug, the system must log it to unmapped_category_log instead of silently dropping the story. This enables proactive alias creation.

  • Logging is fire-and-forget (does not block or fail fact extraction)
  • Unmapped slugs can be reviewed via database queries on unmapped_category_log
  • High-frequency unmapped slugs should trigger alias creation within one business day

Enforcement: resolveTopicCategory() logs on null resolution. Periodic audit via unmapped_category_log table queries.
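The fire-and-forget requirement means a logging failure must never surface in the extraction path. A minimal sketch, where logUnmapped is a hypothetical stand-in for the unmapped_category_log insert:

```typescript
// CC-011 sketch: logging must never block or fail fact extraction.
function safeLogUnmapped(slug: string, logUnmapped: (slug: string) => void): boolean {
  try {
    logUnmapped(slug);
    return true;
  } catch {
    // Swallow logging failures; fact extraction continues regardless.
    return false;
  }
}
```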


Content Quality Rules (CQ)

| Rule ID | Rule | Enforcement |
| --- | --- | --- |
| CQ-001 | setup_text must be 2-4 sentences, multiline, containing at least one specific detail (name, date, number, place) | Post-generation validation regex + length check |
| CQ-002 | challenge_text must be phrased as an invitation, not a test. Must contain second-person address ("you") | Post-generation validation |
| CQ-003 | reveal_correct must celebrate the user's knowledge. Never just "Correct." Must be 1-3 sentences with an additional teaching detail | Length + pattern check |
| CQ-004 | reveal_wrong must teach, not punish. Never use "Wrong!" or "Incorrect." Must include the correct answer and context | Anti-pattern regex check |
| CQ-005 | multiple_choice distractors must be plausible; no "N/A" or "Unknown" filler options | Validated in Zod schema |
| CQ-006 | Anti-pattern words are banned in all fields: "Trivia", "Quiz", "Algorithm", "Content", "Correct/Incorrect" (as binary labels) | Regex scan at generation + upload time |
| CQ-007 | free_text prompts must be open-ended and thought-provoking, not recall-based ("Name the X" is banned for free_text) | Pattern check in validation |
| CQ-008 | correct_answer must be 3-6 sentences minimum (100+ chars), narrative and engaging. Designed for animated streaming display. Never short, bland, or encyclopedic. Must include at least one detail beyond the raw factual answer | Length check + post-generation validation |
| CQ-009 | fill_the_gap style_data must contain complete_text (string) and answer (string) | validateChallengeContent() style_data check |
| CQ-010 | direct_question style_data must contain expected_answer (string) | validateChallengeContent() style_data check |
| CQ-011 | statement_blank style_data must contain statement, complete_statement, and answer (all strings) | validateChallengeContent() style_data check |
| CQ-012 | reverse_lookup style_data must contain answer (string) | validateChallengeContent() style_data check |
| CQ-013 | free_text style_data must contain key_concepts (non-empty array) and sample_answer (string) | validateChallengeContent() style_data check |

Note: The code in validateChallengeContent() labels these as CQ-007 through CQ-011 internally. The docs use CQ-009 through CQ-013 to avoid conflicts with the content quality rules above. Both reference the same validation logic.

CQ-001: Setup Text Quality

The setup_text field is the "offer knowledge freely" layer. It must:

  • Be 2-4 sentences (minimum 50 characters)
  • Contain at least one specific detail: a name, date, number, or place
  • Follow the Hook -> Story -> Connection structure from the tone spec

Good: "The Berlin Wall didn't fall because of a grand political plan. It fell because of a confused press conference. On November 9, 1989, a spokesperson was asked when the new travel rules would take effect."

Bad: "The Berlin Wall was a barrier. It eventually came down." (no specific details, too vague)
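The CQ-001 enforcement (regex plus length check) could be approximated as below. The heuristic is illustrative only; the real validation regex may differ:

```typescript
// CQ-001 sketch: a length floor plus a rough "specific detail" heuristic.
function looksLikeValidSetup(text: string): boolean {
  // 2-4 sentences, minimum 50 characters.
  if (text.length < 50) return false;
  // Specific detail: any digit (date, number, year) counts...
  if (/\d/.test(text)) return true;
  // ...as does a capitalized word that is not sentence-initial (name/place).
  return /[a-z],?\s+[A-Z][a-z]+/.test(text);
}
```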

CQ-002: Challenge Text Voice

The challenge_text field is the "invite them in" layer. It must:

  • Be phrased as an invitation, not a test
  • Contain second-person address ("you", "your")
  • Minimum 30 characters

Good: "That press conference changed history in hours. But how long had the Wall actually stood?"

Bad: "What year did the Berlin Wall fall?" (test phrasing, no invitation)

CQ-003: Correct Reveal Quality

The reveal_correct field celebrates knowledge. It must:

  • Be 1-3 sentences (minimum 30 characters)
  • Include an additional teaching detail beyond "Correct"
  • Never be just "Correct." or "That's right."

Good: "Twenty-eight years — from August 1961 to November 1989. A generation grew up knowing nothing but a divided city."

Bad: "Correct!" (no teaching detail)

CQ-004: Wrong Reveal Quality

The reveal_wrong field teaches without punishing. It must:

  • Be minimum 30 characters
  • Never use "Wrong!", "Incorrect!", or similar binary labels
  • Include the correct answer and context

Good: "Twenty-eight years. Built in a single night in August 1961, it stood until that accidental press conference in 1989."

Bad: "Wrong! The answer was 28 years." (punishing tone, no context)

CQ-005: Multiple Choice Plausibility

For multiple_choice style, the style_data.options array must:

  • Contain exactly 4 options
  • Have exactly 1 correct option
  • Not include filler like "N/A", "None of the above", "All of the above", or "Unknown"
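The three constraints above can be sketched as one predicate. The { text, isCorrect } option shape is an assumption for illustration, not the real schema:

```typescript
// CQ-005 sketch: plausibility checks for multiple_choice options.
const FILLER = new Set(["n/a", "none of the above", "all of the above", "unknown"]);

function validOptions(options: { text: string; isCorrect: boolean }[]): boolean {
  if (options.length !== 4) return false;                            // exactly 4 options
  if (options.filter((o) => o.isCorrect).length !== 1) return false; // exactly 1 correct
  return options.every((o) => !FILLER.has(o.text.trim().toLowerCase())); // no filler
}
```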

CQ-006: Banned Patterns

The following words and patterns are banned in all challenge content fields:

  • "Trivia", "Quiz" — kills the Eko voice
  • "Algorithm", "Content" — technical jargon
  • "Correct!" / "Incorrect!" — binary labels (see tone spec Anti-Patterns)
  • "Wrong!" — punishing tone
  • Quiz-show patterns like "Question 1 of 10"
  • Condescending patterns like "Easy one", "Even a child"

See BANNED_PATTERNS in packages/ai/src/challenge-content-rules.ts for the full regex list.
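A regex scan over these patterns might look like the following. This is an illustrative subset; the authoritative list is BANNED_PATTERNS in challenge-content-rules.ts and may differ in detail:

```typescript
// Illustrative subset of the CQ-006 banned patterns.
const BANNED: RegExp[] = [
  /\btrivia\b/i,
  /\bquiz\b/i,
  /\balgorithm\b/i,
  /^\s*(correct|incorrect|wrong)\s*[.!]*\s*$/i, // bare binary labels
  /question\s+\d+\s+of\s+\d+/i,                 // quiz-show framing
  /\beasy one\b/i,                              // condescension
];

// Returns the first banned pattern the text trips, or undefined if clean.
function findBannedPattern(text: string): RegExp | undefined {
  return BANNED.find((re) => re.test(text));
}
```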

CQ-007: Free Text Prompt Quality

For free_text style, the challenge_text must:

  • Be open-ended and thought-provoking
  • Push beyond recall into understanding
  • Not use "Name the X" patterns (that is recall, not free-text)

Good: "In your own words, why does food brown when you sear it but not when you boil it?"

Bad: "Name the chemical reaction." (recall-based, not open-ended)

CQ-008: Correct Answer Quality

The correct_answer field is the narrative reveal — the rich, multi-sentence explanation designed for animated streaming display on challenge detail pages. It must:

  • Be 3-6 sentences minimum (100+ characters)
  • Start with the factual answer, then expand into significance and context
  • Include at least one detail that goes beyond the raw factual answer
  • Use second-person address where natural
  • Never be short, bland, or encyclopedic — this is the storytelling payoff
  • Stand on its own as a fascinating mini-story

Good: "Twenty-eight years. The Berlin Wall stood for twenty-eight years, from August 1961 to November 1989 — an entire generation grew up knowing nothing but a divided city. What makes this number even more striking is how it came down: not through a planned demolition, but because a spokesperson at a press conference accidentally announced that new travel rules were 'effective immediately.' Within hours, thousands of East Berliners flooded the checkpoints, and the guards — with no orders — simply stepped aside."

Bad: "The Berlin Wall stood for 28 years." (too short, no narrative, no engagement)

CQ-009: Fill the Gap Style Data

For fill_the_gap style, the style_data must contain:

  • complete_text — the full sentence with the gap filled in
  • answer — the word or phrase that fills the gap

CQ-010: Direct Question Style Data

For direct_question style, the style_data must contain:

  • expected_answer — the expected answer for validation

CQ-011: Statement Blank Style Data

For statement_blank style, the style_data must contain:

  • statement — the statement with a blank
  • complete_statement — the full statement with the blank filled
  • answer — the word or phrase that fills the blank

CQ-012: Reverse Lookup Style Data

For reverse_lookup style, the style_data must contain:

  • answer — the entity or concept being described

CQ-013: Free Text Style Data

For free_text style, the style_data must contain:

  • key_concepts — non-empty array of concepts the answer should touch on
  • sample_answer — an example high-quality answer for reference
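The per-style requirements in CQ-005 and CQ-009 through CQ-013 reduce to a required-keys lookup. The table shape is illustrative; validateChallengeContent() implements the real checks:

```typescript
// Required style_data keys per pre-generated style, collected from
// CQ-005 and CQ-009 through CQ-013. key_concepts is an array;
// every other required key is a string.
const REQUIRED_STYLE_DATA: Record<string, string[]> = {
  multiple_choice: ["options"],
  fill_the_gap: ["complete_text", "answer"],
  direct_question: ["expected_answer"],
  statement_blank: ["statement", "complete_statement", "answer"],
  reverse_lookup: ["answer"],
  free_text: ["key_concepts", "sample_answer"],
};

// Returns the required keys missing from a style_data payload.
function missingStyleDataKeys(style: string, styleData: Record<string, unknown>): string[] {
  return (REQUIRED_STYLE_DATA[style] ?? []).filter((k) => styleData[k] === undefined);
}
```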

Validation

Challenge content is validated at two points:

  1. Generation time — validateChallengeContent() in packages/ai/src/challenge-content-rules.ts
  2. Upload time — Zod schema validation before database insert

Both checks enforce the same rules to prevent invalid content from reaching the database.


| File | Purpose |
| --- | --- |
| packages/ai/src/challenge-content-rules.ts | Constants, validation, banned patterns |
| packages/ai/src/challenge-content.ts | AI generation function |
| scripts/seed/generate-challenge-content.ts | Backfill script |
| docs/marketing/CHALLENGE_TONE.md | Source tone specification |
| scripts/seed/materialize-entity-categories.ts | Entity classification and leaf category insertion |
| unmapped_category_log (DB table) | Unmapped category audit data (queried directly) |
| packages/db/src/drizzle/fact-engine-queries.ts | resolveTopicCategory() alias fallback chain |
| supabase/migrations/0124_seed_missing_root_categories.sql | Seed 25 missing depth-0 root categories |
| supabase/migrations/0126_add_topic_category_aliases.sql | Category alias table and unmapped category log |