Fact & Taxonomy Schema Map
Reference map of all fact and taxonomy schemas, rules, and voice-related files and functions, with their intended targets.
1. Database Schema (Drizzle ORM)
File: packages/db/src/drizzle/schema.ts
| Table / Enum | Purpose | Targets |
|---|---|---|
topicCategories | Hierarchical taxonomy tree (slug, name, depth 0-3, path, dailyQuota, percentTarget, deprecated_at, replaced_by, deprecation_reason). Depth levels: 0=root, 1=primary, 2=secondary, 3=leaf. MAX_TAXONOMY_DEPTH = 5. | Feed filtering, quota cron, AI extraction routing |
topicCategoryAliases | Maps external provider slugs to canonical topics | resolveTopicCategory() in ingestion pipeline |
unmappedCategoryLog | Logs categories that couldn't be resolved | Admin dashboard analysis |
factRecordSchemas | Per-topic fact shape definitions (factKeys JSONB, cardFormats array) | buildFactObjectSchema() in AI extraction, validation |
factRecords | Core fact storage (facts JSONB, title, context, notabilityScore, status, validation) | Feed API, card detail, challenge generation |
factChallengeContent | Pre-generated challenge text per style/difficulty. Includes challenge_title for display. | Card UI rendering, challenge sessions |
challengeFormats | 8 named formats (big_fan_of, know_a_lot_about, etc.) with knowledgeType, tone | Challenge session selection, format eligibility |
challengeFormatStyles | Format to eligible style mapping | Challenge content generation |
challengeFormatTopics | Format to eligible topic mapping | Topic-scoped format selection |
challengeSessions | Active challenge game state with conversation history | Conversational challenge flow |
cardInteractions | User answer/view/bookmark tracking | Spaced repetition, engagement metrics |
aiCostTracking | Per-model, per-feature daily cost aggregation | Budget monitoring, model routing decisions |
Enums: factRecordStatusEnum, cardFormatEnum, challengeStyleEnum, challengeFormatSlugEnum, triviaDifficultyEnum, validationStrategyEnum, interactionTypeEnum, aiModelEnum, newsProviderEnum, storyStatusEnum, useCaseAudienceEnum, domainSourceEnum, domainVerificationStatusEnum, pageTypeEnum, systemNotificationTypeEnum, brandPageTypeEnum, brandConfidenceLevelEnum
2. Zod Validation Schemas
File: packages/shared/src/schemas.ts
| Schema | Purpose | Targets |
|---|---|---|
FactKeySchema | Validates fact key definitions (key, label, type, required) | factRecordSchemas.factKeys column |
CardFormatSchema / DifficultySchema | Enum validation for card formats and difficulty | Queue messages, API inputs |
ChallengeStyleSchema / ChallengeFormatSlugSchema | Enum validation for styles and named formats | Challenge generation, session creation |
KnowledgeTypeSchema | fandom, skill, memory, temporal, association, career, visual, creator | challengeFormats.knowledgeType |
ValidationResultSchema | Single-phase validation output (strategy, sources, confidence, flags) | factRecords.validation JSONB column |
MultiPhaseValidationResultSchema | Full pipeline result with phase array | factRecords.validation JSONB column |
FactDetailSchema | Public-facing fact detail with topic, image, validation | Web app card detail API |
CardVariationSchema | Computed card with question, visible/hidden facts, options | Card UI rendering |
ChallengeSessionSchema / ConversationMessageSchema | Challenge game state and dialogue turns | Conversational challenge flow |
Queue Message Schemas:
| Schema | Targets |
|---|---|
ExtractFactsMessageSchema | worker-facts/handlers/extract-facts.ts |
ImportFactsMessageSchema | worker-facts/handlers/import-facts.ts |
ValidateFactMessageSchema | worker-validate/handlers/validate-fact.ts |
GenerateEvergreenMessageSchema | worker-facts/handlers/generate-evergreen.ts |
ExplodeCategoryEntryMessageSchema | worker-facts/handlers/explode-entry.ts |
FindSuperFactsMessageSchema | worker-facts/handlers/find-super-facts.ts |
GenerateChallengeContentMessageSchema | worker-facts/handlers/generate-challenge-content.ts |
ResolveImageMessageSchema | worker-ingest/handlers/resolve-image.ts |
IngestNewsMessageSchema | worker-ingest/handlers/ingest-news.ts |
ClusterStoriesMessageSchema | worker-ingest/handlers/cluster-stories.ts |
ResolveChallengeImageMessageSchema | worker-ingest/handlers/resolve-challenge-image.ts |
3. AI Type Definitions
Fact Engine
File: packages/ai/src/fact-engine.ts
| Type / Function | Purpose | Targets |
|---|---|---|
FactEngineTask | Union of all AI task names (14 tasks) | Model routing via selectModelForTask() |
ExtractFactsInput / ExtractFactsResult | News story to structured facts | worker-facts/handlers/extract-facts.ts |
ScoreNotabilityInput / ScoreNotabilityResult | Notability scoring (0-1) | Fact insertion threshold (>= 0.6) |
ValidateFactInput / ValidateFactResult | AI cross-check validation | worker-validate/handlers/validate-fact.ts |
GenerateEvergreenInput / GenerateEvergreenResult | Evergreen fact generation | worker-facts/handlers/generate-evergreen.ts |
extractFactsFromStory() | Orchestrates extraction with tone injection | Called by extract-facts handler |
selectModelForTask() | Tier-based model routing | All AI calls |
Seed Explosion
File: packages/ai/src/seed-explosion.ts
| Type / Function | Purpose | Targets |
|---|---|---|
ExplodeCategoryEntryInput / ExplodeCategoryEntryResult | Seed entry to many facts + spinoffs | worker-facts/handlers/explode-entry.ts |
SuperFactCandidate / FindSuperFactsResult | Cross-entry correlation discovery | worker-facts/handlers/find-super-facts.ts |
Schema Utilities
File: packages/ai/src/schema-utils.ts
| Function | Purpose | Targets |
|---|---|---|
buildFactObjectSchema() | Converts fact_record_schemas.fact_keys to concrete z.object() | AI extraction prompts, validation |
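
As a rough illustration of what buildFactObjectSchema() does with the factKeys column, the sketch below turns a list of fact key definitions into a checker function. It is a dependency-free stand-in: the real function is described as producing a z.object(), whereas this version returns a plain closure. The FactKey interface mirrors only the four fields named for FactKeySchema (key, label, type, required); the allowed type values and the names buildFactChecker and checkAlbumFacts are assumptions made for the example.

```typescript
// Hypothetical mirror of the fields FactKeySchema validates.
// The allowed `type` values are an assumption made for this sketch.
type FactKeyType = "string" | "number" | "boolean";

interface FactKey {
  key: string;
  label: string;
  type: FactKeyType;
  required: boolean;
}

type FactCheck = { ok: true } | { ok: false; errors: string[] };

// Stand-in for buildFactObjectSchema(): builds a checker closure from the
// key definitions instead of a z.object().
function buildFactChecker(
  factKeys: FactKey[],
): (facts: Record<string, unknown>) => FactCheck {
  return (facts) => {
    const errors: string[] = [];
    for (const fk of factKeys) {
      const value = facts[fk.key];
      if (value === undefined || value === null) {
        // Absent optional keys are fine; absent required keys are errors.
        if (fk.required) errors.push(`missing required key: ${fk.key}`);
        continue;
      }
      if (typeof value !== fk.type) {
        errors.push(`${fk.key}: expected ${fk.type}, got ${typeof value}`);
      }
    }
    return errors.length === 0 ? { ok: true } : { ok: false, errors };
  };
}

// Illustrative key definitions, not taken from any real topic schema.
const checkAlbumFacts = buildFactChecker([
  { key: "artist", label: "Artist", type: "string", required: true },
  { key: "release_year", label: "Release year", type: "number", required: true },
  { key: "certified", label: "Certified", type: "boolean", required: false },
]);

const good = checkAlbumFacts({ artist: "Example Artist", release_year: 1999 });
const bad = checkAlbumFacts({ artist: 42 });
```

Here good passes, while bad collects two errors: a type mismatch on artist and a missing release_year.
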
Model Adapters
File: packages/ai/src/models/types.ts
| Type | Purpose | Targets |
|---|---|---|
ModelAdapter | Model-specific prompt customization interface | Per-model tone adjustments in generation |
PromptCustomization | systemPromptPrefix/Suffix/Override, temperature | applyPromptCustomization() in fact-engine.ts |
4. Validation Pipeline
| File | Phase | Cost | Targets |
|---|---|---|---|
packages/ai/src/validation/structural.ts | Phase 1: Required keys, types, injection detection | $0 | All new fact records |
packages/ai/src/validation/consistency.ts | Phase 2: Numeric ordering, temporal logic, coherence | $0 | All facts passing Phase 1 |
packages/ai/src/validation/cross-model.ts | Phase 3: Adversarial AI verification (Gemini 2.5 Flash) | ~$0.001 | Facts needing higher confidence |
packages/ai/src/validation/evidence.ts | Phase 4: API + reasoner evidence corroboration | ~$0.002-0.005 | Facts needing external verification |
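
The table orders the phases from free to paid, which suggests running them in sequence and stopping at the first failure so that paid phases only see facts that survived the free ones. The sketch below shows that shape under several assumptions: the Phase and PipelineResult types, the runValidationPipeline name, and the example checks are hypothetical, the phases are synchronous for brevity (real AI calls would be async), and the rule that decides which facts need Phase 3 or 4 is not modelled.

```typescript
interface PhaseResult {
  phase: string;
  passed: boolean;
  flags: string[];
}

interface Phase {
  name: string;
  // Hypothetical per-call cost in USD, echoing the Cost column above.
  estimatedCostUsd: number;
  run: (facts: Record<string, unknown>) => PhaseResult;
}

interface PipelineResult {
  passed: boolean;
  phases: PhaseResult[];
  estimatedCostUsd: number;
}

// Runs phases in order and stops at the first failure, so later (paid)
// phases are never invoked for facts that fail an earlier (free) one.
function runValidationPipeline(
  facts: Record<string, unknown>,
  phases: Phase[],
): PipelineResult {
  const results: PhaseResult[] = [];
  let cost = 0;
  for (const phase of phases) {
    cost += phase.estimatedCostUsd;
    const result = phase.run(facts);
    results.push(result);
    if (!result.passed) {
      return { passed: false, phases: results, estimatedCostUsd: cost };
    }
  }
  return { passed: true, phases: results, estimatedCostUsd: cost };
}

// Illustrative phases; the real checks live in packages/ai/src/validation/.
const examplePhases: Phase[] = [
  {
    name: "structural",
    estimatedCostUsd: 0,
    run: (f) => {
      const ok = typeof f.title === "string";
      return { phase: "structural", passed: ok, flags: ok ? [] : ["missing_title"] };
    },
  },
  {
    name: "consistency",
    estimatedCostUsd: 0,
    run: (f) => {
      const ok = Number(f.start_year) <= Number(f.end_year);
      return { phase: "consistency", passed: ok, flags: ok ? [] : ["temporal_order"] };
    },
  },
  {
    name: "cross-model",
    estimatedCostUsd: 0.001,
    // Stub: a real implementation would call a second model here.
    run: () => ({ phase: "cross-model", passed: true, flags: [] }),
  },
];

const rejected = runValidationPipeline(
  { title: "Example", start_year: 2001, end_year: 1999 },
  examplePhases,
);
const accepted = runValidationPipeline(
  { title: "Example", start_year: 1999, end_year: 2001 },
  examplePhases,
);
```

In this run, rejected stops after the consistency phase at no cost, and accepted passes all three phases.
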
5. Voice & Tone Constants
Challenge Content Rules
File: packages/ai/src/challenge-content-rules.ts
| Constant | Purpose | Targets |
|---|---|---|
CHALLENGE_TONE_PREFIX | Universal 3-line Eko voice identity | Injected into all user-facing AI prompts (fact-engine.ts, challenge-content.ts) |
CHALLENGE_VOICE_CONSTITUTION | 70+ line detailed voice principles, forbidden patterns, anti-pattern words | buildSystemPrompt() in challenge-content.ts |
STYLE_RULES | Per-style structural requirements (6 styles x 5 field rules each) | Challenge content generation prompts |
STYLE_VOICE | Per-style personality definitions (personality, register, phrasing examples, reveal tone) | Challenge content generation prompts |
FORMAT_VOICE | Per-format personality definitions (8 formats: personality, register, energy, excitement driver, reveal tone, emphasis) | Challenge content generation prompts (layer 3.7) |
FORMAT_RULES | Per-format structural refinements (8 formats: setup, challenge, reveal, correct_answer, style_data) | Challenge content generation prompts (layer 3.7) |
DIFFICULTY_LEVELS | Per-difficulty level definitions (5 levels with cognitive demand descriptors) | Difficulty guidance in generation prompts |
SUPER_FACT_RULES | Guidance for cross-entry super-fact challenge content | Super-fact challenge generation |
PRE_GENERATED_STYLES | List of 6 styles pre-generated at content creation time | Challenge content batch generation |
BANNED_PATTERNS | Regex list for anti-pattern word detection | validateChallengeContent() |
CQ002_REGEX | Second-person enforcement regex | patchCq002() post-processing |
validateChallengeContent() | Enforces quality rules CQ-001 through CQ-013 | Generation-time + upload-time validation |
patchCq002() | Post-generation second-person pronoun enforcement | Challenge content post-processing |
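
To make the role of BANNED_PATTERNS and CQ002_REGEX more concrete, here is a minimal sketch of a regex-based content check. Everything in it is an assumption for illustration: the actual patterns, the Violation shape, the "banned-pattern" label, and the checkChallengeText name do not come from challenge-content-rules.ts, and the real validateChallengeContent() covers many more rules.

```typescript
// Hypothetical anti-pattern phrases; the real BANNED_PATTERNS list is not shown here.
const HYPOTHETICAL_BANNED: RegExp[] = [/\bdid you know\b/i, /\bfun fact\b/i];

// Hypothetical second-person detector standing in for CQ002_REGEX.
const HYPOTHETICAL_SECOND_PERSON = /\b(you|your|yours)\b/i;

interface Violation {
  rule: string;
  detail: string;
}

// Returns one violation per banned phrase found, plus a CQ-002 violation
// when the text never addresses the reader in the second person.
function checkChallengeText(text: string): Violation[] {
  const violations: Violation[] = [];
  for (const pattern of HYPOTHETICAL_BANNED) {
    const match = text.match(pattern);
    if (match) violations.push({ rule: "banned-pattern", detail: match[0] });
  }
  if (!HYPOTHETICAL_SECOND_PERSON.test(text)) {
    violations.push({ rule: "CQ-002", detail: "no second-person address found" });
  }
  return violations;
}

const clean = checkChallengeText(
  "Can you name the river that crosses the equator twice?",
);
const flagged = checkChallengeText(
  "Fun fact: the river crosses the equator twice.",
);
```

The patterns deliberately avoid the g flag, since a global RegExp keeps state between test() calls and would make repeated checks unreliable.
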
Taxonomy Content Rules
File: packages/ai/src/taxonomy-content-rules.ts
| Constant | Purpose | Targets |
|---|---|---|
TAXONOMY_CONTENT_RULES | Per-taxonomy extraction guidance, challenge guidance, anti-patterns, domain vocabulary (32 taxonomies) | AI extraction prompts in extractFactsFromStory(), seed explosion, evergreen generation |
TAXONOMY_VOICE | Per-taxonomy emotional register, energy, excitement driver, pitfalls | Injected into challenge-content.ts buildSystemPrompt(), seed explosion, evergreen generation |
formatVocabularyForPrompt() | Serializes TaxonomyVocabulary into concise prompt block (~120-200 tokens) | All 4 prompt injection sites (DRY helper) |
6. Rules Documentation
| File | Purpose | Targets |
|---|---|---|
docs/marketing/CHALLENGE_TONE.md | Canonical voice specification (source of truth) | All voice constants in code |
docs/rules/challenge-content.md | System rules CC-001 through CC-011, quality rules CQ-001 through CQ-013 | validateChallengeContent(), AI prompt instructions |
docs/rules/README.md | Full rules index with enforcement and ownership | Developer reference |
7. Database Query Functions
File: packages/db/src/drizzle/fact-engine-queries.ts
| Function | Purpose | Targets |
|---|---|---|
getTopicCategoryBySlug() / getTopicCategoryById() | Topic lookup | Feed, extraction routing |
resolveTopicCategory() | Alias-aware category resolution with inactive-category fallback via resolveActiveCategory() | Ingestion pipeline (story to topic) |
resolveActiveCategory() | Follows deprecation replacement chains (max 5 hops) to find the current active successor | Inactive-category fallback in resolveTopicCategory() |
getDescendantCategoriesForParent() | Recursive 3-level descendant loading for a parent category | Taxonomy tree views, admin dashboard |
getActiveTopicCategoriesWithSchemas() | Taxonomy tree with schemas; supports maxDepth filtering | Admin dashboard, evergreen generation |
getFactRecordSchemaByTopicId() | Schema lookup for a topic | AI extraction (build dynamic Zod schema) |
insertFactRecord() | Create fact record | Extract + import handlers |
updateFactRecordStatus() | Set validated/rejected with validation JSONB | Validation handler |
getPublishedFacts() / getFactsByTopic() | Paginated feed queries | Web app feed API |
getFactDetail() | Full fact with relations | Card detail page |
getTopicFactCountToday() | Quota monitoring | Topic-quotas cron route |
archiveExpiredFactRecords() | 30-day news fact expiry | Archival cron |
promoteHighEngagementFacts() | Promote popular facts by engagement | Promotion cron |
recordInteraction() / getUserInteractions() | Card interaction tracking | Spaced repetition, engagement metrics |
getLatestAnswerInteraction() / getReviewDueFacts() | Spaced repetition scheduling | Card review flow |
addBookmark() / removeBookmark() / getUserBookmarks() | User bookmark management | Card detail, bookmarks page |
createIngestionRun() / updateIngestionRun() | Pipeline observability | Ingestion cron routes |
createStory() / updateStory() / getStoryWithSources() | Story lifecycle | Ingestion pipeline |
insertNewsSources() / getNewsSourcesByIds() | News source management | Ingestion pipeline |
findStoriesByTitleSimilarity() | Story deduplication | Clustering handler |
getExistingTitlesForTopic() | Fact deduplication | Extract + import handlers |
getUnclusteredNewsSources() | Unclustered article discovery | Clustering cron |
getStuckPendingValidations() | Stuck validation recovery | Validation monitor cron |
createChallengeSession() / getChallengeSession() / updateChallengeSession() | Challenge session lifecycle | Conversational challenge flow |
getActiveChallengeFormats() / getChallengeFormatBySlug() | Format lookup | Challenge session creation |
getFormatsForTopic() / getPairedFormat() | Topic-scoped format selection | Challenge format eligibility |
getActiveSessionCount() / abandonStaleSessions() | Session rate limiting and cleanup | Challenge session management |
createScoreDispute() / getScoreDispute() / getUserMonthlyDisputeCount() | Score dispute lifecycle | Dispute flow, rate limiting |
updateInteractionScore() | Score adjustment post-dispute | Dispute resolution |
getAvailableMilestones() / claimMilestone() | Reward milestone system | Gamification |
getUserFreeTextCountToday() | Free-text answer rate limiting | Subscription gating |
createTrialSubscription() / getPlanTrialDays() | Trial subscription management | Onboarding |
getUserCumulativeScore() | Cumulative score aggregation | Profile, leaderboard |
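
The chain-following behaviour described for resolveActiveCategory() can be sketched as a bounded walk over replacement pointers. This version works on an in-memory Map rather than the database, and the Category shape, the resolveActive name, and the choice to store the replacement as a slug are assumptions for the example; only the deprecation fields and the 5-hop limit come from the tables above.

```typescript
interface Category {
  slug: string;
  deprecatedAt: string | null; // stands in for deprecated_at
  replacedBy: string | null; // stands in for replaced_by (assumed to hold a slug)
}

const MAX_HOPS = 5;

// Follows replacedBy pointers until an active category is found.
// Returns null for unknown slugs, dead ends, or chains longer than MAX_HOPS
// (which also guards against cycles).
function resolveActive(
  slug: string,
  bySlug: Map<string, Category>,
): Category | null {
  let current = bySlug.get(slug) ?? null;
  for (let hops = 0; current !== null; hops++) {
    if (current.deprecatedAt === null) return current;
    if (hops === MAX_HOPS || current.replacedBy === null) return null;
    current = bySlug.get(current.replacedBy) ?? null;
  }
  return null;
}

// Illustrative data: a -> b -> c (active), plus a two-node cycle x <-> y.
const categories = new Map<string, Category>([
  ["a", { slug: "a", deprecatedAt: "2024-01-01", replacedBy: "b" }],
  ["b", { slug: "b", deprecatedAt: "2024-06-01", replacedBy: "c" }],
  ["c", { slug: "c", deprecatedAt: null, replacedBy: null }],
  ["x", { slug: "x", deprecatedAt: "2024-01-01", replacedBy: "y" }],
  ["y", { slug: "y", deprecatedAt: "2024-01-01", replacedBy: "x" }],
]);

const successor = resolveActive("a", categories);
const cyclic = resolveActive("x", categories);
const unknown = resolveActive("missing", categories);
```

The hop limit doubles as cycle protection: the x/y pair would loop forever without it, but here the walk gives up after five hops and returns null.
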
8. Worker Handlers
| File | Queue Message | Consumes From |
|---|---|---|
apps/worker-facts/src/handlers/extract-facts.ts | EXTRACT_FACTS | fact-engine.ts extractFactsFromStory() |
apps/worker-facts/src/handlers/generate-evergreen.ts | GENERATE_EVERGREEN | fact-engine.ts generateEvergreenFacts() |
apps/worker-facts/src/handlers/import-facts.ts | IMPORT_FACTS | Bulk structured imports (ESPN, WikiQuote, etc.) |
apps/worker-facts/src/handlers/explode-entry.ts | EXPLODE_CATEGORY_ENTRY | seed-explosion.ts explodeCategoryEntry() |
apps/worker-facts/src/handlers/find-super-facts.ts | FIND_SUPER_FACTS | seed-explosion.ts findSuperFacts() |
apps/worker-facts/src/handlers/generate-challenge-content.ts | GENERATE_CHALLENGE_CONTENT | challenge-content.ts |
apps/worker-validate/src/handlers/validate-fact.ts | VALIDATE_FACT | 4-phase validation pipeline |
apps/worker-ingest/src/handlers/ingest-news.ts | INGEST_NEWS | News API provider fetch |
apps/worker-ingest/src/handlers/cluster-stories.ts | CLUSTER_STORIES | Story deduplication and clustering |
apps/worker-ingest/src/handlers/resolve-image.ts | RESOLVE_IMAGE | Image resolution cascade (Wikipedia, SportsDB, Unsplash, Pexels) |
apps/worker-ingest/src/handlers/resolve-challenge-image.ts | RESOLVE_CHALLENGE_IMAGE | Anti-spoiler image resolution for challenges where the fact image reveals the answer |
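
One way to picture the message-to-handler mapping for a single worker is a discriminated union with an exhaustive switch. The sketch below covers only the four worker-ingest messages; the type discriminator field, the payload fields, and the routeIngestMessage name are assumptions, since the real message shapes are defined by the Zod queue message schemas.

```typescript
// Hypothetical message shapes; real payloads are defined by the
// queue message schemas in packages/shared/src/schemas.ts.
type IngestMessage =
  | { type: "INGEST_NEWS"; provider: string }
  | { type: "CLUSTER_STORIES"; sourceIds: string[] }
  | { type: "RESOLVE_IMAGE"; factId: string }
  | { type: "RESOLVE_CHALLENGE_IMAGE"; factId: string };

// Returns the handler file (relative to apps/worker-ingest/src/) that the
// table above associates with each message type.
function routeIngestMessage(message: IngestMessage): string {
  switch (message.type) {
    case "INGEST_NEWS":
      return "handlers/ingest-news.ts";
    case "CLUSTER_STORIES":
      return "handlers/cluster-stories.ts";
    case "RESOLVE_IMAGE":
      return "handlers/resolve-image.ts";
    case "RESOLVE_CHALLENGE_IMAGE":
      return "handlers/resolve-challenge-image.ts";
    default: {
      // Compile-time exhaustiveness check: adding a new message type
      // without a case makes this assignment a type error.
      const unreachable: never = message;
      throw new Error(`unhandled message: ${JSON.stringify(unreachable)}`);
    }
  }
}

const target = routeIngestMessage({ type: "CLUSTER_STORIES", sourceIds: [] });
```
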
Prompt Layering Hierarchy
The order in which voice controls are injected into AI generation prompts:
1. CHALLENGE_TONE_PREFIX <- Universal Eko identity (3 lines)
2. CHALLENGE_VOICE_CONSTITUTION <- Detailed principles & forbidden patterns
3. TAXONOMY_VOICE[slug] <- Domain register & energy (e.g. "Nature documentary narrator")
3.5 DOMAIN_VOCABULARY[slug] <- Domain terminology & expert phrasing (via formatVocabularyForPrompt)
3.7 FORMAT_VOICE[formatSlug] <- Format posture & energy (e.g. "enthusiastic superfan")
3.7 FORMAT_RULES[formatSlug] <- Format structural refinements (setup/challenge/reveal/answer)
4. STYLE_VOICE[style] <- Interaction mechanics (e.g. "curious gallery guide")
5. STYLE_RULES[style] <- Structural field requirements
6. TAXONOMY_CONTENT_RULES[slug] <- Domain extraction/challenge guidance
7. Difficulty guidance <- Per-level prompt adjustments
8. Super-fact rules (if applicable)
9. Generation instructions
10. ModelAdapter customization <- Per-model prefix/suffix/temperature
Each layer adds specificity without overriding the previous one. The universal tone sets the floor, taxonomy voice sets the domain energy, domain vocabulary teaches expert-level language, format voice sets the challenge format posture, and style voice sets the interaction mechanics.
Vocabulary is auto-generated when adding new categories via bun run taxonomy:add (see generateVocabulary() in scripts/taxonomy/add-taxonomy.ts).
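
The layering hierarchy can be sketched as an ordered list of text blocks that are sorted by position and concatenated, with inapplicable layers skipped. This is a simplified illustration, not the actual buildSystemPrompt(): the PromptLayer shape, the assemblePrompt name, and the bracketed placeholder strings are assumptions, and layer 10 (ModelAdapter customization) is left out because it adjusts prefix, suffix, and temperature rather than contributing a plain text block.

```typescript
interface PromptLayer {
  order: number; // position in the hierarchy, e.g. 3.5 for domain vocabulary
  name: string;
  text: string | null; // null when the layer does not apply
}

// Drops empty or inapplicable layers, sorts the rest by hierarchy position,
// and joins them with blank lines so each layer follows the one before it.
function assemblePrompt(layers: PromptLayer[]): string {
  return layers
    .filter((layer) => layer.text !== null && layer.text.trim() !== "")
    .sort((a, b) => a.order - b.order)
    .map((layer) => layer.text as string)
    .join("\n\n");
}

// Placeholder strings stand in for the real constants.
const prompt = assemblePrompt([
  { order: 4, name: "STYLE_VOICE", text: "[style voice]" },
  { order: 1, name: "CHALLENGE_TONE_PREFIX", text: "[tone prefix]" },
  { order: 8, name: "SUPER_FACT_RULES", text: null }, // not a super-fact
  { order: 3.5, name: "DOMAIN_VOCABULARY", text: "[domain vocabulary]" },
  { order: 3, name: "TAXONOMY_VOICE", text: "[taxonomy voice]" },
]);
```

Using fractional order values keeps inserted layers such as 3.5 and 3.7 in place without renumbering the layers around them.
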