Mockup Seeded/Evergreen Challenges — End-to-End Pipeline Reference
Generated: 2026-03-04T14:45:00Z
Model: gemini-3-flash-preview
Pipeline: Entity Explosion / Evergreen Gen → IMPORT_FACTS → VALIDATE_FACT (multi_phase, 4 phases) → GENERATE_CHALLENGE_CONTENT
Validation strategy: multi_phase — progressive 4-phase pipeline (structural → consistency → cross-model AI → evidence corroboration)
Facts: 5 validated + 1 rejected | Challenges: 5 generated (1 per fact)
Entities: Marie Curie (2), The Great Wall of China (2), Octopus (1)
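The progressive multi_phase strategy named above can be sketched roughly as follows. This is an illustration under stated assumptions, not the pipeline's actual code: `PhaseResult`, `ValidationSummary`, `run_multi_phase`, and the stop-at-first-failure rule are all hypothetical. The rule that the final confidence equals the confidence of the last phase that ran is inferred from the sample tables in this document, where the two values always match.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class PhaseResult:
    name: str
    passed: bool
    confidence: float
    cost_usd: float = 0.0
    flags: List[str] = field(default_factory=list)


@dataclass
class ValidationSummary:
    passed: bool
    final_confidence: float
    total_cost_usd: float
    results: List[PhaseResult]


Phase = Callable[[Dict], PhaseResult]


def run_multi_phase(fact: Dict, phases: List[Phase]) -> ValidationSummary:
    """Run phases in order and stop at the first failure (assumed behaviour)."""
    results: List[PhaseResult] = []
    for phase in phases:
        result = phase(fact)
        results.append(result)
        if not result.passed:
            break  # later, costlier phases are skipped
    passed = bool(results) and all(r.passed for r in results)
    return ValidationSummary(
        passed=passed,
        # Assumption: on success, report the last phase's confidence.
        final_confidence=results[-1].confidence if passed else 0.0,
        total_cost_usd=sum(r.cost_usd for r in results),
        results=results,
    )


# Phase outcomes copied from the Sample 1 validation table.
sample_1 = run_multi_phase(
    {},
    [
        lambda fact: PhaseResult("Structural", True, 1.00),
        lambda fact: PhaseResult("Internal Consistency", True, 0.95),
        lambda fact: PhaseResult("Cross-Model AI", True, 0.85, cost_usd=0.001),
    ],
)
```

Ordering the free structural checks first means a malformed fact never reaches the phases that cost money, which is one plausible reason for a progressive design.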
Sample 1: Marie Curie — Radioactivity Discovery (Evergreen)
Fact Record
| Field | Value |
|---|---|
| ID | e8b2c3d4-f5a6-7890-bcde-200000000001 |
| Title | Marie Curie's Discovery of Polonium and Radium Earned Two Nobel Prizes |
| Challenge Title | The Scientist Who Glowed in the Dark — And Changed the World Twice |
| Notability | 0.98 |
| Taxonomy | science/physics |
| Schema | science_fact |
| Source Type | ai_generated (evergreen) |
| Source Story ID | null |
| External Source ID | null |
| AI Model | gemini-3-flash-preview |
| Generation Cost | $0.004200 |
| Content Hash | sha256:11a2b3c4d5e6f7a8b9c0d1e2f3a4b5c6d7e8f9a0b1c2d3e4f5a6b7c8d9e0f1a2 |
| Status | validated |
| Expires At | null (permanent) |
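The Content Hash field carries a `sha256:` prefix, but this document does not say which fields feed the hash. The sketch below shows one plausible approach, assuming the hash covers the title plus the fact values serialized as canonical JSON; the function name `content_hash` and the choice of inputs are hypothetical.

```python
import hashlib
import json
from typing import Dict


def content_hash(title: str, fact_values: Dict[str, str]) -> str:
    """Return a 'sha256:<hex>' digest over a canonical JSON form (assumed inputs)."""
    canonical = json.dumps(
        {"title": title, "fact_values": fact_values},
        sort_keys=True,          # key order must not change the hash
        separators=(",", ":"),   # no incidental whitespace
        ensure_ascii=False,
    )
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return f"sha256:{digest}"


example = content_hash(
    "Marie Curie's Discovery of Polonium and Radium Earned Two Nobel Prizes",
    {"subject": "Discovery of Polonium and Radium", "field": "Nuclear Physics / Radiochemistry"},
)
```

Sorting keys before hashing keeps the digest stable when the same fact is regenerated with its fields in a different order, which matters if the hash is used for deduplication.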
Fact Values
| Key | Value |
|---|---|
| subject | Discovery of Polonium and Radium |
| discovery | Isolated two new radioactive elements (polonium and radium) from pitchblende ore |
| scientist | Marie Curie (born Maria Sklodowska) |
| year | 1898 (discovery), 1903 (Physics Nobel), 1911 (Chemistry Nobel) |
| field | Nuclear Physics / Radiochemistry |
| method | Fractional crystallization of pitchblende ore residues, measured by electrometer |
| impact | Founded the entire field of radioactivity research and pioneered radiation therapy for cancer treatment |
| surprising_detail | Curie's laboratory notebooks from the 1890s remain so radioactive they must be stored in lead-lined boxes and require protective gear to handle |
| measurement | Radium's radioactivity measured at 3 million times stronger than uranium by weight |
Fact Context
Marie Curie did not just study radioactivity; she coined the very word and then spent years hunched over boiling vats of pitchblende ore to isolate the elements behind it. Working alongside her husband Pierre in a converted shed with no proper ventilation, she isolated polonium (named for her homeland Poland) and radium through thousands of painstaking fractional crystallizations. Her 1903 Physics Nobel made her the first woman to win the prize, and her 1911 Chemistry Nobel made her the first person of any gender to win two. The notebooks she filled with experimental data during those years remain so contaminated with radium-226 that they sit in lead-lined boxes at France's Bibliothèque nationale. Researchers who wish to consult them must sign a liability waiver and wear protective clothing. Curie's work laid the foundation for radiation therapy, nuclear energy, and our fundamental understanding of atomic structure.
Validation — multi_phase (3 phases)
| Phase | Name | Passed | Confidence | Flags | Cost | Duration |
|---|---|---|---|---|---|---|
| 1 | Structural | YES | 1.00 | [] | $0.000 | 45ms |
| 2 | Internal Consistency | YES | 0.95 | [] | $0.000 | 62ms |
| 3 | Cross-Model AI (gemini-2.5-flash) | YES | 0.85 | [] | $0.001 | 1,240ms |
| Field | Value |
|---|---|
| Phase 1 Detail | All required fact keys present (subject, discovery, field, impact). Title ≥ 10 chars. No injection patterns. Context ≥ 100 chars. |
| Phase 2 Detail | Year ordering valid (1898 → 1903 → 1911). Scientist matches known taxonomy. Field aligns with science/physics category. |
| Phase 3 Detail | Adversarial model (gemini-2.5-flash) confirms: polonium and radium discovered 1898, Physics Nobel 1903, Chemistry Nobel 1911, notebooks stored in lead-lined boxes at BnF. All claims corroborated. |
| Final Confidence | 0.85 |
| Total Cost | $0.001 |
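The Phase 1 checks listed above (required keys, title of at least 10 characters, context of at least 100 characters, no injection patterns) could look something like this. It is a minimal sketch: the `REQUIRED_KEYS` mapping mirrors the keys named in the Phase 1 Detail row, while `structural_flags`, the flag names, and the two injection patterns are hypothetical stand-ins.

```python
import re
from typing import Dict, List

# Keys taken from the Phase 1 Detail row for the science_fact schema.
REQUIRED_KEYS = {"science_fact": ["subject", "discovery", "field", "impact"]}

# Hypothetical patterns; the real list is not given in this document.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?previous instructions", re.IGNORECASE),
    re.compile(r"<script\b", re.IGNORECASE),
]


def structural_flags(
    schema: str, title: str, context: str, fact_values: Dict[str, str]
) -> List[str]:
    """Return a list of flags; an empty list means the fact passes Phase 1."""
    flags: List[str] = []
    for key in REQUIRED_KEYS.get(schema, []):
        if not str(fact_values.get(key, "")).strip():
            flags.append(f"missing_key:{key}")
    if len(title) < 10:
        flags.append("title_too_short")
    if len(context) < 100:
        flags.append("context_too_short")
    combined = " ".join([title, context, *map(str, fact_values.values())])
    if any(pattern.search(combined) for pattern in INJECTION_PATTERNS):
        flags.append("injection_pattern")
    return flags
```

An empty flag list corresponds to the `[]` shown in the Flags column of the validation table.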
Challenge Content (Difficulty 1 — Multiple Choice)
- Challenge Title: The Scientist Who Glowed in the Dark — And Changed the World Twice
- Challenge Context: One scientist's obsessive work with boiling ore in a poorly ventilated shed led to not one but two Nobel Prizes and founded an entirely new branch of physics. Her laboratory notebooks from the 1890s are still too radioactive to touch without protective equipment more than a century later.
- Setup: In the late 1890s, a determined researcher spent years processing tons of pitchblende ore in a converted shed, hunting for elements no one had ever isolated before. The work earned this scientist two Nobel Prizes in two different scientific disciplines, a feat no one had accomplished before and no one has matched since.
- Challenge Text: Given those two groundbreaking Nobel Prizes in Physics and Chemistry, can you identify the scientist who isolated polonium and radium?
- Style Data:
{ "options": [ { "text": "Marie Curie", "is_correct": true }, { "text": "Lise Meitner", "is_correct": false }, { "text": "Rosalind Franklin", "is_correct": false }, { "text": "Dorothy Hodgkin", "is_correct": false } ] }
- Reveal (Correct): Marie Curie sits at the center of a story that keeps surprising: she named polonium after her homeland Poland and spent years in a cramped shed to prove that radioactivity came from the atom itself, not from chemical interactions.
- Reveal (Wrong): The scientist behind both discoveries is Marie Curie, who became the first woman to win a Nobel Prize in 1903 and then the first person ever to win two Nobels when she claimed the Chemistry prize in 1911.
- Correct Answer: The scientist who isolated polonium and radium is Marie Curie, born Maria Sklodowska in Warsaw, Poland. She and her husband Pierre processed tons of pitchblende ore through thousands of fractional crystallizations in a converted shed with no proper ventilation, ultimately proving that radioactivity was an atomic property rather than a chemical one. Her 1903 Physics Nobel made her the first woman to win the prize, and her 1911 Chemistry Nobel made her the first person to win two — a distinction that stood alone for decades. Perhaps the most haunting detail of her legacy is that her laboratory notebooks remain so contaminated with radium-226 that they are stored in lead-lined boxes and require protective gear to handle, over 125 years later.
Quality Annotations
| Gate | Status | Detail |
|---|---|---|
| CQ-001 | PASS | Difficulty 1 in range 1–5 |
| CQ-002 | PASS | Challenge text contains "can you identify" |
| CQ-003 | PASS | challenge_text = 112 chars (≥30) |
| CQ-004 | PASS | All min lengths met: title=59, setup=289, challenge=112, reveal_correct=195, reveal_wrong=194, correct_answer=570 |
| CQ-005 | PASS | 4 options, exactly 1 correct (Marie Curie) |
| CQ-006 | PASS | No banned patterns detected |
| CQ-008 | PASS | correct_answer = 570 chars, 4 sentences, narrative arc |
| Title | PASS | 59 chars, no spoilers (does not name Marie Curie), no banned patterns |
| Context | PASS | 273 chars, no answer leak, theatrical voice |
| Voice | PASS | Active voice throughout, conversational register |
| Patch | PASS | No passive constructions or textbook register detected |
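Gate CQ-005 above reports "4 options, exactly 1 correct". A check along those lines might be written as below. The function name, the duplicate-text rule, and treating four options as a hard requirement are assumptions; only the option data is taken from Sample 1.

```python
from typing import Dict, List


def check_multiple_choice(style_data: Dict, expected_options: int = 4) -> List[str]:
    """Return a list of problems; an empty list means the options pass."""
    options = style_data.get("options", [])
    problems: List[str] = []
    if len(options) != expected_options:
        problems.append(f"expected {expected_options} options, found {len(options)}")
    correct = [o for o in options if o.get("is_correct") is True]
    if len(correct) != 1:
        problems.append(f"expected exactly 1 correct option, found {len(correct)}")
    texts = [str(o.get("text", "")).strip().lower() for o in options]
    if len(set(texts)) != len(texts):
        problems.append("duplicate option text")  # assumed extra rule
    return problems


sample_1_style = {
    "options": [
        {"text": "Marie Curie", "is_correct": True},
        {"text": "Lise Meitner", "is_correct": False},
        {"text": "Rosalind Franklin", "is_correct": False},
        {"text": "Dorothy Hodgkin", "is_correct": False},
    ]
}
sample_1_problems = check_multiple_choice(sample_1_style)
```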
Sample 2: Marie Curie — Radioactive Notebooks (Evergreen)
Fact Record
| Field | Value |
|---|---|
| ID | e8b2c3d4-f5a6-7890-bcde-200000000002 |
| Title | Marie Curie's Century-Old Notebooks Remain Dangerously Radioactive |
| Challenge Title | The Papers That Still Burn After a Century of Silence |
| Notability | 0.91 |
| Taxonomy | science/physics |
| Schema | science_fact |
| Source Type | ai_generated (evergreen) |
| Source Story ID | null |
| External Source ID | null |
| AI Model | gemini-3-flash-preview |
| Generation Cost | $0.004100 |
| Content Hash | sha256:22b3c4d5e6f7a8b9c0d1e2f3a4b5c6d7e8f9a0b1c2d3e4f5a6b7c8d9e0f1a2b3 |
| Status | validated |
| Expires At | null (permanent) |
Fact Values
| Key | Value |
|---|---|
| subject | Radioactive contamination of Marie Curie's personal effects |
| discovery | Curie's notebooks, clothing, and even her cookbook are contaminated with radium-226 (half-life: 1,600 years) |
| scientist | Marie Curie |
| year | 1890s–1934 (working period) |
| field | Radiochemistry / Science History |
| method | Prolonged unshielded handling of radium during experiments and daily life |
| impact | Demonstrates the long-term dangers of radiation exposure and serves as a cautionary artifact for nuclear safety |
| surprising_detail | Visitors to the Bibliothèque nationale de France must sign a liability disclaimer and wear protective clothing to view the notebooks |
| measurement | Radium-226 half-life of 1,600 years means roughly half of the contamination will still be present around 3500 CE |
Fact Context
Marie Curie's personal notebooks sit in lead-lined boxes deep inside France's national library, and they will stay dangerous for centuries. The radium-226 that saturated the pages during her decades of unshielded laboratory work has a half-life of 1,600 years, meaning about half of that contamination will still be present around 3500 CE. Anyone wishing to consult these historic documents must sign a liability waiver and don protective clothing before handling them. Even Curie's cookbook and personal belongings carry measurable radioactivity. These artifacts serve as both priceless scientific records and haunting reminders of the price Curie paid for her discoveries: she died in 1934 from aplastic anemia almost certainly caused by chronic radiation exposure.
Validation — multi_phase (3 phases)
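The half-life figure above lends itself to a quick calculation. A half-life is the time it takes for half of a radioactive sample to decay, so the fraction remaining after `t` years is `0.5 ** (t / half_life)`. The sketch applies that standard formula to the 1,600-year figure quoted here; the function name and the use of 1898 (the discovery year from Sample 1) as a starting point are illustrative choices.

```python
def fraction_remaining(years_elapsed: float, half_life_years: float = 1600.0) -> float:
    """Fraction of the original radium-226 still present after years_elapsed."""
    if years_elapsed < 0 or half_life_years <= 0:
        raise ValueError("years_elapsed must be >= 0 and half_life_years > 0")
    return 0.5 ** (years_elapsed / half_life_years)


# Roughly one half-life passes between 1898 and 3500 CE.
left_in_3500 = fraction_remaining(3500 - 1898)
print(f"fraction left in 3500 CE: {left_in_3500:.2f}")
```

Under these assumptions about half of the radium-226 is still present around 3500 CE, and a quarter after two half-lives. The formula says nothing about what level counts as safe to handle.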
| Phase | Name | Passed | Confidence | Flags | Cost | Duration |
|---|---|---|---|---|---|---|
| 1 | Structural | YES | 1.00 | [] | $0.000 | 38ms |
| 2 | Internal Consistency | YES | 0.95 | [] | $0.000 | 55ms |
| 3 | Cross-Model AI (gemini-2.5-flash) | YES | 0.82 | [] | $0.001 | 1,180ms |
| Field | Value |
|---|---|
| Phase 1 Detail | Required keys present. Title ≥ 10 chars. No injection patterns. Context ≥ 100 chars. |
| Phase 2 Detail | Year range valid (1890s–1934). Half-life claim consistent with radium-226 physics. |
| Phase 3 Detail | Adversarial model confirms: notebooks stored at BnF in lead-lined boxes, liability waiver required, radium-226 half-life 1,600 years, Curie died 1934 of aplastic anemia. All claims verified. |
| Final Confidence | 0.82 |
| Total Cost | $0.001 |
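The Phase 2 rows in Samples 1 and 2 mention a year-ordering check on the free-text `year` field. One simple way to approximate it is to pull out every four-digit year and confirm the sequence never decreases. This is a sketch only: `extract_years` and `years_in_order` are hypothetical names, and the regex does not handle BCE dates or century phrases such as those in the Great Wall samples.

```python
import re
from typing import List


def extract_years(year_field: str) -> List[int]:
    """Return every standalone four-digit number, in order of appearance."""
    return [int(y) for y in re.findall(r"(?<!\d)\d{4}(?!\d)", year_field)]


def years_in_order(year_field: str) -> bool:
    """True when at least one year is found and the years never decrease."""
    years = extract_years(year_field)
    return bool(years) and all(a <= b for a, b in zip(years, years[1:]))


sample_1_year = "1898 (discovery), 1903 (Physics Nobel), 1911 (Chemistry Nobel)"
sample_2_year = "1890s–1934 (working period)"
```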
Challenge Content (Difficulty 5 — Free Text)
- Challenge Title: The Papers That Still Burn After a Century of Silence
- Challenge Context: Deep inside France's national library, a set of century-old scientific notebooks sits locked in lead-lined boxes. The documents are so dangerous that anyone who wants to read them must sign a legal waiver and suit up in protective gear — and they will remain hazardous for another fifteen centuries.
- Setup: Imagine scientific documents so significant that a national library preserves them as treasured artifacts, yet so dangerous that researchers must sign liability disclaimers and wear protective clothing just to turn the pages. These particular notebooks have been radioactive since the 1890s, and roughly 1,500 years from now they will still hold about half of their radium-226.
- Challenge Text: In your own words, explain why Marie Curie's laboratory notebooks remain dangerously radioactive more than a century after she wrote in them, and describe the precautions required to access them today.
- Style Data:
{ "key_concepts": [ "radium-226 contamination", "half-life of 1,600 years", "lead-lined storage at Bibliothèque nationale de France", "liability waiver and protective clothing required", "unshielded handling during decades of experiments" ], "sample_answer": "Marie Curie's notebooks are still radioactive because she worked with radium-226 without any shielding, and the element has a half-life of 1,600 years. The contamination literally soaked into the paper during her decades of experiments. Today, the notebooks are stored in lead-lined boxes at France's national library, and anyone who wants to study them must sign a liability waiver and wear protective equipment." }
- Reveal (Correct): Those notebooks sit at the center of a story that keeps surprising: radium-226 contaminated every page during Curie's decades of unshielded work, and its 1,600-year half-life means about half of that contamination will still be present around 3500 CE.
- Reveal (Wrong): The key reason is radium-226 contamination with its 1,600-year half-life. Curie handled radium without any shielding for decades, and the element saturated her notebooks, clothing, and even her cookbook. Today they sit in lead-lined boxes at France's Bibliothèque nationale, requiring a signed waiver and protective gear to access.
- Correct Answer: Marie Curie's laboratory notebooks remain dangerously radioactive because she worked with radium-226 for decades without any radiation shielding, and the element physically saturated the paper, ink, and bindings of every notebook she used. Radium-226 has a half-life of 1,600 years, which means about half of the contamination will still be present around 3500 CE, over fifteen centuries from now. Today, the notebooks are preserved in lead-lined boxes at the Bibliothèque nationale de France, and any researcher who wishes to consult them must sign a formal liability disclaimer and wear full protective clothing. These haunting artifacts serve as both invaluable scientific records and a sobering reminder that Curie's groundbreaking discoveries came at the ultimate personal cost: she died in 1934 from aplastic anemia almost certainly caused by her chronic radiation exposure.
Quality Annotations
| Gate | Status | Detail |
|---|---|---|
| CQ-001 | PASS | Difficulty 5 in range 1–5 |
| CQ-002 | PASS | Challenge text contains "In your own words" and "your" |
| CQ-003 | PASS | challenge_text = 186 chars (≥30) |
| CQ-004 | PASS | All min lengths met: title=50, setup=302, challenge=186, reveal_correct=196, reveal_wrong=282, correct_answer=622 |
| CQ-011 | PASS | style_data contains key_concepts (5 items, non-empty) + sample_answer |
| CQ-006 | PASS | No banned patterns detected |
| CQ-008 | PASS | correct_answer = 622 chars, 4 sentences, narrative arc |
| Title | PASS | 50 chars, no spoilers, no banned patterns |
| Context | PASS | 276 chars, no answer leak (does not mention radium-226 or half-life), theatrical voice |
| Voice | PASS | Active voice throughout, conversational register |
| Patch | PASS | No passive constructions or textbook register detected |
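Gate CQ-011 above requires a non-empty `key_concepts` list plus a `sample_answer` for free-text challenges. A check of that shape might look like the following; the function name and the rule that individual concepts must be non-blank strings are assumptions.

```python
from typing import Dict, List


def check_free_text_style(style_data: Dict) -> List[str]:
    """Return a list of problems; an empty list means CQ-011-style checks pass."""
    problems: List[str] = []
    concepts = style_data.get("key_concepts")
    if not isinstance(concepts, list) or not concepts:
        problems.append("key_concepts missing or empty")
    elif any(not isinstance(c, str) or not c.strip() for c in concepts):
        problems.append("key_concepts contains a blank entry")  # assumed rule
    sample_answer = style_data.get("sample_answer")
    if not isinstance(sample_answer, str) or not sample_answer.strip():
        problems.append("sample_answer missing or empty")
    return problems


sample_2_style = {
    "key_concepts": [
        "radium-226 contamination",
        "half-life of 1,600 years",
        "lead-lined storage at Bibliothèque nationale de France",
        "liability waiver and protective clothing required",
        "unshielded handling during decades of experiments",
    ],
    "sample_answer": "Marie Curie's notebooks are still radioactive because ...",
}
sample_2_problems = check_free_text_style(sample_2_style)
```

The `sample_answer` string is abbreviated here for space; the check only cares that it is present and non-empty.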
Sample 3: The Great Wall of China — Construction Scale (File Seed)
Fact Record
| Field | Value |
|---|---|
| ID | e8b2c3d4-f5a6-7890-bcde-200000000003 |
| Title | The Great Wall of China Stretches Over 21,000 Kilometers Across Multiple Dynasties |
| Challenge Title | The Dragon's Spine: Twenty-One Thousand Kilometers of Imperial Ambition |
| Notability | 0.95 |
| Taxonomy | history/ancient |
| Schema | history_fact |
| Source Type | file_seed (entity explosion) |
| Source Story ID | null |
| External Source ID | seed:great-wall-of-china:0 |
| AI Model | gemini-3-flash-preview |
| Generation Cost | $0.003800 |
| Content Hash | sha256:33c4d5e6f7a8b9c0d1e2f3a4b5c6d7e8f9a0b1c2d3e4f5a6b7c8d9e0f1a2b3c4 |
| Status | validated |
| Expires At | null (permanent) |
Fact Values
| Key | Value |
|---|---|
| subject | Total length measurement of the Great Wall of China |
| era | Ancient through Imperial (7th century BCE – 17th century CE) |
| year | 2012 (official survey completed by State Administration of Cultural Heritage) |
| location | Northern China, spanning 15 provinces and autonomous regions |
| key_figures | Emperor Qin Shi Huang (first unifier), Ming dynasty builders (most iconic sections) |
| cause | Defense against northern nomadic invasions (Xiongnu, Mongols, Manchus) |
| outcome | 21,196.18 km total length confirmed by archaeological survey, making it the longest structure ever built |
| misconception | The Wall is not a single continuous structure — it consists of overlapping segments built by different dynasties over 2,000+ years |
| surprising_detail | The most famous tourist sections near Beijing represent less than 5% of the total Wall, and large stretches have crumbled to unrecognizable earthen mounds |
Fact Context
The Great Wall of China stretches 21,196.18 kilometers across northern China — a distance greater than half the circumference of the Earth. A 2012 archaeological survey by China's State Administration of Cultural Heritage finally settled the question of its true length, counting every overlapping segment built by successive dynasties over more than two millennia. The Wall is not one continuous structure but a patchwork of fortifications, watchtowers, and earthen ramparts constructed by the Qin, Han, Ming, and other dynasties to defend against northern nomadic invasions. The iconic stone sections near Beijing that most tourists visit represent less than 5% of the total length. Vast stretches in remote provinces have eroded to low earthen ridges that are barely recognizable as part of the same monument. This sprawling network employed millions of workers across centuries and remains the longest structure ever built by human hands.
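Two numeric claims in the paragraph above are easy to sanity-check: that 21,196.18 km exceeds half the Earth's circumference, and what "less than 5%" of the Wall amounts to in kilometers. The short calculation below does both. The equatorial circumference of about 40,075 km is a commonly cited approximation and does not come from this document.

```python
WALL_LENGTH_KM = 21_196.18                  # 2012 survey figure quoted above
EARTH_EQUATORIAL_CIRCUMFERENCE_KM = 40_075  # approximate; not from this document

half_circumference_km = EARTH_EQUATORIAL_CIRCUMFERENCE_KM / 2
exceeds_half = WALL_LENGTH_KM > half_circumference_km
five_percent_km = 0.05 * WALL_LENGTH_KM

print(f"half circumference: {half_circumference_km:.1f} km")
print(f"wall exceeds half circumference: {exceeds_half}")
print(f"5% of the wall: {five_percent_km:.1f} km")
```

The margin is modest, a little over a thousand kilometers, so the comparison depends on counting every overlapping segment as the survey did.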
Validation — multi_phase (4 phases, including Phase 4)
| Phase | Name | Passed | Confidence | Flags | Cost | Duration |
|---|---|---|---|---|---|---|
| 1 | Structural | YES | 1.00 | [] | $0.000 | 42ms |
| 2 | Internal Consistency | YES | 0.95 | [] | $0.000 | 58ms |
| 3 | Cross-Model AI (gemini-2.5-flash) | YES | 0.83 | [] | $0.001 | 1,310ms |
| 4 | Evidence Corroboration (Wikipedia + AI reasoner) | YES | 0.90 | [] | $0.003 | 2,450ms |
| Field | Value |
|---|---|
| Phase 1 Detail | Required keys present (subject, era, outcome). Title ≥ 10 chars. No injection patterns. Context ≥ 100 chars. |
| Phase 2 Detail | Year ordering valid (7th c. BCE → 2012 survey). Location consistent with history/ancient taxonomy. |
| Phase 3 Detail | Adversarial model confirms: 21,196.18 km length, 2012 survey by SACH, multiple dynasties, not continuous. All claims verified. |
| Phase 4 Detail | Wikipedia article "Great Wall of China" confirms 21,196.18 km (2012 survey). Wikidata entity Q12501 confirms location, construction period, purpose. AI reasoner cross-checks: <5% tourist-accessible claim consistent with sources citing ~8% in "reasonable" condition. |
| Final Confidence | 0.90 |
| Total Cost | $0.004 |
Spinoff Candidates (from entity explosion of "The Great Wall of China"):
- Great Wall watchtower communication system (fire signals)
- Forced labor and mortality during Qin dynasty construction
- The Wall's role in Silk Road trade route protection
- Ming dynasty renovation and the "Nine Frontier Districts"
Challenge Content (Difficulty 2 — Fill the Gap)
- Challenge Title: The Dragon's Spine: Twenty-One Thousand Kilometers of Imperial Ambition
- Challenge Context: A 2012 archaeological survey finally answered one of history's most debated questions — just how long is the Great Wall of China? The answer stunned historians because the official figure dwarfed every previous estimate, revealing that the Wall spans a distance greater than half the circumference of Earth.
- Setup: For centuries, historians argued about the true length of the Great Wall of China. Some estimated 5,000 kilometers, others guessed closer to 10,000. Then in 2012, China's State Administration of Cultural Heritage completed a comprehensive archaeological survey that counted every segment built by every dynasty — and the final number silenced the debate forever.
- Challenge Text: Can you fill in the precise figure? The 2012 survey determined the total length of the Great Wall of China to be ___ kilometers.
- Style Data:
{ "complete_text": "The 2012 survey determined the total length of the Great Wall of China to be 21,196.18 kilometers.", "answer": "21,196.18", "acceptable_answers": ["21,196.18", "21196.18", "21,196", "21196", "over 21,000", "about 21,000"] }
- Reveal (Correct): That 21,196.18 kilometer figure sits at the center of a story that keeps surprising: the Wall spans 15 provinces and represents more than 2,000 years of construction by successive dynasties, yet the famous tourist sections near Beijing account for less than 5% of the total.
- Reveal (Wrong): The answer is 21,196.18 kilometers — a distance greater than half the Earth's circumference. China's 2012 archaeological survey counted every segment from every dynasty, revealing a patchwork of fortifications far longer than any previous estimate suggested.
- Correct Answer: The 2012 survey measured the Great Wall at 21,196.18 kilometers, making it the longest structure ever built by human hands and more than half the circumference of the Earth. China's State Administration of Cultural Heritage spent years cataloging every segment constructed by the Qin, Han, Ming, and other dynasties across 15 provinces and autonomous regions. The survey revealed that the Wall is not a single continuous structure but an overlapping patchwork of stone fortifications, earthen ramparts, and watchtowers assembled over more than two millennia. Most visitors never learn that the iconic stone sections near Beijing represent less than 5% of the total — vast stretches in remote provinces have eroded to low mounds barely recognizable as part of the same monument.
Quality Annotations
| Gate | Status | Detail |
|---|---|---|
| CQ-001 | PASS | Difficulty 2 in range 1–5 |
| CQ-002 | PASS | Challenge text contains "Can you fill in" |
| CQ-003 | PASS | challenge_text = 107 chars (≥30) |
| CQ-004 | PASS | All min lengths met: title=62, setup=316, challenge=107, reveal_correct=222, reveal_wrong=213, correct_answer=573 |
| CQ-007 | PASS | style_data contains complete_text + answer |
| CQ-006 | PASS | No banned patterns detected |
| CQ-008 | PASS | correct_answer = 573 chars, 4 sentences, narrative arc |
| Title | PASS | 62 chars, no spoilers, no banned patterns |
| Context | PASS | 272 chars, no answer leak (does not include "21,196"), theatrical voice |
| Voice | PASS | Active voice throughout, conversational register |
| Patch | PASS | No passive constructions or textbook register detected |
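Gate CQ-007 above only states that `style_data` contains `complete_text` and `answer`. A slightly stronger consistency check, sketched here as an assumption and not as the pipeline's rule, is to confirm that filling the blank in the challenge text with `answer` reproduces `complete_text`. The `___` blank marker matches Sample 3; `check_fill_the_gap` is a hypothetical name.

```python
from typing import Dict, List

BLANK = "___"


def check_fill_the_gap(challenge_text: str, style_data: Dict) -> List[str]:
    """Return a list of problems; an empty list means the fields are consistent."""
    complete_text = style_data.get("complete_text", "")
    answer = style_data.get("answer", "")
    if not complete_text or not answer:
        return ["style_data needs non-empty complete_text and answer"]
    problems: List[str] = []
    if challenge_text.count(BLANK) != 1:
        problems.append("challenge_text must contain exactly one blank")
    elif not challenge_text.replace(BLANK, answer).endswith(complete_text):
        problems.append("filling the blank does not reproduce complete_text")
    acceptable = style_data.get("acceptable_answers")
    if acceptable is not None and answer not in acceptable:
        problems.append("answer is missing from acceptable_answers")
    return problems


sample_3_text = (
    "Can you fill in the precise figure? The 2012 survey determined the total "
    "length of the Great Wall of China to be ___ kilometers."
)
sample_3_style = {
    "complete_text": (
        "The 2012 survey determined the total length of the Great Wall of China "
        "to be 21,196.18 kilometers."
    ),
    "answer": "21,196.18",
    "acceptable_answers": ["21,196.18", "21196.18", "21,196", "21196", "over 21,000", "about 21,000"],
}
sample_3_problems = check_fill_the_gap(sample_3_text, sample_3_style)
```

The `endswith` comparison tolerates a lead-in question before the sentence that holds the blank, as in Sample 3.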
Sample 4: The Great Wall of China — Watchtower Signal System (File Seed)
Fact Record
| Field | Value |
|---|---|
| ID | e8b2c3d4-f5a6-7890-bcde-200000000004 |
| Title | Great Wall Watchtowers Used Fire and Smoke Signals to Relay Messages Across Hundreds of Kilometers |
| Challenge Title | Flames on the Frontier: The Ancient Internet That Guarded an Empire |
| Notability | 0.88 |
| Taxonomy | history/ancient |
| Schema | history_fact |
| Source Type | file_seed (entity explosion) |
| Source Story ID | null |
| External Source ID | seed:great-wall-of-china:1 |
| AI Model | gemini-3-flash-preview |
| Generation Cost | $0.003600 |
| Content Hash | sha256:44d5e6f7a8b9c0d1e2f3a4b5c6d7e8f9a0b1c2d3e4f5a6b7c8d9e0f1a2b3c4d5 |
| Status | validated |
| Expires At | null (permanent) |
Fact Values
| Key | Value |
|---|---|
| subject | Great Wall watchtower fire signal communication system |
| era | Imperial China (primarily Ming dynasty, 1368–1644 CE) |
| year | Standardized during Ming dynasty reforms (c. 1400s) |
| location | Along the Great Wall's northern frontier, particularly the Nine Frontier Districts |
| key_figures | Ming general Qi Jiguang (reformed the signal system in the 1560s) |
| cause | Need for rapid long-distance military communication along a 21,000+ km frontier |
| outcome | Messages could travel from the frontier to Beijing (approx. 500 km) in under 12 hours using relay fires |
| misconception | The system did not use a single type of signal — different combinations of fires and smoke columns encoded specific intelligence about enemy force size and direction |
| surprising_detail | Wolf dung was the preferred fuel for smoke signals because it produced thick, visible columns that resisted wind dispersal better than wood smoke |
Fact Context
The Great Wall's watchtowers formed one of the most sophisticated communication networks in the ancient world, relaying coded fire and smoke signals across hundreds of kilometers in a matter of hours. During the Ming dynasty, General Qi Jiguang standardized the system so that different combinations of fires and smoke columns encoded specific military intelligence — one fire meant a small raiding party, three fires meant a major invasion force. A message about an approaching army could travel from the northern frontier to Beijing, roughly 500 kilometers away, in under 12 hours. The preferred fuel for these signals was wolf dung, which produced exceptionally thick smoke columns that held their shape in strong winds far better than wood smoke. Thousands of soldiers staffed these watchtowers in rotating shifts, scanning the horizon around the clock for signs of nomadic cavalry.
Validation — multi_phase (4 phases, including Phase 4)
| Phase | Name | Passed | Confidence | Flags | Cost | Duration |
|---|---|---|---|---|---|---|
| 1 | Structural | YES | 1.00 | [] | $0.000 | 40ms |
| 2 | Internal Consistency | YES | 0.95 | [] | $0.000 | 52ms |
| 3 | Cross-Model AI (gemini-2.5-flash) | YES | 0.80 | [] | $0.001 | 1,290ms |
| 4 | Evidence Corroboration (Wikipedia + AI reasoner) | YES | 0.85 | [] | $0.003 | 2,380ms |
| Field | Value |
|---|---|
| Phase 1 Detail | Required keys present (subject, era, outcome). Title ≥ 10 chars. No injection patterns. Context ≥ 100 chars. |
| Phase 2 Detail | Year ordering valid (Ming dynasty 1368–1644, Qi Jiguang active 1560s). Location consistent with history/ancient taxonomy. |
| Phase 3 Detail | Adversarial model confirms: fire/smoke signal relay system, Qi Jiguang's reforms, coded signal combinations. Minor note: "under 12 hours" for 500km relay is plausible but exact timing varies by source. |
| Phase 4 Detail | Wikipedia "Beacon tower" and "Great Wall of China" articles confirm fire signal system. Wikidata entity for Qi Jiguang confirms military role. AI reasoner notes: wolf dung as fuel is documented in Chinese historical texts (Wujing Zongyao). A relay speed of 500 km in 12 hours is consistent with documented beacon spacing of 5–10 km. |
| Final Confidence | 0.85 |
| Total Cost | $0.004 |
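The Phase 4 row above ties the 500 km, 12-hour relay to a beacon spacing of 5–10 km. The arithmetic behind that kind of plausibility check can be worked through as below. It is a back-of-the-envelope sketch that assumes evenly spaced towers and equal delay per hop; `relay_stats` is a hypothetical helper.

```python
from typing import Dict


def relay_stats(distance_km: float, hours: float, spacing_km: float) -> Dict[str, float]:
    """Hop count, time budget per hop, and average speed for a beacon relay."""
    hops = distance_km / spacing_km
    return {
        "hops": hops,
        "minutes_per_hop": hours * 60 / hops,
        "average_speed_kmh": distance_km / hours,
    }


for spacing_km in (5, 10):
    stats = relay_stats(500, 12, spacing_km)
    print(
        f"{spacing_km} km spacing: {stats['hops']:.0f} hops, "
        f"{stats['minutes_per_hop']:.1f} min per hop, "
        f"{stats['average_speed_kmh']:.1f} km/h average"
    )
```

With 50 to 100 towers in the chain, each crew would have roughly 7 to 14 minutes to spot a signal and light its own, which is the sense in which the quoted timing is plausible.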
Challenge Content (Difficulty 4 — Reverse Lookup)
- Challenge Title: Flames on the Frontier: The Ancient Internet That Guarded an Empire
- Challenge Context: Centuries before the telegraph, the builders of the Great Wall created a coded communication network that could relay military intelligence across 500 kilometers in under half a day. The system used an unusual fuel source chosen specifically because its smoke refused to scatter in the wind.
- Setup: Long before electricity or radio, ancient engineers built a communication system that could transmit coded military intelligence from a remote frontier to a capital city hundreds of kilometers away in under twelve hours. The system used combinations of fires and smoke columns to encode specific details about approaching enemy forces — their size, direction, and speed. One particular fuel source was prized above all others because it produced smoke columns so dense and stable they held their shape even in strong frontier winds.
- Challenge Text: You have been given the clues — a coded fire relay system on a famous ancient fortification, a Ming dynasty general who standardized it, and smoke signals fueled by an unusual animal product. Can you identify the specific fuel that made these smoke signals so effective?
- Style Data:
{ "answer": "Wolf dung" }
- Reveal (Correct): Wolf dung sits at the center of a story that keeps surprising: Ming dynasty soldiers burned it specifically because its thick smoke held together in the fierce winds of northern China, allowing coded signals to travel relay-to-relay across 500 kilometers to Beijing.
- Reveal (Wrong): The unusual fuel is wolf dung, prized by watchtower soldiers because it produced exceptionally thick, wind-resistant smoke columns. General Qi Jiguang standardized this system in the 1560s so that different fire-and-smoke combinations encoded specific intelligence about enemy forces.
- Correct Answer: The preferred fuel for the Great Wall's smoke signals was wolf dung, chosen because it burned to produce smoke columns so thick and dense they resisted wind dispersal far better than wood or any other available material. General Qi Jiguang standardized this signal system during the 1560s, establishing coded combinations where different numbers of fires and smoke columns communicated specific intelligence about approaching enemy cavalry. A single relay chain could carry a warning from the northern frontier to Beijing — roughly 500 kilometers — in under twelve hours, making it one of the fastest long-distance communication systems in the ancient world. The watchtower network was essentially a pre-electric internet, and wolf dung was the bandwidth that made it work.
Quality Annotations
| Gate | Status | Detail |
|---|---|---|
| CQ-001 | PASS | Difficulty 4 in range 1–5 |
| CQ-002 | PASS | Challenge text contains "You have been given" and "Can you identify" |
| CQ-003 | PASS | challenge_text = 226 chars (≥30) |
| CQ-004 | PASS | All min lengths met: title=61, setup=437, challenge=226, reveal_correct=206, reveal_wrong=220, correct_answer=576 |
| CQ-010 | PASS | style_data contains answer field |
| CQ-006 | PASS | No banned patterns detected |
| CQ-008 | PASS | correct_answer = 576 chars, 4 sentences, narrative arc |
| Title | PASS | 61 chars, no spoilers (does not mention wolf dung), no banned patterns |
| Context | PASS | 252 chars, no answer leak, theatrical voice |
| Voice | PASS | Active voice throughout, conversational register |
| Patch | PASS | No passive constructions or textbook register detected |
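The Title and Context gates above both assert that the answer does not leak ("does not mention wolf dung", "no answer leak"). A simple way to test for a leak is a normalized whole-phrase match, sketched below. The helper names and the normalization rule (lowercase, punctuation collapsed to spaces) are assumptions; a production check would likely also need to catch paraphrases.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and collapse every run of non-alphanumerics to one space."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def leaks_answer(text: str, answer: str) -> bool:
    """True when the normalized answer appears as a whole phrase in text."""
    needle = normalize(answer)
    if not needle:
        return False
    return f" {needle} " in f" {normalize(text)} "


sample_4_title = "Flames on the Frontier: The Ancient Internet That Guarded an Empire"
sample_4_reveal = "The unusual fuel is wolf dung, prized by watchtower soldiers."
title_leaks = leaks_answer(sample_4_title, "Wolf dung")
reveal_leaks = leaks_answer(sample_4_reveal, "Wolf dung")
```

Padding both strings with spaces keeps the match on word boundaries, so an answer like "wall" would not be flagged inside "wallpaper".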
Sample 5: Octopus — Triple Heart System (Evergreen)
Fact Record
| Field | Value |
|---|---|
| ID | e8b2c3d4-f5a6-7890-bcde-200000000005 |
| Title | The Octopus Pumps Blue Blood Through Three Separate Hearts |
| Challenge Title | Three Chambers for a Deep-Sea Survivor's Circulatory Secret |
| Notability | 0.93 |
| Taxonomy | animals |
| Schema | animal_fact |
| Source Type | ai_generated (evergreen) |
| Source Story ID | null |
| External Source ID | null |
| AI Model | gemini-3-flash-preview |
| Generation Cost | $0.003900 |
| Content Hash | sha256:55e6f7a8b9c0d1e2f3a4b5c6d7e8f9a0b1c2d3e4f5a6b7c8d9e0f1a2b3c4d5e6 |
| Status | validated |
| Expires At | null (permanent) |
Fact Values
| Key | Value |
|---|---|
| animal | Octopus (Order Octopoda) |
| species_group | Cephalopod |
| habitat | Oceans worldwide, from shallow reefs to deep-sea trenches |
| diet | Carnivore — crabs, clams, small fish |
| lifespan_years | 1–5 (varies by species) |
| population_status | Stable (most species) |
| notable_ability | Three-heart circulatory system with copper-based blue blood |
| size_fact | Giant Pacific octopus can span up to 6 meters across |
| fun_fact | The systemic heart stops beating when the octopus swims, which is why they prefer crawling |
Fact Context
The octopus runs its body on three separate hearts and copper-based blue blood — a circulatory system so different from ours it reads like science fiction. Two branchial hearts sit at the base of the gills and pump blood through them to absorb oxygen, while a single systemic heart distributes oxygenated blood to the rest of the body. The blood appears blue because it uses hemocyanin, a copper-based protein, instead of the iron-based hemoglobin that makes mammalian blood red. Hemocyanin is less efficient at carrying oxygen in warm water but excels in the cold, low-oxygen environments where many octopuses hunt. Here is the strangest part: the systemic heart actually stops beating whenever the octopus swims, which explains why these intelligent predators prefer crawling along the seafloor to sprinting through open water.
Validation — multi_phase (3 phases)
| Phase | Name | Passed | Confidence | Flags | Cost | Duration |
|---|---|---|---|---|---|---|
| 1 | Structural | YES | 1.00 | [] | $0.000 | 35ms |
| 2 | Internal Consistency | YES | 0.95 | [] | $0.000 | 48ms |
| 3 | Cross-Model AI (gemini-2.5-flash) | YES | 0.85 | [] | $0.001 | 1,150ms |
| Field | Value |
|---|---|
| Phase 1 Detail | Required keys present (animal, species_group, notable_ability). Title ≥ 10 chars. No injection patterns. Context ≥ 100 chars. |
| Phase 2 Detail | Species group "Cephalopod" consistent with animals taxonomy. Lifespan range 1–5 years valid for Octopoda. |
| Phase 3 Detail | Adversarial model confirms: three hearts (2 branchial + 1 systemic), hemocyanin/copper-based blood, systemic heart stops during swimming, Giant Pacific octopus up to 6m span. All claims verified. |
| Final Confidence | 0.85 |
| Total Cost | $0.001 |
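The per-phase rows above roll up into the Final Confidence and Total Cost fields. The following is a minimal sketch of one way that roll-up could be computed. It assumes the final confidence is the lowest per-phase confidence (this matches 0.85 here, but the rule is not stated in this reference) and that the total cost is the sum of the phase costs. `PhaseResult` and `summarize` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field
from decimal import Decimal


@dataclass
class PhaseResult:
    phase: int
    name: str
    passed: bool
    confidence: float
    cost: Decimal
    flags: list = field(default_factory=list)


def summarize(results):
    """Roll up per-phase results (assumed rule: min confidence, summed cost)."""
    return {
        "passed": all(r.passed for r in results),
        "final_confidence": min(r.confidence for r in results),
        "total_cost": sum((r.cost for r in results), Decimal("0")),
    }


# Values copied from the phase table for the octopus fact.
octopus = [
    PhaseResult(1, "Structural", True, 1.00, Decimal("0.000")),
    PhaseResult(2, "Internal Consistency", True, 0.95, Decimal("0.000")),
    PhaseResult(3, "Cross-Model AI", True, 0.85, Decimal("0.001")),
]
summary = summarize(octopus)
```

Costs are held as `Decimal` so that summing many sub-cent amounts does not pick up floating-point drift.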
Challenge Content (Difficulty 3 — Direct Question)
- Challenge Title: Three Chambers for a Deep-Sea Survivor's Circulatory Secret
- Challenge Context: While you get by with one heart, a certain ocean predator needs three just to keep its copper-based blue blood flowing. Two of these hearts handle a single specialized task, while the third takes on everything else — and it has a bizarre habit of shutting down at the worst possible moment.
- Setup: Imagine having a circulatory system so demanding that one heart simply cannot handle the workload. Deep in the ocean, one of nature's most intelligent predators evolved a triple-heart system where two specialized hearts handle one critical function while a third manages everything else. The blood running through this system is not even red — it is blue, powered by copper instead of iron.
- Challenge Text: Given that two of the octopus's three hearts are dedicated to a single vital function, do you know what specific role those two specialized hearts perform?
- Style Data:
{ "expected_answer": "Pump blood to the gills", "acceptable_answers": ["pump blood to the gills", "pumping blood to the gills", "gill circulation", "branchial circulation", "send blood through the gills", "push blood to the gills"], "answer_type": "phrase" }
- Reveal (Correct): Those branchial hearts sit at the center of a story that keeps surprising: they push blood through the gills to absorb oxygen, and without them the octopus could not survive in the cold, low-oxygen deep-sea environments where many species hunt.
- Reveal (Wrong): The two specialized hearts are branchial hearts, and they pump blood specifically through the gills to collect oxygen. The third heart — the systemic heart — then distributes that oxygenated blue blood to the rest of the body.
- Correct Answer: The two specialized hearts in an octopus are called branchial hearts, and they pump blood through the gills to absorb oxygen from the surrounding water. This arrangement exists because the octopus uses hemocyanin — a copper-based protein — instead of hemoglobin to carry oxygen, and hemocyanin needs more circulatory pressure to work efficiently in cold, low-oxygen environments. The third heart, called the systemic heart, takes over from there and distributes oxygenated blood to the organs and muscles. In one of biology's strangest quirks, the systemic heart actually stops beating whenever the octopus swims, which is exactly why these famously clever predators prefer to crawl along the seafloor rather than sprint through open water.
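The Style Data above pairs one `expected_answer` with a list of `acceptable_answers`. The following is a hedged sketch of how a player's response might be checked against that data. The normalization rule (lowercase, strip punctuation, collapse whitespace) and the function names `normalize` and `is_correct` are assumptions, not the pipeline's documented matcher.

```python
import re

# Copied from the Style Data for this challenge.
style_data = {
    "expected_answer": "Pump blood to the gills",
    "acceptable_answers": [
        "pump blood to the gills",
        "pumping blood to the gills",
        "gill circulation",
        "branchial circulation",
        "send blood through the gills",
        "push blood to the gills",
    ],
    "answer_type": "phrase",
}


def normalize(text):
    """Lowercase, drop punctuation, and collapse runs of whitespace."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return " ".join(text.split())


def is_correct(answer, data):
    """Return True if the normalized answer matches any accepted phrase."""
    candidates = [data["expected_answer"], *data["acceptable_answers"]]
    return normalize(answer) in {normalize(c) for c in candidates}
```

Exact matching after normalization is deliberately strict: a paraphrase outside the list, such as "oxygenate the blood", would be marked wrong unless it is added to `acceptable_answers`.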
Quality Annotations
| Gate | Status | Detail |
|---|---|---|
| CQ-001 | PASS | Difficulty 3 in range 1–5 |
| CQ-002 | PASS | Challenge text contains "do you know" |
| CQ-003 | PASS | challenge_text = 123 chars (≥30) |
| CQ-004 | PASS | All min lengths met: title=55, setup=310, challenge=123, reveal_correct=200, reveal_wrong=195, correct_answer=563 |
| CQ-005 | PASS | style_data contains expected_answer + acceptable_answers |
| CQ-006 | PASS | No banned patterns detected |
| CQ-008 | PASS | correct_answer = 563 chars, 4 sentences, narrative arc |
| Title | PASS | 55 chars, no spoilers, no banned patterns |
| Context | PASS | 276 chars, no answer leak (does not mention "gills" or "branchial"), theatrical voice |
| Voice | PASS | Active voice throughout, conversational register |
| Patch | PASS | No passive constructions or textbook register detected |
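The first three gates in the table above are simple mechanical checks. The following is a minimal sketch of how they could be expressed, assuming the thresholds shown in the Detail column (difficulty 1 to 5, the phrase "do you know" for this direct-question style, and a 30-character minimum). The function name `run_gates` and the dictionary layout are hypothetical.

```python
def run_gates(challenge):
    """Evaluate CQ-001 to CQ-003 using the thresholds from the table above."""
    text = challenge["challenge_text"]
    difficulty = challenge["difficulty"]
    return {
        "CQ-001": 1 <= difficulty <= 5,           # difficulty in range
        "CQ-002": "do you know" in text.lower(),  # direct-question phrasing
        "CQ-003": len(text) >= 30,                # minimum challenge length
    }


octopus_challenge = {
    "difficulty": 3,
    "challenge_text": (
        "Given that two of the octopus's three hearts are dedicated to a "
        "single vital function, do you know what specific role those two "
        "specialized hearts perform?"
    ),
}
gate_results = run_gates(octopus_challenge)
```

Returning a gate-to-boolean mapping rather than a single verdict keeps the output aligned with the per-gate rows in the table.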
Sample 6: Marie Curie — Validation Failure (Phase 3 Cross-Model Rejection)
Fact Record
| Field | Value |
|---|---|
| ID | e8b2c3d4-f5a6-7890-bcde-200000000006 |
| Title | Marie Curie Built the First Mobile X-Ray Units That Saved 10 Million Soldiers in WWI |
| Notability | 0.90 |
| Taxonomy | science/physics |
| Schema | science_fact |
| Source Type | ai_generated (evergreen) |
| Source Story ID | null |
| External Source ID | null |
| AI Model | gemini-3-flash-preview |
| Generation Cost | $0.004000 |
| Content Hash | sha256:66f7a8b9c0d1e2f3a4b5c6d7e8f9a0b1c2d3e4f5a6b7c8d9e0f1a2b3c4d5e6f7 |
| Status | rejected |
| Expires At | — |
Fact Values
| Key | Value |
|---|---|
| subject | Marie Curie's mobile X-ray units ("petites Curies") in World War I |
| discovery | Created mobile radiological vehicles that brought X-ray capability to battlefield hospitals |
| scientist | Marie Curie |
| year | 1914–1918 |
| field | Medical Physics / Radiology |
| method | Converted civilian vehicles into mobile X-ray units equipped with dynamo-powered generators |
| impact | CLAIMED: Saved over 10 million soldiers' lives by enabling battlefield fracture and shrapnel diagnosis |
| surprising_detail | Curie personally drove the mobile units to the front lines and trained her 17-year-old daughter Irène as an X-ray technician |
Validation — REJECTED at Phase 3
| Phase | Name | Passed | Confidence | Flags | Cost | Duration |
|---|---|---|---|---|---|---|
| 1 | Structural | YES | 1.00 | [] | $0.000 | 41ms |
| 2 | Internal Consistency | YES | 0.95 | [] | $0.000 | 55ms |
| 3 | Cross-Model AI (gemini-2.5-flash) | NO | 0.35 | ["inflated_statistic", "hallucinated_detail"] | $0.001 | 1,280ms |
| Field | Value |
|---|---|
| Phase 1 Detail | Required keys present. Title ≥ 10 chars. No injection patterns. Context ≥ 100 chars. |
| Phase 2 Detail | Year range 1914–1918 valid for WWI. Scientist and field consistent. |
| Phase 3 Detail | Adversarial model (gemini-2.5-flash) flagged: "10 million soldiers" statistic is grossly inflated. Historical sources document approximately 1 million X-ray examinations performed by Curie's mobile units, not 10 million lives saved. The generating model hallucinated a 10x inflation of the documented figure. Mobile X-ray units and Curie's personal involvement are verified, but the impact claim fails cross-model verification. |
| Rejection Reason | Phase 3 cross-model AI detected a hallucinated statistic — the "10 million soldiers saved" claim inflates the documented ~1 million X-ray examinations by an order of magnitude. The core fact (Curie built mobile X-ray units for WWI) is valid, but the quantitative impact claim is fabricated. |
| Total Cost | $0.001 |
Challenge Content: Not generated (fact rejected at Phase 3 validation)
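The rejection above suggests that validation runs phase by phase and that a failing phase ends processing before challenge generation. The following is a sketch of that control flow under stated assumptions: the three check functions are stand-ins (the Phase 3 stub hard-codes the verdict from the table instead of calling a second model), and `validate` and `PHASES` are hypothetical names.

```python
def structural(fact):
    # Stand-in for Phase 1: only checks for a title of at least 10 characters.
    ok = len(fact.get("title", "")) >= 10
    return ok, (1.00 if ok else 0.0), []


def consistency(fact):
    # Stand-in for Phase 2: always passes in this sketch.
    return True, 0.95, []


def cross_model(fact):
    # Stand-in for Phase 3: a real run would query a second model. Here the
    # verdict from the table above is hard-coded for the known bad claim.
    if "10 million" in fact.get("impact", ""):
        return False, 0.35, ["inflated_statistic", "hallucinated_detail"]
    return True, 0.85, []


PHASES = [
    ("structural", structural),
    ("internal_consistency", consistency),
    ("cross_model_ai", cross_model),
]


def validate(fact, phases=PHASES):
    """Run phases in order and stop at the first failure."""
    history = []
    for name, check in phases:
        passed, confidence, flags = check(fact)
        history.append(
            {"phase": name, "passed": passed,
             "confidence": confidence, "flags": flags}
        )
        if not passed:
            return "rejected", history
    return "validated", history


rejected_fact = {
    "title": "Marie Curie Built the First Mobile X-Ray Units "
             "That Saved 10 Million Soldiers in WWI",
    "impact": "CLAIMED: Saved over 10 million soldiers' lives",
}
status, history = validate(rejected_fact)
```

Because the loop returns on the first failure, any later phase (and the downstream challenge generator) is never invoked for a rejected fact.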
Summary Table
| # | Entity | Taxonomy | Source Type | Schema | Validation | Phases Run | Confidence | Challenges | Style | Difficulty |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Marie Curie | science/physics | ai_generated | science_fact | PASS | 1→2→3 | 0.85 | 1 | multiple_choice | L1 |
| 2 | Marie Curie | science/physics | ai_generated | science_fact | PASS | 1→2→3 | 0.82 | 1 | free_text | L5 |
| 3 | Great Wall of China | history/ancient | file_seed | history_fact | PASS | 1→2→3→4 | 0.90 | 1 | fill_the_gap | L2 |
| 4 | Great Wall of China | history/ancient | file_seed | history_fact | PASS | 1→2→3→4 | 0.85 | 1 | reverse_lookup | L4 |
| 5 | Octopus | animals | ai_generated | animal_fact | PASS | 1→2→3 | 0.85 | 1 | direct_question | L3 |
| 6 | Marie Curie (REJECTED) | science/physics | ai_generated | science_fact | FAIL | 1→2→3✗ | 0.35 | 0 | — | — |
Validation Cost Breakdown
| Phase | Description | Cost Per Fact | Notes |
|---|---|---|---|
| 1 | Structural checks | $0.000 | Pure code — key presence, length, injection scan |
| 2 | Internal consistency | $0.000 | Pure code — date ordering, taxonomy alignment |
| 3 | Cross-model AI | $0.001 | Adversarial gemini-2.5-flash verification |
| 4 | Evidence corroboration | $0.003 | Wikipedia/Wikidata lookup + AI reasoner (file_seed only) |
| — | Average per fact | $0.002 | Weighted across 4 ai_generated facts ($0.001 each) and 2 file_seed facts ($0.004 each), including the rejected fact |
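The $0.002 average can be reproduced from the per-phase costs above and the source types in the Summary Table. The sketch below assumes that ai_generated facts run phases 1 to 3, that file_seed facts run phases 1 to 4, and that the rejected fact still incurs its Phase 3 cost. The constant and function names are hypothetical.

```python
from decimal import Decimal

# Per-phase cost, copied from the Validation Cost Breakdown table.
PHASE_COST = {
    1: Decimal("0.000"),
    2: Decimal("0.000"),
    3: Decimal("0.001"),
    4: Decimal("0.003"),
}
# Phases run per source type, as shown in the Summary Table.
PHASES_BY_SOURCE = {"ai_generated": (1, 2, 3), "file_seed": (1, 2, 3, 4)}
# Fact counts per source type from the Summary Table (rejected fact included).
FACT_COUNTS = {"ai_generated": 4, "file_seed": 2}


def cost_per_fact(source_type):
    """Sum the phase costs that apply to one fact of this source type."""
    phases = PHASES_BY_SOURCE[source_type]
    return sum((PHASE_COST[p] for p in phases), Decimal("0"))


total = sum(
    (cost_per_fact(s) * n for s, n in FACT_COUNTS.items()), Decimal("0")
)
average = total / sum(FACT_COUNTS.values())
```

With these inputs the total is 4 × $0.001 + 2 × $0.004 = $0.012 across 6 facts, which gives the $0.002 average in the table.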
Signoff Dimensions
| Dimension | Score | Target | Status |
|---|---|---|---|
| schema_adherence | 96 | ≥ 90 | PASS |
| voice_adherence | 95 | ≥ 90 | PASS |
| style_adherence | 94 | ≥ 90 | PASS |
| content_quality | 93 | ≥ 90 | PASS |
Overall verdict: Production-quality reference output demonstrating the seeded/evergreen pipeline with full multi_phase validation (including Phase 4 evidence corroboration for file_seed entries), plus one correctly rejected AI hallucination caught at Phase 3.
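The signoff table reduces to a threshold comparison per dimension. The following is a minimal sketch, assuming a dimension passes when its score meets or exceeds the shared target of 90 and that signoff requires every dimension to pass. The variable names are hypothetical.

```python
TARGET = 90

# Scores copied from the Signoff Dimensions table.
scores = {
    "schema_adherence": 96,
    "voice_adherence": 95,
    "style_adherence": 94,
    "content_quality": 93,
}

# Collect any dimension that falls below the target.
failing = {name: score for name, score in scores.items() if score < TARGET}
verdict = "PASS" if not failing else "FAIL"
```

Keeping the failing dimensions in a dictionary makes it easy to report which score blocked signoff rather than only the overall result.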