4. Generate Challenge Content

Purpose: Generate pre-built challenge content (6 quiz styles x 5 difficulty levels) for validated facts using a multi-phase JSONL pipeline.

Prerequisites:

  • .env.local has DATABASE_URL and API key for the routed model (see available models)
  • Validated facts exist in fact_records (run worker-validate first if facts are still pending_validation)
  • Sufficient API budget for the run (model selected by tier routing via selectModelForTask)

Cost / Duration: $6-$60 depending on fact count (see cost table below) | 4-12 hours

Prompt

I need to generate challenge content for validated facts. Walk me through the
6-phase pipeline, reporting progress and costs after each phase.

### Phase 1: Audit current coverage

```bash
bun scripts/seed/generate-challenge-content.ts --audit
```

Report how many validated facts have 0, 1, 2, ... 6 challenge styles.
This tells us the scope of the run.
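
As a rough illustration of what the audit tallies, the sketch below builds the "facts per style count" histogram in memory. The `ChallengeRow` shape and the `coverageHistogram` name are hypothetical and are not taken from the script; the real audit presumably queries the database.

```typescript
// Hypothetical row shape: one entry per (fact, style) pair that already has content.
type ChallengeRow = { factId: string; style: string };

const TOTAL_STYLES = 6;

// Returns a histogram where histogram[n] is the number of facts
// that have exactly n distinct challenge styles (n = 0..6).
function coverageHistogram(factIds: string[], rows: ChallengeRow[]): number[] {
  const stylesByFact = new Map<string, Set<string>>();
  for (const id of factIds) {
    stylesByFact.set(id, new Set<string>());
  }
  for (const row of rows) {
    const styles = stylesByFact.get(row.factId);
    if (styles) styles.add(row.style); // rows for unknown facts are ignored
  }

  const histogram: number[] = [];
  for (let n = 0; n <= TOTAL_STYLES; n++) histogram.push(0);

  stylesByFact.forEach((styles) => {
    const bucket = Math.min(styles.size, TOTAL_STYLES);
    histogram[bucket] += 1;
  });
  return histogram;
}

const exampleHistogram = coverageHistogram(
  ["f1", "f2", "f3"],
  [
    { factId: "f1", style: "multiple_choice" },
    { factId: "f1", style: "free_text" },
    { factId: "f2", style: "multiple_choice" },
  ],
);
console.log(exampleHistogram); // [1, 1, 1, 0, 0, 0, 0]
```

A run is complete when every fact lands in the last bucket (6 styles) and the first bucket (0 styles) is empty.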

### Phase 2: Export facts needing content

```bash
bun scripts/seed/generate-challenge-content.ts --export
```

This dumps validated facts missing challenge content to
`.challenge-data/facts-export.jsonl` (gitignored).

### Phase 3: Generate challenge content

First, do a dry run on a small sample:

```bash
bun scripts/seed/generate-challenge-content.ts --generate --dry-run --limit 10
```

Review the sample output. If it looks good, run in partitions for parallel execution:

```bash
bun scripts/seed/generate-challenge-content.ts --generate --concurrency 5 --partition 1/4
bun scripts/seed/generate-challenge-content.ts --generate --concurrency 5 --partition 2/4
bun scripts/seed/generate-challenge-content.ts --generate --concurrency 5 --partition 3/4
bun scripts/seed/generate-challenge-content.ts --generate --concurrency 5 --partition 4/4
```

Each partition generates 6 styles per fact: multiple_choice, direct_question,
fill_the_gap, statement_blank, reverse_lookup, free_text. Each challenge gets
its own per-style `challenge_title` (theatrical, cinematic), generated alongside
the question/answer rather than inherited from the fact record. Results go to
`.challenge-data/challenges-generated.jsonl`; partitioned runs use
partition-aware variants of that filename.
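
To make the `N/M` notation concrete, here is a minimal sketch of one way a partition spec could split the exported facts. The modulo-by-position scheme and both function names are assumptions for illustration; the script may partition differently (for example, by hashing the fact ID).

```typescript
// Parses a 1-based partition spec such as "2/4" into its parts.
function parsePartition(spec: string): { index: number; total: number } {
  const match = /^(\d+)\/(\d+)$/.exec(spec);
  if (!match) throw new Error(`Invalid partition spec: ${spec}`);
  const index = Number(match[1]);
  const total = Number(match[2]);
  if (total < 1 || index < 1 || index > total) {
    throw new Error(`Partition out of range: ${spec}`);
  }
  return { index, total };
}

// Keeps every item whose position modulo `total` matches the partition,
// so the M partitions are disjoint and together cover the whole list.
function selectPartition<T>(items: T[], spec: string): T[] {
  const { index, total } = parsePartition(spec);
  return items.filter((_, position) => position % total === index - 1);
}

const exportedFacts = ["f1", "f2", "f3", "f4", "f5", "f6"];
console.log(selectPartition(exportedFacts, "1/4")); // ["f1", "f5"]
console.log(selectPartition(exportedFacts, "4/4")); // ["f4"]
```

Whatever the actual scheme, the property that matters is the same: each fact belongs to exactly one partition, so the four processes never generate content for the same fact twice.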

### Phase 4: Upload to database

Preview first, then upload:

```bash
bun scripts/seed/generate-challenge-content.ts --upload --dry-run
bun scripts/seed/generate-challenge-content.ts --upload
```

The upload bulk-upserts rows into `fact_challenge_content` using
`onConflictDoUpdate` on `(fact_record_id, challenge_style, target_fact_key, difficulty)`,
so re-running it updates existing rows instead of creating duplicates.
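
The upsert behaviour on that four-column key can be pictured with an in-memory stand-in. This is only a sketch of the semantics: the `Map` replaces the real table, and the camelCase field names and the `question` column are hypothetical.

```typescript
// Hypothetical in-memory row; the first four fields mirror the conflict target.
type ChallengeContent = {
  factRecordId: string;
  challengeStyle: string;
  targetFactKey: string;
  difficulty: number;
  question: string;
};

// Builds a composite key from the four conflict-target columns.
function conflictKey(row: ChallengeContent): string {
  return JSON.stringify([
    row.factRecordId,
    row.challengeStyle,
    row.targetFactKey,
    row.difficulty,
  ]);
}

// Inserts new rows and overwrites rows whose composite key already exists.
function upsertAll(
  table: Map<string, ChallengeContent>,
  rows: ChallengeContent[],
): { inserted: number; updated: number } {
  let inserted = 0;
  let updated = 0;
  for (const row of rows) {
    const key = conflictKey(row);
    if (table.has(key)) {
      updated += 1;
    } else {
      inserted += 1;
    }
    table.set(key, row);
  }
  return { inserted, updated };
}

const table = new Map<string, ChallengeContent>();
const first: ChallengeContent = {
  factRecordId: "fact-1",
  challengeStyle: "multiple_choice",
  targetFactKey: "birth_year",
  difficulty: 1,
  question: "First draft?",
};
console.log(upsertAll(table, [first])); // { inserted: 1, updated: 0 }
console.log(upsertAll(table, [{ ...first, question: "Improved draft?" }])); // { inserted: 0, updated: 1 }
```

The second upload replaces the question text without adding a row, which is why repeating the upload after a recovery pass is safe.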

### Phase 5: Validate quality

```bash
bun scripts/seed/generate-challenge-content.ts --validate
```

Runs a post-upload quality check against the CC/CQ rules, samples 20 random
facts for manual review, and reports the CQ-002 pass rate.
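
For orientation, the two outputs of this phase (a random review sample and a pass rate) can be sketched as follows. The `ValidationResult` shape and the `passesCq002` flag are hypothetical; what CQ-002 actually checks is defined by the project's rules, not by this example.

```typescript
// Hypothetical per-fact outcome of the CQ-002 check.
type ValidationResult = { factId: string; passesCq002: boolean };

// Fraction of results that pass, in the range 0..1 (0 for an empty list).
function passRate(results: ValidationResult[]): number {
  if (results.length === 0) return 0;
  const passed = results.filter((r) => r.passesCq002).length;
  return passed / results.length;
}

// Picks up to `count` distinct items with a partial Fisher-Yates shuffle.
// `random` is injectable so the sample can be made reproducible.
function sampleWithoutReplacement<T>(
  items: T[],
  count: number,
  random: () => number = Math.random,
): T[] {
  const pool = items.slice();
  const take = Math.min(count, pool.length);
  for (let i = 0; i < take; i++) {
    const j = i + Math.floor(random() * (pool.length - i));
    const tmp = pool[i];
    pool[i] = pool[j];
    pool[j] = tmp;
  }
  return pool.slice(0, take);
}

// 100 facts, of which 3 fail the check.
const results: ValidationResult[] = [];
for (let i = 0; i < 100; i++) {
  results.push({ factId: `fact-${i}`, passesCq002: i >= 3 });
}
const rate = passRate(results);
const reviewSample = sampleWithoutReplacement(results, 20);
console.log(rate >= 0.95, reviewSample.length); // true 20
```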

### Phase 6: Recover weak outputs (optional)

If validation reports issues:

```bash
bun scripts/seed/generate-challenge-content.ts --recover
```

Re-processes facts with validation issues. Supports `--partition N/M` for
parallel execution.

After all phases, run `--audit` again to confirm full coverage.

### Fact Challenge Groups (FCG)

For group-based generation (5 challenges per fact in a single API call), use:

```bash
bun scripts/seed/test-fcg-diversity.ts --limit 10 --models gpt-5.4-nano
```

FCG generation uses `generateGroupChallenges()` which:
- Produces 5 challenges (C1-C5) with increasing difficulty in one call
- C1-C2: multiple_choice; C3-C5: AI-selected from fill_the_gap, direct_question, reverse_lookup, free_text
- Enforces key diversity via `validateGroupDiversity()`: at least 3-4 unique keys per group, and at most 2 challenges per key
- Retries with explicit round-robin key assignments if diversity validation fails
- Reports per-entity key distribution in `docs/test-results/challenge-groups/test-group-3/`
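
The diversity rule and the round-robin fallback described above can be sketched as below. This is not the implementation of `validateGroupDiversity()`; the function names, the `GroupChallenge` shape, and the default thresholds (3 unique keys, 2 uses per key) are assumptions chosen to match the bullets.

```typescript
// Hypothetical shape: each challenge slot (C1..C5) targets one fact key.
type GroupChallenge = { slot: string; targetFactKey: string };

// Checks a group against a minimum number of unique keys and a per-key cap.
function checkGroupDiversity(
  group: GroupChallenge[],
  minUniqueKeys: number = 3,
  maxPerKey: number = 2,
): { ok: boolean; reasons: string[] } {
  const counts = new Map<string, number>();
  for (const challenge of group) {
    const seen = counts.get(challenge.targetFactKey) || 0;
    counts.set(challenge.targetFactKey, seen + 1);
  }
  const reasons: string[] = [];
  if (counts.size < minUniqueKeys) {
    reasons.push(`only ${counts.size} unique keys, need ${minUniqueKeys}`);
  }
  counts.forEach((count, key) => {
    if (count > maxPerKey) {
      reasons.push(`key "${key}" used ${count} times, max ${maxPerKey}`);
    }
  });
  return { ok: reasons.length === 0, reasons };
}

// Assigns keys to slots in rotation, as an explicit instruction for a retry.
function assignKeysRoundRobin(slots: string[], keys: string[]): GroupChallenge[] {
  if (keys.length === 0) throw new Error("at least one key is required");
  return slots.map((slot, i) => ({ slot, targetFactKey: keys[i % keys.length] }));
}

const slots = ["C1", "C2", "C3", "C4", "C5"];
const repetitive = slots.map((slot) => ({ slot, targetFactKey: "birth_year" }));
console.log(checkGroupDiversity(repetitive).ok); // false

const retry = assignKeysRoundRobin(slots, ["birth_year", "birthplace", "occupation"]);
console.log(checkGroupDiversity(retry).ok); // true
```

With five slots and three keys, rotation yields a 2/2/1 split, which satisfies both limits; with only two keys available, no assignment can pass, so such facts would need to be handled separately.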

Cost Table

| Facts  | Styles         | Estimated Cost |
|--------|----------------|----------------|
| 1,000  | 6 (all)        | ~$6            |
| 5,000  | 6 (all)        | ~$30           |
| 10,000 | 6 (all)        | ~$60           |
| 10,000 | 3 (mc,dq,ftg)  | ~$30           |
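
Every row of the table works out to roughly $1 per 1,000 fact-style pairs (for example, 1,000 facts × 6 styles ≈ $6). A small estimator under that assumption is sketched below; the rate is inferred from the table rather than from actual model pricing, and the function name is hypothetical.

```typescript
// Inferred from the cost table: about $1 buys 1,000 (fact, style) pairs.
const FACT_STYLE_PAIRS_PER_DOLLAR = 1000;

// Rough run cost in USD for a given number of facts and styles.
function estimateCostUsd(factCount: number, styleCount: number): number {
  if (factCount < 0 || styleCount < 0) {
    throw new Error("counts must be non-negative");
  }
  return (factCount * styleCount) / FACT_STYLE_PAIRS_PER_DOLLAR;
}

console.log(estimateCostUsd(1000, 6));  // 6
console.log(estimateCostUsd(5000, 6));  // 30
console.log(estimateCostUsd(10000, 3)); // 30
```

Treat the result as an order-of-magnitude budget check before starting a run; the actual spend depends on the model chosen by tier routing.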

Verification

  • --audit shows all validated facts have 6 challenge styles
  • Zero facts with 0 styles remaining
  • CQ-002 pass rate above 95% on --validate
  • Upload completed without conflict errors
  • Cost summary reported for the run
  • FCG diversity test passes (bun scripts/seed/test-fcg-diversity.ts) — all groups use 3+ unique keys
