News Ingestion Testing Project
Test infrastructure for measuring and iterating on AI quality in news article fact extraction. It targets the same 97+/100 quality bar used for seed/evergreen content, applied to the noisier domain of live news articles.
Why
News articles are fundamentally noisier than curated entities: clickbait titles, incomplete text, breaking stories, and inconsistent formatting. This project creates the testing infrastructure to measure and iterate toward production quality across any model in the test harness registry.
Architecture
NewsAPI / GNews / TheNewsAPI
|
[news-fetcher.ts] -> articles.jsonl
|
[extractFactsFromStory()] -> facts.jsonl <- modelOverride (default: gemini-3-flash-preview)
|
[validate: structural -> consistency -> cross-model -> evidence] -> validations.jsonl
| (with contamination detection + retry)
[generateChallengeContent()] -> challenges.jsonl <- includes context field
|
[signoff: AI quality review] -> signoffs.jsonl <- configurable via --signoff-model
|
[generateReport()] -> report.md
Key difference from the seed pipeline: Phase 1 uses extractFactsFromStory() (article text -> facts) instead of explodeCategoryEntry() (entity name -> facts). Phases 2-5 reuse the same validation, challenge, signoff, and report infrastructure.
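Every phase in the diagram hands its output to the next one through a .jsonl file, so a small round-trip helper is a useful mental model. The following is a minimal sketch, not the project's actual code: the `Article` shape and the helper names `toJsonl` and `fromJsonl` are hypothetical, since this README does not show the real schemas.

```typescript
// Hypothetical record shape; the real articles.jsonl schema is not shown in this README.
interface Article {
  title: string;
  text: string;
}

// Serialize records as JSON Lines: one JSON document per line.
// JSON.stringify escapes embedded newlines, so each record stays on a single line.
function toJsonl<T>(records: readonly T[]): string {
  return records.map((record) => JSON.stringify(record) + "\n").join("");
}

// Parse JSON Lines, skipping blank lines (for example the one after a trailing newline).
function fromJsonl<T>(text: string): T[] {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as T);
}

// Illustrative data only.
const articles: Article[] = [
  { title: "Local team wins final", text: "Line one.\nLine two." },
  { title: "New comet observed", text: "Short body." },
];

const serialized = toJsonl(articles);
const roundTripped = fromJsonl<Article>(serialized);
console.log(roundTripped.length);
```

Because each record is one line, a later phase can be rerun against an earlier phase's file without any shared in-memory state.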
Usage
# Full pipeline
bun run test:news -- --all
# With specific provider and categories
bun run test:news -- --all --provider gnews --categories sports,science --limit 10
# Individual phases
bun run test:news -- --fetch # Phase 0: fetch articles
bun run test:news -- --generate # Phase 1: extract facts (requires articles.jsonl)
bun run test:news -- --validate # Phase 2: validate facts
bun run test:news -- --challenge # Phase 3: generate challenges
bun run test:news -- --signoff # Phase 4: AI quality review
bun run test:news -- --report # Phase 5: generate report
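When phases are run one at a time, each needs the file written by an earlier phase. The sketch below models that as a lookup table. It assumes each phase reads only the immediately preceding phase's output, which this README confirms only for --generate (it requires articles.jsonl); the names `PHASES`, `requiredInput`, and `missingPrerequisite` are hypothetical.

```typescript
type Phase = "fetch" | "generate" | "validate" | "challenge" | "signoff" | "report";

// Phase order and output files, as listed in the architecture diagram and usage list.
const PHASES: readonly { phase: Phase; output: string }[] = [
  { phase: "fetch", output: "articles.jsonl" },
  { phase: "generate", output: "facts.jsonl" },
  { phase: "validate", output: "validations.jsonl" },
  { phase: "challenge", output: "challenges.jsonl" },
  { phase: "signoff", output: "signoffs.jsonl" },
  { phase: "report", output: "report.md" },
];

// Returns the file a phase needs before it can run, or null for the first phase.
// ASSUMPTION: a phase depends only on the previous phase's output.
function requiredInput(phase: Phase): string | null {
  const index = PHASES.findIndex((entry) => entry.phase === phase);
  const previous = index > 0 ? PHASES[index - 1] : undefined;
  return previous ? previous.output : null;
}

// Returns the missing prerequisite file, or null if the phase can run.
function missingPrerequisite(phase: Phase, existingFiles: ReadonlySet<string>): string | null {
  const input = requiredInput(phase);
  return input !== null && !existingFiles.has(input) ? input : null;
}

console.log(missingPrerequisite("generate", new Set<string>()));
console.log(missingPrerequisite("generate", new Set(["articles.jsonl"])));
```

A check like this lets a single-phase run fail fast with a clear message instead of partway through a model call.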
Options
| Flag | Default | Description |
|---|---|---|
| --provider P | newsapi | News provider: newsapi, gnews, thenewsapi |
| --categories C,C | per-provider | Comma-separated provider categories |
| --limit N | 5 | Articles per category |
| --model M | gemini-3-flash-preview | Model for fact extraction |
| --signoff-model M | gemini-3-flash-preview | Model for quality review |
| --concurrency N | 8 | Max parallel API calls |
| --validation-concurrency N | 3 | Max parallel validation calls (lower to avoid contamination) |
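The flags and defaults in the table can be read as a single options object. The following is a hand-rolled sketch of how such parsing might look, not the script's actual parser: `NewsTestOptions`, `parseOptions`, and the validation rules (positive integers, known providers) are assumptions made for illustration.

```typescript
type Provider = "newsapi" | "gnews" | "thenewsapi";

interface NewsTestOptions {
  provider: Provider;
  categories: string[] | null; // null means "use the provider's own defaults"
  limit: number;
  model: string;
  signoffModel: string;
  concurrency: number;
  validationConcurrency: number;
}

const PROVIDERS: readonly string[] = ["newsapi", "gnews", "thenewsapi"];
const VALUE_FLAGS: readonly string[] = [
  "--provider", "--categories", "--limit", "--model",
  "--signoff-model", "--concurrency", "--validation-concurrency",
];

// Defaults copied from the options table.
const DEFAULT_OPTIONS: NewsTestOptions = {
  provider: "newsapi",
  categories: null,
  limit: 5,
  model: "gemini-3-flash-preview",
  signoffModel: "gemini-3-flash-preview",
  concurrency: 8,
  validationConcurrency: 3,
};

// ASSUMPTION: numeric flags must be positive integers.
function parsePositiveInt(flag: string, value: string): number {
  const parsed = Number(value);
  if (!Number.isInteger(parsed) || parsed <= 0) {
    throw new Error(`${flag} expects a positive integer, got "${value}"`);
  }
  return parsed;
}

function parseOptions(argv: readonly string[]): NewsTestOptions {
  const options: NewsTestOptions = { ...DEFAULT_OPTIONS };
  for (let i = 0; i < argv.length; i++) {
    const flag = argv[i];
    // Phase switches such as --all or --fetch take no value and are ignored here.
    if (flag === undefined || !VALUE_FLAGS.includes(flag)) continue;
    const value = argv[i + 1];
    if (value === undefined) throw new Error(`${flag} requires a value`);
    i++; // consume the value
    switch (flag) {
      case "--provider":
        if (!PROVIDERS.includes(value)) throw new Error(`unknown provider "${value}"`);
        options.provider = value as Provider;
        break;
      case "--categories":
        options.categories = value.split(",").map((c) => c.trim()).filter((c) => c !== "");
        break;
      case "--limit": options.limit = parsePositiveInt(flag, value); break;
      case "--model": options.model = value; break;
      case "--signoff-model": options.signoffModel = value; break;
      case "--concurrency": options.concurrency = parsePositiveInt(flag, value); break;
      case "--validation-concurrency":
        options.validationConcurrency = parsePositiveInt(flag, value);
        break;
    }
  }
  return options;
}

// Mirrors the "specific provider and categories" usage example.
const opts = parseOptions(["--all", "--provider", "gnews", "--categories", "sports,science", "--limit", "10"]);
console.log(opts);
```

Flags that are not passed keep their table defaults, so the example above still runs validation with a concurrency of 3.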
Cost Estimate
Per run of 25 articles across 5 categories (~$0.89):
- Fetch: $0 (free API tiers)
- Generate: ~$0.075 (extraction)
- Validate: ~$0.050 (cross-model + evidence)
- Challenge: ~$0.75 (all styles x 5 difficulties)
- Signoff: ~$0.015 (quality review)
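The arithmetic behind the total can be checked with a few lines. This sketch uses only the per-phase figures listed above, stored as integer thousandths of a dollar to avoid floating-point drift; the identifier names are illustrative.

```typescript
// Per-phase estimates from the cost list, in thousandths of a dollar.
const PHASE_COST_MILLS: readonly { phase: string; mills: number }[] = [
  { phase: "fetch", mills: 0 },
  { phase: "generate", mills: 75 },
  { phase: "validate", mills: 50 },
  { phase: "challenge", mills: 750 },
  { phase: "signoff", mills: 15 },
];
const ARTICLES_PER_RUN = 25; // 25 articles across 5 categories

const totalMills = PHASE_COST_MILLS.reduce((sum, entry) => sum + entry.mills, 0);
const perArticleMills = totalMills / ARTICLES_PER_RUN;

function formatDollars(mills: number, digits: number): string {
  return `$${(mills / 1000).toFixed(digits)}`;
}

console.log(`total per run: ${formatDollars(totalMills, 2)}`);
console.log(`per article:   ${formatDollars(perArticleMills, 4)}`);
for (const { phase, mills } of PHASE_COST_MILLS) {
  const sharePercent = ((mills / totalMills) * 100).toFixed(1);
  console.log(`${phase.padEnd(10)} ${sharePercent}%`);
}
```

The phases sum to $0.89 per run, or about $0.0356 per article, and challenge generation accounts for roughly 84% of that, so it is the phase to trim first when iterating on cost.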
Files
| File | Purpose |
|---|---|
| scripts/news-testing/news-ingestion-test.ts | Main test pipeline |
| scripts/news-testing/lib/news-fetcher.ts | DB-free article fetcher |
| scripts/news-testing/lib/news-test-config.ts | Category mapping + defaults |
| docs/projects/news-testing/TODO.md | Progress tracker |
| docs/projects/news-testing/logs/ | Test run logs |
Design Decisions
- No clustering -- Each article is treated as a standalone story to isolate extraction quality
- Direct API calls -- Not processIngestNews; the test script only needs raw articles in memory
- Same scoring rubric -- No lowered bar for news; if quality falls short, that signals that adapter tuning is needed
- Configurable model -- Defaults to gemini-3-flash-preview; any model in the test harness registry can be used via --model
- Contamination detection -- Cross-model and evidence validation phases detect when the Gemini API returns reasoning about the wrong entity under high concurrency, and retry once automatically
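The contamination handling described above combines two ideas: a cap on parallel validation calls and a single automatic retry when a response is about the wrong entity. The sketch below illustrates both under stated assumptions. The response shape, the entity-name comparison used as the contamination check, and the behaviour after a failed retry (flagging the result rather than throwing) are all hypothetical; the README does not describe how the real detection works.

```typescript
// Hypothetical response shape: the entity the model reasoned about, plus a verdict.
interface ValidationResponse {
  entity: string;
  verdict: "pass" | "fail";
}

interface ValidationOutcome {
  response: ValidationResponse;
  attempts: number;
  contaminated: boolean; // true if the last attempt still named the wrong entity
}

// ASSUMPTION: contamination shows up as a response about a different entity than requested.
function isContaminated(expectedEntity: string, response: ValidationResponse): boolean {
  const normalize = (s: string): string => s.trim().toLowerCase();
  return normalize(response.entity) !== normalize(expectedEntity);
}

// Calls the validator, retrying once (by default) when the response looks contaminated.
async function validateWithRetry(
  expectedEntity: string,
  callValidator: () => Promise<ValidationResponse>,
  maxRetries = 1,
): Promise<ValidationOutcome> {
  let attempts = 0;
  let response: ValidationResponse;
  do {
    response = await callValidator();
    attempts++;
  } while (isContaminated(expectedEntity, response) && attempts <= maxRetries);
  return { response, attempts, contaminated: isContaminated(expectedEntity, response) };
}

// Runs `worker` over `items` with at most `limit` calls in flight, preserving input order.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  if (limit < 1) throw new Error("limit must be at least 1");
  const results = new Array<R>(items.length);
  let nextIndex = 0;
  const runner = async (): Promise<void> => {
    while (nextIndex < items.length) {
      const index = nextIndex++;
      results[index] = await worker(items[index] as T);
    }
  };
  await Promise.all(Array.from({ length: Math.min(limit, items.length) }, runner));
  return results;
}

// Demo with a fake validator whose first reply is about the wrong entity.
function makeFlakyValidator(entity: string): () => Promise<ValidationResponse> {
  let calls = 0;
  return async () => ({ entity: calls++ === 0 ? "some other story" : entity, verdict: "pass" });
}

const demo = mapWithConcurrency(["Story A", "Story B"], 3, (entity) =>
  validateWithRetry(entity, makeFlakyValidator(entity)),
);
demo.then((outcomes) => {
  for (const o of outcomes) console.log(`${o.response.entity}: attempts=${o.attempts}`);
});
```

The limit of 3 in the demo matches the --validation-concurrency default; lowering it trades throughput for fewer contaminated responses to retry.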