News Ingestion Testing Project

Test infrastructure for measuring and iterating AI quality on news article fact extraction. Targets the same 97+/100 quality bar used for seed/evergreen content, applied to the noisier domain of live news articles.

Why

News articles are fundamentally noisier than curated entities: clickbait titles, incomplete text, breaking stories, and inconsistent formatting. This project creates the testing infrastructure to measure and iterate toward production quality across any model in the test harness registry.

Architecture

NewsAPI / GNews / TheNewsAPI
        |
    [news-fetcher.ts]         -> articles.jsonl
        |
    [extractFactsFromStory()] -> facts.jsonl     <- modelOverride (default: gemini-3-flash-preview)
        |
    [validate: structural -> consistency -> cross-model -> evidence]  -> validations.jsonl
        |                       (with contamination detection + retry)
    [generateChallengeContent()]  -> challenges.jsonl  <- includes context field
        |
    [signoff: AI quality review]  -> signoffs.jsonl    <- configurable via --signoff-model
        |
    [generateReport()]  -> report.md

Key difference from seed pipeline: Phase 1 uses extractFactsFromStory() (article text -> facts) instead of explodeCategoryEntry() (entity name -> facts). Phases 2-5 reuse the same validation, challenge, signoff, and report infrastructure.
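To make the contrast concrete, here is a toy sketch of the extraction entry point. The trivial sentence-splitting body and the Fact shape are assumptions for illustration only; the real extractFactsFromStory() calls the configured extraction model.

```typescript
// Toy stand-in for the news pipeline's extraction step: raw article text in,
// structured facts out. The seed pipeline's explodeCategoryEntry() instead
// starts from just an entity name. One "fact" per sentence is a placeholder
// for a real model call.
interface Fact {
  claim: string;
  source: "article" | "entity";
}

function extractFactsFromStory(articleText: string): Fact[] {
  return articleText
    .split(/(?<=[.!?])\s+/)
    .filter((s) => s.length > 0)
    .map((claim) => ({ claim, source: "article" as const }));
}

const facts = extractFactsFromStory("Bun 1.0 shipped. It bundles TypeScript.");
console.log(facts.length); // 2
```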

Usage

# Full pipeline
bun run test:news -- --all

# With specific provider and categories
bun run test:news -- --all --provider gnews --categories sports,science --limit 10

# Individual phases
bun run test:news -- --fetch                    # Phase 0: fetch articles
bun run test:news -- --generate                 # Phase 1: extract facts (requires articles.jsonl)
bun run test:news -- --validate                 # Phase 2: validate facts
bun run test:news -- --challenge                # Phase 3: generate challenges
bun run test:news -- --signoff                  # Phase 4: AI quality review
bun run test:news -- --report                   # Phase 5: generate report

Options

Flag                        Default                 Description
--provider P                newsapi                 News provider: newsapi, gnews, thenewsapi
--categories C,C            per-provider            Comma-separated provider categories
--limit N                   5                       Articles per category
--model M                   gemini-3-flash-preview  Model for fact extraction
--signoff-model M           gemini-3-flash-preview  Model for quality review
--concurrency N             8                       Max parallel API calls
--validation-concurrency N  3                       Max parallel validation calls (lower to avoid contamination)
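As a rough illustration, the flags above could be folded into a config object along these lines. The parser, config shape, and field names are assumptions for this sketch, not the actual code in news-ingestion-test.ts; only the flag names and defaults come from the table.

```typescript
// Minimal sketch: parse the pipeline flags into a typed config with the
// documented defaults. categories: null means "use the provider's defaults".
interface NewsTestConfig {
  provider: string;
  categories: string[] | null;
  limit: number;
  model: string;
  signoffModel: string;
  concurrency: number;
  validationConcurrency: number;
}

function parseNewsTestArgs(argv: string[]): NewsTestConfig {
  const cfg: NewsTestConfig = {
    provider: "newsapi",
    categories: null,
    limit: 5,
    model: "gemini-3-flash-preview",
    signoffModel: "gemini-3-flash-preview",
    concurrency: 8,
    validationConcurrency: 3, // kept low to avoid cross-request contamination
  };
  for (let i = 0; i < argv.length; i++) {
    const next = () => argv[++i];
    switch (argv[i]) {
      case "--provider": cfg.provider = next(); break;
      case "--categories": cfg.categories = next().split(","); break;
      case "--limit": cfg.limit = Number(next()); break;
      case "--model": cfg.model = next(); break;
      case "--signoff-model": cfg.signoffModel = next(); break;
      case "--concurrency": cfg.concurrency = Number(next()); break;
      case "--validation-concurrency": cfg.validationConcurrency = Number(next()); break;
    }
  }
  return cfg;
}

const cfg = parseNewsTestArgs(["--provider", "gnews", "--categories", "sports,science", "--limit", "10"]);
console.log(cfg.provider, cfg.categories, cfg.limit);
```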

Cost Estimate

Per run of 25 articles across 5 categories (~$0.89):

  • Fetch: $0 (free API tiers)
  • Generate: ~$0.075 (extraction)
  • Validate: ~$0.050 (cross-model + evidence)
  • Challenge: ~$0.75 (all styles x 5 difficulties)
  • Signoff: ~$0.015 (quality review)

Files

File                                          Purpose
scripts/news-testing/news-ingestion-test.ts   Main test pipeline
scripts/news-testing/lib/news-fetcher.ts      DB-free article fetcher
scripts/news-testing/lib/news-test-config.ts  Category mapping + defaults
docs/projects/news-testing/TODO.md            Progress tracker
docs/projects/news-testing/logs/              Test run logs

Design Decisions

  1. No clustering -- Each article is treated as a standalone story, isolating extraction quality
  2. Direct API calls -- Bypasses processIngestNews; the test script only needs raw articles in memory
  3. Same scoring rubric -- No lowered bar for news; if quality falls short, that signals adapter tuning is needed
  4. Configurable model -- Defaults to gemini-3-flash-preview; any model in the test harness registry can be used via --model
  5. Contamination detection -- Cross-model and evidence validation phases detect when the Gemini API returns reasoning about the wrong entity under high concurrency, and retry once automatically
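The retry-once behavior in decision 5 can be sketched as a thin wrapper around a validation call. The wrapper, result shape, and contamination check below are illustrative assumptions; the real pipeline's detection logic lives in its validation phases.

```typescript
// Sketch: detect a contaminated response (the model reasoned about a
// different entity than the one requested) and retry exactly once.
interface ValidationResult {
  entityId: string; // which entity the model's reasoning was actually about
  verdict: string;
}

async function validateWithContaminationRetry(
  entityId: string,
  validate: (id: string) => Promise<ValidationResult>,
): Promise<ValidationResult> {
  const first = await validate(entityId);
  if (first.entityId !== entityId) {
    // Contaminated: answer is about the wrong entity -> retry once.
    return validate(entityId);
  }
  return first;
}

// Demo with a fake validator that is contaminated on its first call.
let calls = 0;
const fake = async (id: string): Promise<ValidationResult> =>
  ++calls === 1 ? { entityId: "wrong-entity", verdict: "n/a" } : { entityId: id, verdict: "pass" };

const result = await validateWithContaminationRetry("article-42", fake);
console.log(result.entityId, result.verdict, calls);
```

Lowering --validation-concurrency (default 3) reduces how often this path triggers in the first place.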