News Ingestion Testing Project

Test infrastructure for measuring and iterating AI quality on news article fact extraction. Targets the same 97+/100 quality bar used for seed/evergreen content, applied to the noisier domain of live news articles.

Why

News articles are fundamentally noisier than curated entities: clickbait titles, incomplete text, breaking stories, and inconsistent formatting. This project creates the testing infrastructure to measure and iterate toward production quality across any model in the test harness registry.

Architecture

NewsAPI / GNews / TheNewsAPI
        |
    [news-fetcher.ts]         -> articles.jsonl
        |
    [extractFactsFromStory()] -> facts.jsonl     <- modelOverride (default: gemini-3-flash-preview)
        |
    [validate: structural -> consistency -> cross-model -> evidence]  -> validations.jsonl
        |                       (with contamination detection + retry)
    [generateChallengeContent()]  -> challenges.jsonl  <- includes context field
        |
    [signoff: AI quality review]  -> signoffs.jsonl    <- configurable via --signoff-model
        |
    [generateReport()]  -> report.md

Key difference from seed pipeline: Phase 1 uses extractFactsFromStory() (article text -> facts) instead of explodeCategoryEntry() (entity name -> facts). Phases 2-5 reuse the same validation, challenge, signoff, and report infrastructure.
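To make the contrast concrete, here is a toy sketch of the extraction entry point. The trivial sentence-splitting body and the Fact shape are assumptions for illustration only; the real extractFactsFromStory() calls the configured extraction model.

```typescript
// Toy stand-in for the news pipeline's extraction step: raw article text in,
// structured facts out. The seed pipeline's explodeCategoryEntry() instead
// starts from just an entity name. One "fact" per sentence is a placeholder
// for a real model call.
interface Fact {
  claim: string;
  source: "article" | "entity";
}

function extractFactsFromStory(articleText: string): Fact[] {
  return articleText
    .split(/(?<=[.!?])\s+/)
    .filter((s) => s.length > 0)
    .map((claim) => ({ claim, source: "article" as const }));
}

const facts = extractFactsFromStory("Bun 1.0 shipped. It bundles TypeScript.");
console.log(facts.length); // 2
```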

Usage

# Full pipeline
bun run test:news -- --all

# With specific provider and categories
bun run test:news -- --all --provider gnews --categories sports,science --limit 10

# Individual phases
bun run test:news -- --fetch                    # Phase 0: fetch articles
bun run test:news -- --generate                 # Phase 1: extract facts (requires articles.jsonl)
bun run test:news -- --validate                 # Phase 2: validate facts
bun run test:news -- --challenge                # Phase 3: generate challenges
bun run test:news -- --signoff                  # Phase 4: AI quality review
bun run test:news -- --report                   # Phase 5: generate report

Options

Flag                        Default                 Description
--provider P                newsapi                 News provider: newsapi, gnews, thenewsapi
--categories C,C            per-provider            Comma-separated provider categories
--limit N                   5                       Articles per category
--model M                   gemini-3-flash-preview  Model for fact extraction
--signoff-model M           gemini-3-flash-preview  Model for quality review
--concurrency N             8                       Max parallel API calls
--validation-concurrency N  3                       Max parallel validation calls (lower to avoid contamination)
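As a rough illustration, the flags above could be folded into a config object along these lines. The parser, config shape, and field names are assumptions for this sketch, not the actual code in news-ingestion-test.ts; only the flag names and defaults come from the table.

```typescript
// Minimal sketch: parse the pipeline flags into a typed config with the
// documented defaults. categories: null means "use the provider's defaults".
interface NewsTestConfig {
  provider: string;
  categories: string[] | null;
  limit: number;
  model: string;
  signoffModel: string;
  concurrency: number;
  validationConcurrency: number;
}

function parseNewsTestArgs(argv: string[]): NewsTestConfig {
  const cfg: NewsTestConfig = {
    provider: "newsapi",
    categories: null,
    limit: 5,
    model: "gemini-3-flash-preview",
    signoffModel: "gemini-3-flash-preview",
    concurrency: 8,
    validationConcurrency: 3, // kept low to avoid cross-request contamination
  };
  for (let i = 0; i < argv.length; i++) {
    const next = () => argv[++i];
    switch (argv[i]) {
      case "--provider": cfg.provider = next(); break;
      case "--categories": cfg.categories = next().split(","); break;
      case "--limit": cfg.limit = Number(next()); break;
      case "--model": cfg.model = next(); break;
      case "--signoff-model": cfg.signoffModel = next(); break;
      case "--concurrency": cfg.concurrency = Number(next()); break;
      case "--validation-concurrency": cfg.validationConcurrency = Number(next()); break;
    }
  }
  return cfg;
}

const cfg = parseNewsTestArgs(["--provider", "gnews", "--categories", "sports,science", "--limit", "10"]);
console.log(cfg.provider, cfg.categories, cfg.limit);
```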

Cost Estimate

Per run of 25 articles across 5 categories (~$0.89):

  • Fetch: $0 (free API tiers)
  • Generate: ~$0.075 (extraction)
  • Validate: ~$0.050 (cross-model + evidence)
  • Challenge: ~$0.75 (all styles x 5 difficulties)
  • Signoff: ~$0.015 (quality review)

Files

File                                          Purpose
scripts/news-testing/news-ingestion-test.ts   Main test pipeline
scripts/news-testing/lib/news-fetcher.ts      DB-free article fetcher
scripts/news-testing/lib/news-test-config.ts  Category mapping + defaults
docs/projects/news-testing/TODO.md            Progress tracker
docs/projects/news-testing/logs/              Test run logs

Design Decisions

  1. No clustering -- Each article is treated as a standalone story, isolating extraction quality
  2. Direct API calls -- Bypasses processIngestNews; the test script only needs raw articles in memory
  3. Same scoring rubric -- No lowered bar for news; if quality falls short, that signals adapter tuning is needed
  4. Configurable model -- Defaults to gemini-3-flash-preview; any model in the test harness registry can be used via --model
  5. Contamination detection -- Cross-model and evidence validation phases detect when the Gemini API returns reasoning about the wrong entity under high concurrency, and retry once automatically
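The retry-once behavior in decision 5 can be sketched as a thin wrapper around a validation call. The wrapper, result shape, and contamination check below are illustrative assumptions; the real pipeline's detection logic lives in its validation phases.

```typescript
// Sketch: detect a contaminated response (the model reasoned about a
// different entity than the one requested) and retry exactly once.
interface ValidationResult {
  entityId: string; // which entity the model's reasoning was actually about
  verdict: string;
}

async function validateWithContaminationRetry(
  entityId: string,
  validate: (id: string) => Promise<ValidationResult>,
): Promise<ValidationResult> {
  const first = await validate(entityId);
  if (first.entityId !== entityId) {
    // Contaminated: answer is about the wrong entity -> retry once.
    return validate(entityId);
  }
  return first;
}

// Demo with a fake validator that is contaminated on its first call.
let calls = 0;
const fake = async (id: string): Promise<ValidationResult> =>
  ++calls === 1 ? { entityId: "wrong-entity", verdict: "n/a" } : { entityId: id, verdict: "pass" };

const result = await validateWithContaminationRetry("article-42", fake);
console.log(result.entityId, result.verdict, calls);
```

Lowering --validation-concurrency (default 3) reduces how often this path triggers in the first place.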