2. Audit Fact Quality
Purpose: Run a read-only audit of fact corpus health — null rates, validation pass rates, topic distribution, and duplicate detection.
Prerequisites:
- Supabase credentials (read-only access sufficient)
Cost / Duration: $0 (read-only queries) | 1-2 minutes
Prompt
Run a fact quality audit to assess corpus health.
Step 1 — Run the backfill audit (read-only, no modifications):
```bash
bun scripts/seed/backfill-fact-nulls.ts --audit
```
Step 2 — Run these SQL queries against the database to get a full health picture:
```sql
-- Null counts by field (divide each count by total_facts to get the null rate)
SELECT
  COUNT(*) FILTER (WHERE notability_score IS NULL) AS null_notability,
  COUNT(*) FILTER (WHERE image_url IS NULL) AS null_images,
  COUNT(*) AS total_facts
FROM fact_records
WHERE status = 'validated';

-- Validation pass rates (share of all facts in each status)
SELECT
  status,
  COUNT(*) AS count,
  ROUND(COUNT(*)::numeric / SUM(COUNT(*)) OVER () * 100, 1) AS pct
FROM fact_records
GROUP BY status
ORDER BY count DESC;

-- Topic distribution (validated facts only)
SELECT
  tc.name AS topic,
  COUNT(f.id) AS fact_count
FROM fact_records f
JOIN topic_categories tc ON f.topic_category_id = tc.id
WHERE f.status = 'validated'
GROUP BY tc.name
ORDER BY fact_count DESC;

-- Challenge content coverage (validated facts with at least one challenge row)
SELECT
  COUNT(DISTINCT f.id) FILTER (WHERE fcc.id IS NOT NULL) AS facts_with_challenges,
  COUNT(DISTINCT f.id) AS total_validated
FROM fact_records f
LEFT JOIN fact_challenge_content fcc ON fcc.fact_record_id = f.id
WHERE f.status = 'validated';
```
Report the results as a health scorecard with recommendations for any
areas that miss their thresholds (e.g., null rates > 5%, topic imbalance > 3:1).
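As a rough illustration of how the Step 2 results could be turned into a scorecard, the TypeScript sketch below applies the two example thresholds (5% null rate, 3:1 topic imbalance). The type names, function names, and the largest-to-smallest definition of "imbalance" are assumptions made for this example. They are not part of `backfill-fact-nulls.ts` or any existing project code.

```typescript
// Hypothetical input shapes; field names mirror the column aliases in the SQL above.
interface NullCounts {
  nullNotability: number;
  nullImages: number;
  totalFacts: number;
}

interface TopicCount {
  topic: string;
  factCount: number;
}

interface Finding {
  area: string;
  value: number;
  threshold: number;
  ok: boolean;
}

// Example thresholds taken from the prompt text.
const MAX_NULL_RATE_PCT = 5;
const MAX_TOPIC_RATIO = 3;

// Percentage rounded to one decimal place; an empty corpus yields 0 instead of NaN.
function pct(part: number, total: number): number {
  return total === 0 ? 0 : Math.round((part / total) * 1000) / 10;
}

// Assumption: imbalance is the largest topic count divided by the smallest non-empty one.
function topicImbalance(topics: TopicCount[]): number {
  const counts = topics.map((t) => t.factCount).filter((c) => c > 0);
  if (counts.length === 0) return 0;
  return Math.max(...counts) / Math.min(...counts);
}

// Build one finding per audited area and flag anything above its threshold.
function buildScorecard(nulls: NullCounts, topics: TopicCount[]): Finding[] {
  const rows: Array<[string, number, number]> = [
    ["notability null rate (%)", pct(nulls.nullNotability, nulls.totalFacts), MAX_NULL_RATE_PCT],
    ["image null rate (%)", pct(nulls.nullImages, nulls.totalFacts), MAX_NULL_RATE_PCT],
    ["topic imbalance (largest:smallest)", topicImbalance(topics), MAX_TOPIC_RATIO],
  ];
  return rows.map(([area, value, threshold]) => ({
    area,
    value,
    threshold,
    ok: value <= threshold,
  }));
}
```

Each finding with `ok: false` is a candidate for a recommendation, such as pointing to the Backfill Null Metadata prompt when a null rate exceeds its threshold.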
Verification
- Backfill audit completes without errors
- Null rates reported for notability and images, plus challenge content coverage
- Validation pass rate reported (target: >95% active)
- Topic distribution reported with balance ratio
- Duplicate detection stats reported
- Recommendations provided for any below-threshold areas
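The Step 2 queries do not cover the duplicate detection item, so those stats presumably come from the backfill audit output. If they need to be computed separately, a minimal TypeScript sketch of exact-duplicate detection over already-fetched rows could look like the following. The `FactRow` shape and its `text` field are hypothetical stand-ins for whichever column holds the fact body, and grouping by normalized text is only one possible definition of a duplicate.

```typescript
// Hypothetical row shape; `text` is a placeholder for the real fact-body column.
interface FactRow {
  id: string;
  text: string;
}

// Lowercase, drop common punctuation, and collapse whitespace so trivial variants match.
function normalize(text: string): string {
  return text
    .toLowerCase()
    .replace(/[.,;:!?'"()\-]/g, "")
    .replace(/\s+/g, " ")
    .trim();
}

// Return groups of fact ids whose normalized text is identical (groups of size > 1 only).
function findExactDuplicates(facts: FactRow[]): string[][] {
  const groups = new Map<string, string[]>();
  for (const fact of facts) {
    const key = normalize(fact.text);
    const ids = groups.get(key) ?? [];
    ids.push(fact.id);
    groups.set(key, ids);
  }
  return Array.from(groups.values()).filter((ids) => ids.length > 1);
}
```

This only catches near-verbatim repeats. Paraphrased duplicates would need a fuzzier comparison, which is outside the scope of this sketch.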
Related Prompts
- Backfill Null Metadata — Fix nulls found during audit
- Content Cleanup Pass — Fix voice inconsistencies
- Seed the Database — Add more content if coverage is low