3. Audit Taxonomy Schema Coverage

Purpose: Compare a taxonomy's fact schema against external structured data (Google Knowledge Graph (KG), Wikidata, Wikipedia, domain APIs) to surface missing fields, type mismatches, vocabulary gaps, and ownership gaps (subcategories that lack their own vocabulary/voice entries). Supports single-node audits and recursive tree walks; recursive runs also produce a structured action plan.

Prerequisites:

- DATABASE_URL configured (the script samples entity titles from fact_records)
- GOOGLE_KG_API_KEY optional (KG checks are skipped gracefully without it)
- The target taxonomy must have fact records in the DB: the script samples entity titles (e.g., "LeBron James", "Manchester United") to query external APIs
- Any active topic_categories slug is valid; subcategories that inherit rules/voice from parents are fully supported
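
The environment prerequisites above can be checked before a run with a small preflight step. The sketch below is illustrative only: `preflight`, `Env`, and `Preflight` are hypothetical names, not part of `audit-schema-coverage.ts`, and the real script may handle missing variables differently. It only encodes the two rules stated above: DATABASE_URL is required, and GOOGLE_KG_API_KEY is optional.

```typescript
// Hypothetical preflight helper; names and shape are assumptions for illustration.
type Env = Record<string, string | undefined>;

interface Preflight {
  canRun: boolean;    // false when a required variable is missing
  kgEnabled: boolean; // false means KG checks would be skipped
  problems: string[]; // human-readable reasons the run cannot start
}

function preflight(env: Env): Preflight {
  const problems: string[] = [];
  // DATABASE_URL is required: entity titles are sampled from fact_records.
  if (!env.DATABASE_URL) {
    problems.push("DATABASE_URL is not set");
  }
  // GOOGLE_KG_API_KEY is optional: its absence only disables KG checks.
  const kgEnabled = Boolean(env.GOOGLE_KG_API_KEY);
  return { canRun: problems.length === 0, kgEnabled, problems };
}

// Example with a placeholder connection string and no KG key.
const preflightExample = preflight({ DATABASE_URL: "postgres://localhost/example" });
console.log(preflightExample);
```

In a Bun or Node process the same function could be called as `preflight(process.env)`.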

Cost / Duration: $0 (free APIs only) | 1-3 minutes per node (depends on --sample size and --depth)

Prompt

Run a schema coverage audit for the [taxonomy] taxonomy to find
gaps in the fact schema.

```bash
# Single node (backward compatible)
bun scripts/taxonomy/audit-schema-coverage.ts [taxonomy-slug]

# Recursive tree walk
bun scripts/taxonomy/audit-schema-coverage.ts [taxonomy-slug] --depth=1

# Start at a subcategory
bun scripts/taxonomy/audit-schema-coverage.ts basketball --depth=1
```

Options:
- `--sample=N`    — Sample size per node (default 10, max 50)
- `--json`        — Output full action plan as JSON to stdout
- `--summary`     — Output summary counts only
- `--depth=N`     — Recursion depth 0-3 (default: 0 = single-node)
- `--output=PATH` — Output directory (default: docs/reports/taxonomy/)
- `--no-write`    — Skip writing files, console only (default when depth=0)

The script will:
1. Resolve the start node from DB (any active slug, not just root categories)
2. At each node: resolve inherited rules/voice, fetch schema, sample entities
3. Query KG, Wikidata, Wikipedia, and domain-specific APIs per entity
4. Run 6 analysis checks comparing external data against the schema
5. Flag ownership gaps (subcategories missing own vocabulary/voice entries)
6. If depth>0: recurse into children, generate structured action plan
7. Batch-resolve Wikidata property labels after the full tree walk
8. Output report to console and optionally write JSON + Markdown files

Review the suggestions and decide which to act on:
- **Schema field additions:** Add new fact_keys via migration
- **Type fixes:** Update fact_key types via migration
- **Vocabulary additions:** Edit taxonomy-rules-data.ts directly
- **Voice additions:** Edit taxonomy-voices-data.ts directly
- **Ownership gaps:** Create dedicated entries for subcategories

Available root slugs (35 active roots):
animals, architecture, art, auto, business, cooking, culture,
current-events, design, entertainment, fashion, food-beverage,
games, geography, geology, governments, health-medicine, history,
home-living, how-things-work, language-linguistics, math, movies,
music, people, places, publishing, records, science, space-astronomy,
sports, technology, travel, tv, weather-climate

Any active topic_categories slug is valid at any depth (1,104 total
categories). Use `--depth=1` or `--depth=2` to walk children from
any starting node.
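
The option rules listed in the prompt (defaults, ranges, and the write behavior at depth 0) can be summarized as a small argument parser. This is a minimal sketch, not the script's actual parser: `parseAuditArgs` and `AuditOptions` are hypothetical names, rejecting out-of-range values with an error (rather than clamping) is an assumption, and so is the lower bound of 1 for `--sample`.

```typescript
// Hypothetical parser mirroring the documented options; not the real implementation.
interface AuditOptions {
  slug: string;
  sample: number;   // default 10, max 50 (minimum of 1 is an assumption)
  depth: number;    // 0-3, default 0 = single node
  json: boolean;
  summary: boolean;
  output: string;   // default docs/reports/taxonomy/
  write: boolean;   // false with --no-write, and by default when depth is 0
}

function parseAuditArgs(argv: string[]): AuditOptions {
  const positional = argv.filter((a) => !a.startsWith("--"));
  const [slug] = positional;
  if (slug === undefined || positional.length !== 1) {
    throw new Error("expected exactly one taxonomy slug");
  }

  const flag = (name: string): boolean => argv.includes(`--${name}`);
  const value = (name: string): string | undefined => {
    const prefix = `--${name}=`;
    const hit = argv.find((a) => a.startsWith(prefix));
    return hit === undefined ? undefined : hit.slice(prefix.length);
  };
  const intInRange = (name: string, fallback: number, min: number, max: number): number => {
    const raw = value(name);
    if (raw === undefined) return fallback;
    const n = Number(raw);
    if (raw.trim() === "" || !Number.isInteger(n) || n < min || n > max) {
      throw new Error(`--${name} must be an integer from ${min} to ${max}`);
    }
    return n;
  };

  const sample = intInRange("sample", 10, 1, 50);
  const depth = intInRange("depth", 0, 0, 3);
  return {
    slug,
    sample,
    depth,
    json: flag("json"),
    summary: flag("summary"),
    output: value("output") ?? "docs/reports/taxonomy/",
    // Files are written only for recursive runs that did not pass --no-write.
    write: depth > 0 && !flag("no-write"),
  };
}

console.log(parseAuditArgs(["basketball", "--depth=1", "--sample=20"]));
```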

Analysis Checks

| Check | What It Surfaces | Threshold |
| --- | --- | --- |
| Schema field coverage | Wikidata properties missing from factKeys | ≥70% of entities |
| Type alignment | factKey type vs Wikidata value type mismatches | Any mismatch |
| Vocabulary gaps | Wikipedia category terms missing from domain_terms | ≥30% of entities |
| Entity type distribution | KG type breakdown (Person, Org, Place, etc.) | Informational |
| Domain-specific coverage | TheSportsDB/MusicBrainz fields (sports/music only) | ≥50% of entities |
| Sitelink notability | Entities with <20 Wikipedia language links | <20 sitelinks |
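
To make the "≥70% of entities" threshold concrete, here is a minimal sketch of how the schema field coverage check could be computed. It is an illustration under assumptions, not the script's code: `suggestMissingFields` and `FieldSuggestion` are hypothetical names, and the schema is assumed to be representable as a set of Wikidata property IDs that existing factKeys already cover (the real mapping between factKeys and properties is not described here).

```typescript
// Hypothetical coverage check; the factKey-to-property mapping is an assumption.
interface FieldSuggestion {
  property: string; // Wikidata property ID, e.g. "P54"
  share: number;    // fraction of sampled entities that carry the property
}

function suggestMissingFields(
  entityProps: ReadonlyArray<ReadonlySet<string>>, // one property set per sampled entity
  coveredProps: ReadonlySet<string>,               // properties the schema already covers
  threshold = 0.7,
): FieldSuggestion[] {
  if (entityProps.length === 0) return [];

  // Count how many sampled entities carry each property.
  const counts = new Map<string, number>();
  entityProps.forEach((props) => {
    props.forEach((p) => counts.set(p, (counts.get(p) ?? 0) + 1));
  });

  // Keep properties that are common enough and not yet in the schema.
  const suggestions: FieldSuggestion[] = [];
  counts.forEach((count, property) => {
    const share = count / entityProps.length;
    if (share >= threshold && !coveredProps.has(property)) {
      suggestions.push({ property, share });
    }
  });
  return suggestions.sort(
    (a, b) => b.share - a.share || a.property.localeCompare(b.property),
  );
}

// Four sampled entities: P569 (date of birth), P54 (member of sports team), P2048 (height).
const sampled = [
  new Set(["P569", "P54", "P2048"]),
  new Set(["P569", "P54"]),
  new Set(["P569", "P54", "P2048"]),
  new Set(["P569"]),
];
console.log(suggestMissingFields(sampled, new Set(["P569"])));
```

In this sample P54 appears on three of four entities (75%) and is suggested, P2048 appears on two of four (50%) and is not, and P569 is skipped because the schema already covers it.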

Action Item Priorities (depth>0)

| Source | Priority |
| --- | --- |
| Schema field suggestion (≥90% coverage) | `high` |
| Schema field suggestion (≥70% coverage) | `medium` |
| Type alignment mismatch | `high` |
| Missing vocabulary entry (subcategory) | `medium` |
| Missing voice entry (subcategory) | `medium` |
| Vocabulary gap (Wikipedia categories) | `low` |
| Low notability entity (<20 sitelinks) | `low` |
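
The priority table can be read as a single mapping function. The sketch below restates it in code for clarity; `Finding`, `Priority`, and `priorityFor` are hypothetical names, and the finding kinds are labels invented for this example rather than identifiers taken from the script's action plan format.

```typescript
// Hypothetical restatement of the priority table; kind names are illustrative.
type Priority = "high" | "medium" | "low";

type Finding =
  | { kind: "schema-field"; coverage: number } // coverage as a fraction, 0 to 1
  | { kind: "type-mismatch" }
  | { kind: "missing-vocabulary-entry" }
  | { kind: "missing-voice-entry" }
  | { kind: "vocabulary-gap" }
  | { kind: "low-notability" };

function priorityFor(finding: Finding): Priority | null {
  switch (finding.kind) {
    case "schema-field":
      if (finding.coverage >= 0.9) return "high";
      if (finding.coverage >= 0.7) return "medium";
      return null; // below the 70% threshold, so no action item
    case "type-mismatch":
      return "high";
    case "missing-vocabulary-entry":
    case "missing-voice-entry":
      return "medium";
    case "vocabulary-gap":
    case "low-notability":
      return "low";
  }
}

console.log(priorityFor({ kind: "schema-field", coverage: 0.95 }));
console.log(priorityFor({ kind: "vocabulary-gap" }));
```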

Verification

- Script runs without errors for the target taxonomy
- `--depth=0` behaves identically to the original single-node audit
- `--depth=1` walks children and generates action plan files
- Suggested field additions reviewed and either adopted (via migration) or dismissed with rationale
- Type mismatches reviewed and corrected if appropriate
- Vocabulary suggestions reviewed and added to taxonomy-rules-data.ts if appropriate
- Ownership gaps reviewed and subcategory entries created if appropriate
- If schema changes were made: `bun run typecheck` passes
- If migrations were applied: `bun run migrations:index` and `bun run migrations:check` pass

Related Prompts

- Add New Topic Category: create new taxonomies before auditing them
- Test Taxonomy Categories: verify taxonomy integrity after changes
