3. Audit Taxonomy Schema Coverage
Purpose: Analyze a taxonomy's fact schema against external structured data (KG, Wikidata, Wikipedia, domain APIs) to surface missing fields, type mismatches, vocabulary gaps, and ownership gaps (missing vocabulary/voice entries for subcategories). Supports single-node audits and recursive tree walks with structured action plan output.
Prerequisites:
- `DATABASE_URL` configured (samples entity titles from `fact_records`)
- `GOOGLE_KG_API_KEY` optional (KG checks gracefully skip without it)
- The target taxonomy must have fact records in the DB: the script samples entity titles (e.g., "LeBron James", "Manchester United") to query external APIs
- Any active `topic_categories` slug is valid; subcategories that inherit rules/voice from parents are fully supported
Cost / Duration: $0 (free APIs only) | 1-3 minutes per node (depends on --sample size and --depth)
Prompt
Run a schema coverage audit for the [taxonomy] taxonomy to find
gaps in the fact schema.
```bash
# Single node (backward compatible)
bun scripts/taxonomy/audit-schema-coverage.ts [taxonomy-slug]
# Recursive tree walk
bun scripts/taxonomy/audit-schema-coverage.ts [taxonomy-slug] --depth=1
# Start at a subcategory
bun scripts/taxonomy/audit-schema-coverage.ts basketball --depth=1
```
Options:
- `--sample=N` — Sample size per node (default 10, max 50)
- `--json` — Output full action plan as JSON to stdout
- `--summary` — Output summary counts only
- `--depth=N` — Recursion depth 0-3 (default: 0 = single-node)
- `--output=PATH` — Output directory (default: docs/reports/taxonomy/)
- `--no-write` — Skip writing files, console only (default when depth=0)
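To make the defaults and limits above concrete, here is a minimal sketch of how such flags could be parsed. `parseAuditArgs`, `AuditOptions`, and `clamp` are hypothetical names, not the script's actual code. Clamping out-of-range values (rather than rejecting them) and the minimum sample size of 1 are assumptions.

```typescript
// Hypothetical option parser; illustrative only, not taken from audit-schema-coverage.ts.
interface AuditOptions {
  slug: string | undefined;
  sample: number;
  depth: number;
  json: boolean;
  summary: boolean;
  output: string;
  write: boolean;
}

function clamp(value: number, min: number, max: number): number {
  return Math.min(max, Math.max(min, value));
}

function parseAuditArgs(argv: string[]): AuditOptions {
  const flags = new Map<string, string>();
  let slug: string | undefined;

  for (const arg of argv) {
    if (arg.startsWith("--")) {
      // Split "--key=value"; bare flags such as "--json" get the value "true".
      const eq = arg.indexOf("=");
      const key = eq === -1 ? arg.slice(2) : arg.slice(2, eq);
      const value = eq === -1 ? "true" : arg.slice(eq + 1);
      flags.set(key, value);
    } else if (slug === undefined) {
      slug = arg; // first positional argument is the taxonomy slug
    }
  }

  const toInt = (raw: string | undefined, fallback: number): number => {
    const n = Number.parseInt(raw ?? "", 10);
    return Number.isNaN(n) ? fallback : n;
  };

  // Documented ranges: depth 0-3 (default 0), sample default 10, max 50.
  // Assumption: out-of-range values are clamped, and the minimum sample is 1.
  const depth = clamp(toInt(flags.get("depth"), 0), 0, 3);
  const sample = clamp(toInt(flags.get("sample"), 10), 1, 50);

  return {
    slug,
    sample,
    depth,
    json: flags.has("json"),
    summary: flags.has("summary"),
    output: flags.get("output") ?? "docs/reports/taxonomy/",
    // Files are skipped with --no-write, and by default when depth is 0.
    write: !flags.has("no-write") && depth > 0,
  };
}
```

Under these assumptions, `parseAuditArgs(["basketball", "--depth=1", "--sample=200"])` yields a depth of 1, a sample size capped at 50, and file output enabled.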
The script will:
1. Resolve the start node from DB (any active slug, not just root categories)
2. At each node: resolve inherited rules/voice, fetch schema, sample entities
3. Query KG, Wikidata, Wikipedia, and domain-specific APIs per entity
4. Run 6 analysis checks comparing external data against the schema
5. Flag ownership gaps (subcategories missing own vocabulary/voice entries)
6. If depth>0: recurse into children, generate structured action plan
7. Batch-resolve Wikidata property labels after the full tree walk
8. Output report to console and optionally write JSON + Markdown files
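The depth-limited recursion in steps 2 and 6 can be pictured with a small sketch. It assumes an in-memory tree rather than database lookups. `CategoryNode`, `NodeReport`, and `walkTree` are hypothetical names, and the `nba` and `soccer` slugs are made up for illustration.

```typescript
// Hypothetical in-memory category tree; the real script resolves nodes from the DB.
interface CategoryNode {
  slug: string;
  children: CategoryNode[];
}

interface NodeReport {
  slug: string;
  level: number; // 0 for the start node
}

// Visits the start node, then descends at most `maxDepth` levels below it.
function walkTree(
  node: CategoryNode,
  maxDepth: number,
  level = 0,
  out: NodeReport[] = [],
): NodeReport[] {
  out.push({ slug: node.slug, level }); // stand-in for the per-node audit
  if (level < maxDepth) {
    for (const child of node.children) {
      walkTree(child, maxDepth, level + 1, out);
    }
  }
  return out;
}

// Illustrative tree; child slugs are assumptions, not the real taxonomy.
const sports: CategoryNode = {
  slug: "sports",
  children: [
    { slug: "basketball", children: [{ slug: "nba", children: [] }] },
    { slug: "soccer", children: [] },
  ],
};

const depthZero = walkTree(sports, 0); // start node only
const depthOne = walkTree(sports, 1); // start node plus direct children
```

With a depth of 0 only the start node is audited, which matches the single-node behavior. A depth of 1 adds its direct children.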
Review the suggestions and decide which to act on:
- **Schema field additions:** Add new fact_keys via migration
- **Type fixes:** Update fact_key types via migration
- **Vocabulary additions:** Edit taxonomy-rules-data.ts directly
- **Voice additions:** Edit taxonomy-voices-data.ts directly
- **Ownership gaps:** Create dedicated entries for subcategories
Available root slugs (33 active roots):
animals, architecture, art, auto, business, cooking, culture,
current-events, design, entertainment, fashion, food-beverage,
games, geography, geology, governments, health-medicine, history,
home-living, how-things-work, language-linguistics, math, movies,
music, people, places, publishing, records, science, space-astronomy,
sports, technology, travel, tv, weather-climate
Any active topic_categories slug is valid at any depth (1,104 total
categories). Use `--depth=1` or `--depth=2` to walk children from
any starting node.
Analysis Checks
| Check | What It Surfaces | Threshold |
|---|---|---|
| Schema field coverage | Wikidata properties missing from factKeys | ≥70% of entities |
| Type alignment | factKey type vs Wikidata value type mismatches | Any mismatch |
| Vocabulary gaps | Wikipedia category terms missing from domain_terms | ≥30% of entities |
| Entity type distribution | KG type breakdown (Person, Org, Place, etc.) | Informational |
| Domain-specific coverage | TheSportsDB/MusicBrainz fields (sports/music only) | ≥50% of entities |
| Sitelink notability | Entities with <20 Wikipedia language links | <20 sitelinks |
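As an illustration of how a percentage threshold such as the schema field coverage check might be applied, the sketch below counts how many sampled entities carry each Wikidata property and reports those at or above the threshold that the schema lacks. `suggestMissingFields` is a hypothetical name. The sketch also assumes schema keys can be compared directly with Wikidata property IDs, which the real script may handle differently.

```typescript
interface FieldSuggestion {
  property: string;
  coverage: number; // fraction of sampled entities carrying the property
}

// entityProps: one array of Wikidata property IDs per sampled entity.
// schemaKeys: properties already covered by the schema (assumed comparable by ID).
function suggestMissingFields(
  entityProps: string[][],
  schemaKeys: Set<string>,
  threshold = 0.7,
): FieldSuggestion[] {
  const counts = new Map<string, number>();
  for (const props of entityProps) {
    // De-duplicate so a property counts at most once per entity.
    for (const p of new Set(props)) {
      counts.set(p, (counts.get(p) ?? 0) + 1);
    }
  }

  const total = entityProps.length;
  const suggestions: FieldSuggestion[] = [];
  for (const [property, count] of counts) {
    const coverage = total === 0 ? 0 : count / total;
    if (coverage >= threshold && !schemaKeys.has(property)) {
      suggestions.push({ property, coverage });
    }
  }

  // Highest coverage first; ties broken alphabetically for stable output.
  return suggestions.sort(
    (a, b) => b.coverage - a.coverage || a.property.localeCompare(b.property),
  );
}

// Four sampled entities; P569 appears on all of them and is not in the schema.
const sampleSuggestions = suggestMissingFields(
  [["P569", "P19"], ["P569"], ["P569", "P19"], ["P569", "P54"]],
  new Set(["P19"]),
);
```

The same counting approach would apply to the vocabulary and domain-specific checks with their own thresholds (30% and 50%).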
Action Item Priorities (depth>0)
| Source | Priority |
|---|---|
| Schema field suggestion (≥90% coverage) | high |
| Schema field suggestion (≥70% coverage) | medium |
| Type alignment mismatch | high |
| Missing vocabulary entry (subcategory) | medium |
| Missing voice entry (subcategory) | medium |
| Vocabulary gap (Wikipedia categories) | low |
| Low notability entity (<20 sitelinks) | low |
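The priority table can be read as a simple mapping from finding type to priority. The sketch below encodes it that way. The `ActionSource` union and `priorityFor` are hypothetical names, and returning `null` for a field suggestion below 70% coverage is an assumption.

```typescript
type Priority = "high" | "medium" | "low";

// Hypothetical discriminated union covering the sources in the table.
type ActionSource =
  | { kind: "schema-field"; coverage: number }
  | { kind: "type-mismatch" }
  | { kind: "missing-vocabulary" }
  | { kind: "missing-voice" }
  | { kind: "vocabulary-gap" }
  | { kind: "low-notability" };

function priorityFor(source: ActionSource): Priority | null {
  switch (source.kind) {
    case "schema-field":
      if (source.coverage >= 0.9) return "high";
      if (source.coverage >= 0.7) return "medium";
      return null; // assumption: below the 70% threshold nothing is suggested
    case "type-mismatch":
      return "high";
    case "missing-vocabulary":
    case "missing-voice":
      return "medium";
    case "vocabulary-gap":
    case "low-notability":
      return "low";
  }
}
```

For example, a field seen on 95% of sampled entities maps to `high`, while one seen on 75% maps to `medium`.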
Verification
- Script runs without errors for the target taxonomy
- `--depth=0` behaves identically to the original single-node audit
- `--depth=1` walks children and generates action plan files
- Suggested field additions reviewed and either adopted (via migration) or dismissed with rationale
- Type mismatches reviewed and corrected if appropriate
- Vocabulary suggestions reviewed and added to `taxonomy-rules-data.ts` if appropriate
- Ownership gaps reviewed and subcategory entries created if appropriate
- If schema changes were made: `bun run typecheck` passes
- If migrations were applied: `bun run migrations:index` and `bun run migrations:check` pass
Related Prompts
- Add New Topic Category — Create new taxonomies before auditing them
- Test Taxonomy Categories — Verify taxonomy integrity after changes