4. Update Seed Controls

Purpose: Adjust seeding pipeline directives such as mode, volume targets, quality threshold, and cost cap before a seed run. Also covers adding or expanding CategorySpec definitions (44 specs across 33 root categories, ~49,000 seed entries total).

Prerequisites:

  • Bun is installed and bun install has completed
  • For new categories: target category must exist in topic_categories DB table
  • For seed generation: AI model must be configured (check ai_model_tier_config — the seed_explosion task uses the default tier)

Cost / Duration: $0 for directives | $1-10 for seed generation (depends on entity count and model)

Prompt

Update seed controls for the Eko seeding pipeline.

I need to change: [describe what you want to update, e.g.,
"increase volume target to 500 facts",
"raise quality threshold to 0.8",
"switch to curated mode",
"increase cost cap to $50"]

Option A — Update seed control directives:

Open `docs/projects/seeding/SEED.md` and update the relevant
control values in the directives section:
- mode (curated vs. automated)
- volume targets
- quality_threshold
- cost_cap
- Any other pipeline control parameters

This file serves as the single source of truth for seeding
directives. A Claude session reads this file and interprets
the directives when running seeding operations.

Option B — Add or expand CategorySpec definitions:

Open `packages/ai/src/config/categories.ts` to add or modify
CategorySpec entries. Each spec defines:
- `slug`: matches the root category slug in `topic_categories`
- `subcategories`: array of { name, count, prompt } objects
  - `name`: must match a depth-1 subcategory name
  - `count`: number of entity names to generate (25-200)
  - `prompt`: AI prompt describing what entities to generate

There are currently 44 CategorySpec definitions covering all
33 active root categories. Some roots have multiple specs
(e.g., sports has per-sport specs like baseball, basketball).
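
To make the field list above concrete, here is a minimal sketch of what a spec and a pre-flight check might look like. The interface shapes are inferred from the bullet list only; the real types in `packages/ai/src/config/categories.ts` may differ. The `science` entry, its subcategory names, its prompts, and the `validateSpec` helper are hypothetical and exist purely for illustration.

```typescript
// Hypothetical shapes inferred from the field list above; the real
// definitions in packages/ai/src/config/categories.ts may differ.
interface SubcategorySpec {
  name: string; // must match a depth-1 subcategory name
  count: number; // number of entity names to generate (25-200)
  prompt: string; // AI prompt describing what entities to generate
}

interface CategorySpec {
  slug: string; // root category slug in topic_categories
  subcategories: SubcategorySpec[];
}

// Illustrative entry only: slug, names, and prompts are made up.
const exampleSpec: CategorySpec = {
  slug: "science",
  subcategories: [
    { name: "Physics", count: 100, prompt: "Notable physicists, experiments, and laws of physics" },
    { name: "Chemistry", count: 75, prompt: "Chemical elements, compounds, and landmark discoveries" },
  ],
};

// Returns human-readable problems; an empty array means the spec passes.
function validateSpec(spec: CategorySpec): string[] {
  const problems: string[] = [];
  if (spec.slug.trim() === "") problems.push("slug is empty");
  if (spec.subcategories.length === 0) problems.push(spec.slug + ": no subcategories");
  const seen: string[] = [];
  for (const sub of spec.subcategories) {
    const label = spec.slug + "/" + sub.name;
    if (seen.indexOf(sub.name) !== -1) problems.push(label + ": duplicate subcategory");
    seen.push(sub.name);
    // Enforce the documented 25-200 range and reject fractional counts.
    if (Math.floor(sub.count) !== sub.count || sub.count < 25 || sub.count > 200) {
      problems.push(label + ": count " + sub.count + " is outside 25-200");
    }
    if (sub.prompt.trim() === "") problems.push(label + ": prompt is empty");
  }
  return problems;
}
```

A check like this catches only local mistakes (count range, empty prompts, duplicates). Whether `slug` and each `name` actually exist in `topic_categories` still has to be confirmed against the database.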

After editing, generate seed entries:

```bash
bun scripts/seed/generate-curated-entries.ts --category <slug> --insert
```

The script resolves subcategories via DB lookup, generates
entity names using AI (routed through `seed_explosion` task →
`default` tier), and inserts into `seed_entry_queue`.

Option C — Add PATH_CATEGORY_MAP entries:

For new root categories, add keyword patterns to
`scripts/seed/lib/category-mapper.ts` so that URLs can be
auto-mapped to the correct category during ingestion.
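
As a rough illustration of keyword-based mapping, the sketch below pairs patterns with root category slugs and returns the first match for a URL path. The structure, the patterns, the slugs, and the names `PATH_CATEGORY_MAP_SKETCH` and `mapPathToCategory` are all assumptions; the actual `PATH_CATEGORY_MAP` in `scripts/seed/lib/category-mapper.ts` may be shaped differently.

```typescript
// Hypothetical structure: each keyword pattern is paired with a root slug.
// The real PATH_CATEGORY_MAP may use a different shape entirely.
interface PathRule {
  pattern: RegExp;
  slug: string;
}

const PATH_CATEGORY_MAP_SKETCH: PathRule[] = [
  { pattern: /\/(sports?|athletics)\//i, slug: "sports" },
  { pattern: /\/(science|research)\//i, slug: "science" },
];

// Returns the slug of the first rule whose pattern matches the URL path,
// or null when no rule matches. Rule order therefore decides ties.
function mapPathToCategory(path: string): string | null {
  for (const rule of PATH_CATEGORY_MAP_SKETCH) {
    if (rule.pattern.test(path)) return rule.slug;
  }
  return null;
}
```

With first-match semantics, a new root's patterns should go ahead of any broader pattern that could also match the same paths.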

Verify the updated values make sense:

- Cost cap is sufficient for the target volume
- Quality threshold is appropriate for the seeding mode
- Volume targets are achievable within budget
- AI model is available for the `default` tier (check
  `ai_model_tier_config` — the `seed_explosion` task maps to
  `default` tier via `TASK_TIER_MAP` in `packages/ai/src/fact-engine.ts`)
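
The budget checks above come down to simple arithmetic, sketched here. The `SeedDirectives` shape and its field names are hypothetical (the directives in `SEED.md` are prose, not a typed object), the 0 to 1 quality scale is assumed from the "0.8" example, and `costPerFactUsd` is a planning estimate you supply yourself, not a figure from the pipeline.

```typescript
// Hypothetical directive shape mirroring the controls listed in Option A.
interface SeedDirectives {
  mode: "curated" | "automated";
  volumeTarget: number; // number of facts to generate
  qualityThreshold: number; // assumed to be on a 0..1 scale
  costCapUsd: number;
}

// Returns human-readable problems; an empty array means the values agree.
// costPerFactUsd is an assumed per-fact estimate supplied by the caller.
function checkDirectives(d: SeedDirectives, costPerFactUsd: number): string[] {
  const problems: string[] = [];
  if (d.volumeTarget <= 0) problems.push("volume target must be positive");
  if (d.qualityThreshold < 0 || d.qualityThreshold > 1) {
    problems.push("quality threshold " + d.qualityThreshold + " is outside 0..1");
  }
  // Projected spend if every targeted fact costs the estimated amount.
  const projectedUsd = d.volumeTarget * costPerFactUsd;
  if (projectedUsd > d.costCapUsd) {
    problems.push(
      "projected cost $" + projectedUsd.toFixed(2) +
        " exceeds cost cap $" + d.costCapUsd.toFixed(2)
    );
  }
  return problems;
}
```

For example, a 500-fact target at an assumed $0.02 per fact projects to $10, which fits under a $50 cap; at $0.20 per fact the same target projects to $100 and the check flags it.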

Key references:
- Seed control directives: `docs/projects/seeding/SEED.md`
- CategorySpec definitions: `packages/ai/src/config/categories.ts`
- Seed generation script: `scripts/seed/generate-curated-entries.ts`
- Category mapper: `scripts/seed/lib/category-mapper.ts`
- Seeding scripts: `scripts/seed/`
- Environment cost controls: `packages/config/src/index.ts`
- Model routing for seeds: `packages/ai/src/fact-engine.ts` (TASK_TIER_MAP)

Verification

  • Seed control values updated in docs/projects/seeding/SEED.md (if changing directives)
  • CategorySpec added/expanded in categories.ts (if adding categories)
  • PATH_CATEGORY_MAP updated in category-mapper.ts (if adding new roots)
  • Values are internally consistent (cost cap supports volume target)
  • generate-curated-entries.ts --category <slug> --insert succeeds (if generating seeds)
  • Seed counts verified: SELECT count(*) FROM seed_entry_queue WHERE topic_category_id = ...
  • Next seed run picks up the new directives
