# Tools Reference

**Scope:** Runtime capabilities and constraints for the V2 fact engine
**Audience:** Engineers, AI agents
**Relationship:** Constrained by Architecture Decisions; implemented via STACK.md
## Purpose
This document defines the tools Eko has access to at runtime and the hard limits of those tools.
It exists to prevent:
- Assumed capabilities
- Tool-based hallucination
- Silent scope creep
## Tooling Principles
All tools must:
- Produce schema-validated output — facts conform to `fact_record_schemas.fact_keys` (ADR-015)
- Operate within cost bounds — daily spend caps, per-call tracking (ADR-016)
- Flow through multi-pipeline convergence — all fact sources produce `fact_records` (ADR-014)
- Ensure verification before publication — no fact reaches users without validation
If a capability violates an existing ADR, it is not a valid tool.
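The schema-validation principle can be sketched as a key allow-list check, assuming a `fact_record_schemas` row exposes its permitted fact keys per topic (the TypeScript shape here is hypothetical, not the actual table schema):

```typescript
// Hypothetical shape of a per-topic schema row (see ADR-015).
type FactRecordSchema = { topic: string; factKeys: string[] };

// Reject any extracted fact whose key is not in the schema's allow-list.
function validateFactKeys(
  schema: FactRecordSchema,
  facts: Record<string, unknown>,
): { valid: boolean; unknownKeys: string[] } {
  const allowed = new Set(schema.factKeys);
  const unknownKeys = Object.keys(facts).filter((k) => !allowed.has(k));
  return { valid: unknownKeys.length === 0, unknownKeys };
}
```
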
## Tool Classes
### 1. News Ingestion Tool
Purpose: Fetch articles from external news APIs and deduplicate against existing records.
Can:
- Fetch from Newsdata.io and Event Registry (active providers)
- Deduplicate articles by URL and content hash against `news_sources`
- Resolve hero images from article metadata
Cannot:
- Crawl websites or follow links
- Access paywalled content
- Fetch from deprecated providers (NewsAPI, GNews, TheNewsAPI)
Invocation: Scheduled via INGEST_NEWS queue messages dispatched by cron
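A minimal sketch of the URL and content-hash deduplication described above; the normalization and the shape of the existing-records lookup are assumptions, not the actual `news_sources` schema:

```typescript
import { createHash } from "node:crypto";

// Normalize before hashing so trivial whitespace/case changes still match
// (the real normalization rules may differ).
function contentHash(body: string): string {
  return createHash("sha256").update(body.trim().toLowerCase()).digest("hex");
}

// An article is a duplicate if either its URL or its content hash
// already exists among ingested records.
function isDuplicate(
  article: { url: string; body: string },
  existing: { urls: Set<string>; hashes: Set<string> },
): boolean {
  return existing.urls.has(article.url) || existing.hashes.has(contentHash(article.body));
}
```
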
### 2. Story Clustering Tool
Purpose: Group related news articles into story clusters for batch extraction.
Can:
- Cluster articles using TF-IDF cosine similarity
- Merge clusters when similarity exceeds threshold
- Batch unclustered sources older than 1 hour
Cannot:
- Create cross-topic clusters
- Modify article content
Invocation: CLUSTER_STORIES queue messages dispatched by cron/cluster-sweep
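Cluster similarity can be illustrated with cosine similarity over raw term-frequency vectors; the real tool weights terms by TF-IDF and merges above a configured threshold, so this is a simplified sketch:

```typescript
// Build a term-frequency vector from article text (tokenization is illustrative).
function termFreq(text: string): Map<string, number> {
  const tf = new Map<string, number>();
  for (const token of text.toLowerCase().split(/\W+/).filter(Boolean)) {
    tf.set(token, (tf.get(token) ?? 0) + 1);
  }
  return tf;
}

// Cosine similarity between two sparse vectors: dot product over norms.
function cosineSimilarity(a: Map<string, number>, b: Map<string, number>): number {
  let dot = 0, normA = 0, normB = 0;
  for (const [term, wa] of a) {
    dot += wa * (b.get(term) ?? 0);
    normA += wa * wa;
  }
  for (const wb of b.values()) normB += wb * wb;
  return normA && normB ? dot / Math.sqrt(normA * normB) : 0;
}
```
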
### 3. AI Fact Extraction Tool
Purpose: Extract structured facts from story clusters using schema-constrained AI.
Can:
- Extract key-value facts constrained by per-topic `fact_record_schemas`
- Score notability (0.0-1.0) with a minimum threshold of 0.6
- Assign topic categories from the taxonomy tree
- Generate evergreen (timeless) facts from topic prompts
- Explode seed entries into many structured facts with spinoff discovery
Cannot:
- Extract facts without a matching schema definition
- Reproduce full article content (non-substitutive)
- Bypass notability threshold
Invocation: EXTRACT_FACTS, GENERATE_EVERGREEN, EXPLODE_CATEGORY_ENTRY queue messages
Constraints: Schema-driven (ADR-015), cost-bounded (ADR-016)
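The notability gate can be sketched as a simple threshold filter; only the 0.6 minimum comes from this document, the fact shape is illustrative:

```typescript
// Minimum notability documented above; facts scoring below it never publish.
const NOTABILITY_THRESHOLD = 0.6;

type ExtractedFact = { key: string; value: string; notability: number };

// Keep only facts at or above the threshold, within the valid 0.0-1.0 range.
function filterNotable(facts: ExtractedFact[]): ExtractedFact[] {
  return facts.filter(
    (f) => f.notability >= NOTABILITY_THRESHOLD && f.notability <= 1.0,
  );
}
```
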
### 4. Fact Validation Tool
Purpose: Verify extracted facts through a 4-phase pipeline before publication.
Phases:
| Phase | Method | Description |
|---|---|---|
| 1. Structural | Rule-based | Schema conformance, required fields, format checks |
| 2. Consistency | Rule-based | Cross-field logic, temporal plausibility, value ranges |
| 3. Cross-model | AI (Gemini 2.5 Flash) | Independent AI verification against a different model |
| 4. Evidence | APIs + AI | Corroboration via external authoritative sources |
Can:
- Mark facts as `validated`, `rejected`, or `needs_review`
- Record validation evidence and confidence scores per phase
- Retry stuck validations (>4h without update)
Cannot:
- Modify fact content (only status and validation metadata)
- Skip phases
Invocation: VALIDATE_FACT queue messages (automatic post-extraction + retry cron)
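The no-skip, fail-fast phase ordering can be sketched as follows; the statuses match the list above, while the phase and result shapes are assumptions:

```typescript
type Fact = Record<string, unknown>;
type Status = "validated" | "rejected" | "needs_review";
type PhaseResult = "pass" | "fail" | "unsure";
type Phase = { name: string; run: (fact: Fact) => PhaseResult };

// Run phases strictly in order: a fail rejects, an unsure result routes to
// review, and a fact is validated only after every phase has passed.
function validateFact(fact: Fact, phases: Phase[]): { status: Status; phasesRun: string[] } {
  const phasesRun: string[] = [];
  for (const phase of phases) {
    phasesRun.push(phase.name); // no phase is ever skipped
    const result = phase.run(fact);
    if (result === "fail") return { status: "rejected", phasesRun };
    if (result === "unsure") return { status: "needs_review", phasesRun };
  }
  return { status: "validated", phasesRun };
}
```
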
### 5. Evidence API Tool
Purpose: Query external authoritative sources for fact corroboration during validation Phase 4.
Available APIs (20+):
| API | Domain | Client File |
|---|---|---|
| Wikipedia | General knowledge | wikipedia-client.ts |
| Wikidata | Structured data | wikidata-client.ts |
| TheSportsDB | Sports facts | sportsdb-client.ts |
| MusicBrainz | Music metadata | musicbrainz-client.ts |
| Nominatim | Geographic data | nominatim-client.ts |
| Open Library | Book/author data | openlibrary-client.ts |
| OMDb | Film/TV metadata | omdb-client.ts |
| Met Museum (ARTIC) | Art & exhibits | metmuseum-client.ts |
| NASA | Space/astronomy | nasa-client.ts |
| USDA FoodData | Nutrition facts | usda-client.ts |
| REST Countries | Country data | restcountries-client.ts |
| IUCN Red List | Endangered species | iucn-client.ts |
| RAWG | Video game data | rawg-client.ts |
| OpenAlex | Academic papers | openalex-client.ts |
| Nobel Prize | Nobel laureates | nobelprize-client.ts |
| Financial Modeling Prep | Financial data | fmp-client.ts |
| Finnhub | Stock market data | finnhub-client.ts |
| FRED | Economic data | fred-client.ts |
| World Bank | Development indicators | worldbank-client.ts |
| Open-Meteo | Weather/climate | openmeteo-client.ts |
| TheMealDB | Recipe/food data | themealdb-client.ts |
| Wikimedia Enterprise | High-quality Wikipedia | wikimedia-enterprise-client.ts |
Cannot:
- Access paid-only APIs without configured keys
- Serve as primary fact source (evidence only)
All clients live in `packages/ai/src/`.
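A hedged sketch of how Phase 4 might fan a claim out across these clients and tally verdicts; the shared `EvidenceClient` interface is an assumption (the real clients are per-API and asynchronous), shown synchronously for brevity:

```typescript
// Assumed common interface; the actual clients in packages/ai/src/ may differ.
interface EvidenceClient {
  name: string;
  corroborate(claim: string): "supports" | "contradicts" | "no_data";
}

// Query every available client and count supporting vs contradicting sources.
function gatherEvidence(claim: string, clients: EvidenceClient[]) {
  const verdicts = clients.map((c) => ({ source: c.name, verdict: c.corroborate(claim) }));
  return {
    verdicts,
    supports: verdicts.filter((v) => v.verdict === "supports").length,
    contradicts: verdicts.filter((v) => v.verdict === "contradicts").length,
  };
}
```
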
### 6. Image Resolution Tool
Purpose: Resolve and cache images for fact cards and challenges.
Can:
- Search stock photo APIs (Unsplash, Pexels) for topic-relevant images
- Cache resolved images via Cloudflare R2
- Resolve anti-spoiler images for challenges where the fact image would reveal the answer
Cannot:
- Generate images
- Use copyrighted/non-stock images
Invocation: RESOLVE_IMAGE and RESOLVE_CHALLENGE_IMAGE queue messages
### 7. Challenge Generation Tool
Purpose: Pre-compute interactive quiz content for validated facts.
Can:
- Generate challenge content across 6 styles (multiple_choice, true_false, fill_blank, free_text, timeline, ranking)
- Scale difficulty from C1 (easiest, 1 point) to C5 (hardest, 5 points)
- Generate challenge titles for display
- Map facts to 8 named challenge formats (big_fan_of, know_a_lot_about, etc.)
Cannot:
- Generate challenges for unvalidated facts
- Reveal answers in challenge metadata or URLs
Invocation: GENERATE_CHALLENGE_CONTENT queue messages
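A sketch of the difficulty-to-points mapping and the validated-only guard; the type names are illustrative, only the styles, C1-C5 scale, and point values come from this document:

```typescript
// The six documented challenge styles.
type ChallengeStyle =
  | "multiple_choice" | "true_false" | "fill_blank"
  | "free_text" | "timeline" | "ranking";

type Difficulty = "C1" | "C2" | "C3" | "C4" | "C5";

// C1 is worth 1 point, scaling up to 5 points for C5.
function pointsFor(difficulty: Difficulty): number {
  return Number(difficulty.slice(1));
}

// Challenges are only generated for facts that cleared validation.
function canGenerateChallenge(fact: { status: string }): boolean {
  return fact.status === "validated";
}
```
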
### 8. AI Model Router
Purpose: Select the appropriate AI model for each task based on tier, topic, and budget.
Tiers:
| Tier | Use Case | Default Model |
|---|---|---|
| Default | Routine extraction, classification | gpt-5.4-nano |
| Mid | Accuracy-critical tasks | gpt-5.4-mini |
| High | Complex validation, dispute resolution | gemini-2.5-pro |
Features:
- 27 models across 6 providers (OpenAI, Anthropic, Google, MiniMax, xAI)
- Topic-based overrides (e.g., sports topics → gemini-3-flash-preview)
- Daily spend caps per provider (`ANTHROPIC_DAILY_SPEND_CAP_USD`)
- Per-call cost tracking via the `ai_cost_tracking` table
- Configurable via the `ai_model_tier_config` DB table
Cannot:
- Exceed daily spend caps (hard enforcement)
- Escalate to expensive models without explicit opt-in
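The routing rules above can be sketched as a small resolver; the model names come from the tier table, while the option shape and override lookup are assumptions:

```typescript
type Tier = "default" | "mid" | "high";

// Default model per tier, from the table above.
const TIER_MODELS: Record<Tier, string> = {
  default: "gpt-5.4-nano",
  mid: "gpt-5.4-mini",
  high: "gemini-2.5-pro",
};

function selectModel(opts: {
  tier: Tier;
  topic?: string;
  topicOverrides: Record<string, string>; // e.g. loaded from ai_model_tier_config
  dailySpendUsd: number;
  dailyCapUsd: number;
}): string {
  if (opts.dailySpendUsd >= opts.dailyCapUsd) {
    // Hard enforcement: no call is made once the cap is hit.
    throw new Error("daily spend cap reached");
  }
  const override = opts.topic ? opts.topicOverrides[opts.topic] : undefined;
  return override ?? TIER_MODELS[opts.tier];
}
```
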
### 9. Observability Tool
Purpose: Structured logging, error tracking, and pipeline monitoring.
Components:
| Component | Description |
|---|---|
| Structured JSON logging | @eko/observability — component-scoped loggers |
| Error tracking | Sentry integration across all apps and workers |
| AI cost tracking | ai_cost_tracking table with daily aggregation |
| Pipeline monitoring | ingestion_runs table tracking each cron invocation |
| Content operations | content_operations_log for pipeline debugging |
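Component-scoped structured logging might look like the following sketch; the factory name and output shape are assumptions, not the actual `@eko/observability` API:

```typescript
// Each pipeline component gets its own logger; every line is one JSON object,
// tagged with the component name for filtering.
function createLogger(component: string) {
  const emit = (level: string, message: string, fields: Record<string, unknown> = {}) =>
    JSON.stringify({ ts: new Date().toISOString(), level, component, message, ...fields });
  return {
    info: (msg: string, fields?: Record<string, unknown>) => emit("info", msg, fields),
    error: (msg: string, fields?: Record<string, unknown>) => emit("error", msg, fields),
  };
}
```
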
## Explicit Non-Tools
The following are not tools in Eko V2:
- Site crawlers or web scrapers
- Real-time streaming or WebSocket listeners
- User-generated content ingestion
- Social media scrapers
- Image generation (AI or otherwise)
- Predictive or anticipatory modeling
Requests requiring these must trigger an ADR revision.
## Change Policy
- New tools require: defined purpose, clear constraints, ADR alignment
- Deprecated tools must be explicitly marked
- All AI tools must include cost tracking
## Summary
Eko's tools process news → facts → challenges through a pipeline of:
- Ingest — Fetch and cluster articles
- Extract — Schema-constrained AI extraction
- Validate — 4-phase verification with 20+ evidence APIs
- Generate — Challenge content for interactive learning
- Observe — Cost tracking, logging, monitoring
Each tool is intentionally scoped and cost-bounded.