# Tools Reference

**Scope:** Runtime capabilities and constraints for the V2 fact engine
**Audience:** Engineers, AI agents
**Relationship:** Constrained by Architecture Decisions; implemented via STACK.md
## Purpose
This document defines the tools Eko has access to at runtime and the hard limits of those tools.
It exists to prevent:
- Assumed capabilities
- Tool-based hallucination
- Silent scope creep
## Tooling Principles
All tools must:
- Produce schema-validated output — facts conform to `fact_record_schemas.fact_keys` (ADR-015)
- Operate within cost bounds — daily spend caps, per-call tracking (ADR-016)
- Flow through multi-pipeline convergence — all fact sources produce `fact_records` (ADR-014)
- Ensure verification before publication — no fact reaches users without validation
If a capability violates an existing ADR, it is not a valid tool.
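The schema-validation principle can be sketched as a key allow-list check, assuming a `fact_record_schemas` row exposes its permitted fact keys per topic (the TypeScript shape here is hypothetical, not the actual table schema):

```typescript
// Hypothetical shape of a per-topic schema row (see ADR-015).
type FactRecordSchema = { topic: string; factKeys: string[] };

// Reject any extracted fact whose key is not in the schema's allow-list.
function validateFactKeys(
  schema: FactRecordSchema,
  facts: Record<string, unknown>,
): { valid: boolean; unknownKeys: string[] } {
  const allowed = new Set(schema.factKeys);
  const unknownKeys = Object.keys(facts).filter((k) => !allowed.has(k));
  return { valid: unknownKeys.length === 0, unknownKeys };
}
```
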
## Tool Classes
### 1. News Ingestion Tool
Purpose: Fetch articles from external news APIs and deduplicate against existing records.
Can:
- Fetch from Newsdata.io and Event Registry (active providers)
- Deduplicate articles by URL and content hash against `news_sources`
- Resolve hero images from article metadata
Cannot:
- Crawl websites or follow links
- Access paywalled content
- Fetch from deprecated providers (NewsAPI, GNews, TheNewsAPI)
Invocation: Scheduled via INGEST_NEWS queue messages dispatched by cron
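A minimal sketch of the URL and content-hash deduplication described above; the normalization and the shape of the existing-records lookup are assumptions, not the actual `news_sources` schema:

```typescript
import { createHash } from "node:crypto";

// Normalize before hashing so trivial whitespace/case changes still match
// (the real normalization rules may differ).
function contentHash(body: string): string {
  return createHash("sha256").update(body.trim().toLowerCase()).digest("hex");
}

// An article is a duplicate if either its URL or its content hash
// already exists among ingested records.
function isDuplicate(
  article: { url: string; body: string },
  existing: { urls: Set<string>; hashes: Set<string> },
): boolean {
  return existing.urls.has(article.url) || existing.hashes.has(contentHash(article.body));
}
```
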
### 2. Story Clustering Tool
Purpose: Group related news articles into story clusters for batch extraction.
Can:
- Cluster articles using TF-IDF cosine similarity
- Merge clusters when similarity exceeds threshold
- Batch unclustered sources older than 1 hour
Cannot:
- Create cross-topic clusters
- Modify article content
Invocation: CLUSTER_STORIES queue messages dispatched by cron/cluster-sweep
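Cluster similarity can be illustrated with cosine similarity over raw term-frequency vectors; the real tool weights terms by TF-IDF and merges above a configured threshold, so this is a simplified sketch:

```typescript
// Build a term-frequency vector from article text (tokenization is illustrative).
function termFreq(text: string): Map<string, number> {
  const tf = new Map<string, number>();
  for (const token of text.toLowerCase().split(/\W+/).filter(Boolean)) {
    tf.set(token, (tf.get(token) ?? 0) + 1);
  }
  return tf;
}

// Cosine similarity between two sparse vectors: dot product over norms.
function cosineSimilarity(a: Map<string, number>, b: Map<string, number>): number {
  let dot = 0, normA = 0, normB = 0;
  for (const [term, wa] of a) {
    dot += wa * (b.get(term) ?? 0);
    normA += wa * wa;
  }
  for (const wb of b.values()) normB += wb * wb;
  return normA && normB ? dot / Math.sqrt(normA * normB) : 0;
}
```
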
### 3. AI Fact Extraction Tool
Purpose: Extract structured facts from story clusters using schema-constrained AI.
Can:
- Extract key-value facts constrained by per-topic `fact_record_schemas`
- Score notability (0.0-1.0) with a minimum threshold of 0.6
- Assign topic categories from the taxonomy tree
- Generate evergreen (timeless) facts from topic prompts
- Explode seed entries into many structured facts with spinoff discovery
Cannot:
- Extract facts without a matching schema definition
- Reproduce full article content (non-substitutive)
- Bypass notability threshold
Invocation: EXTRACT_FACTS, GENERATE_EVERGREEN, EXPLODE_CATEGORY_ENTRY queue messages
Constraints: Schema-driven (ADR-015), cost-bounded (ADR-016)
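The notability gate can be sketched as a simple threshold filter; only the 0.6 minimum comes from this document, the fact shape is illustrative:

```typescript
// Minimum notability documented above; facts scoring below it never publish.
const NOTABILITY_THRESHOLD = 0.6;

type ExtractedFact = { key: string; value: string; notability: number };

// Keep only facts at or above the threshold, within the valid 0.0-1.0 range.
function filterNotable(facts: ExtractedFact[]): ExtractedFact[] {
  return facts.filter(
    (f) => f.notability >= NOTABILITY_THRESHOLD && f.notability <= 1.0,
  );
}
```
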
### 4. Fact Validation Tool
Purpose: Verify extracted facts through a 4-phase pipeline before publication.
Phases:
| Phase | Method | Description |
|---|---|---|
| 1. Structural | Rule-based | Schema conformance, required fields, format checks |
| 2. Consistency | Rule-based | Cross-field logic, temporal plausibility, value ranges |
| 3. Cross-model | AI (Gemini 2.5 Flash) | Independent AI verification against a different model |
| 4. Evidence | APIs + AI | Corroboration via external authoritative sources |
Can:
- Mark facts as `validated`, `rejected`, or `needs_review`
- Record validation evidence and confidence scores per phase
- Retry stuck validations (>4h without update)
Cannot:
- Modify fact content (only status and validation metadata)
- Skip phases
Invocation: VALIDATE_FACT queue messages (automatic post-extraction + retry cron)
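The no-skip, fail-fast phase ordering can be sketched as follows; the statuses match the list above, while the phase and result shapes are assumptions:

```typescript
type Fact = Record<string, unknown>;
type Status = "validated" | "rejected" | "needs_review";
type PhaseResult = "pass" | "fail" | "unsure";
type Phase = { name: string; run: (fact: Fact) => PhaseResult };

// Run phases strictly in order: a fail rejects, an unsure result routes to
// review, and a fact is validated only after every phase has passed.
function validateFact(fact: Fact, phases: Phase[]): { status: Status; phasesRun: string[] } {
  const phasesRun: string[] = [];
  for (const phase of phases) {
    phasesRun.push(phase.name); // no phase is ever skipped
    const result = phase.run(fact);
    if (result === "fail") return { status: "rejected", phasesRun };
    if (result === "unsure") return { status: "needs_review", phasesRun };
  }
  return { status: "validated", phasesRun };
}
```
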
### 5. Evidence API Tool
Purpose: Query external authoritative sources for fact corroboration during validation Phase 4.
Available APIs (20+):
| API | Domain | Client File |
|---|---|---|
| Wikipedia | General knowledge | wikipedia-client.ts |
| Wikidata | Structured data | wikidata-client.ts |
| TheSportsDB | Sports facts | sportsdb-client.ts |
| MusicBrainz | Music metadata | musicbrainz-client.ts |
| Nominatim | Geographic data | nominatim-client.ts |
| Open Library | Book/author data | openlibrary-client.ts |
| OMDb | Film/TV metadata | omdb-client.ts |
| Met Museum (ARTIC) | Art & exhibits | metmuseum-client.ts |
| NASA | Space/astronomy | nasa-client.ts |
| USDA FoodData | Nutrition facts | usda-client.ts |
| REST Countries | Country data | restcountries-client.ts |
| IUCN Red List | Endangered species | iucn-client.ts |
| RAWG | Video game data | rawg-client.ts |
| OpenAlex | Academic papers | openalex-client.ts |
| Nobel Prize | Nobel laureates | nobelprize-client.ts |
| Financial Modeling Prep | Financial data | fmp-client.ts |
| Finnhub | Stock market data | finnhub-client.ts |
| FRED | Economic data | fred-client.ts |
| World Bank | Development indicators | worldbank-client.ts |
| Open-Meteo | Weather/climate | openmeteo-client.ts |
| TheMealDB | Recipe/food data | themealdb-client.ts |
| Wikimedia Enterprise | High-quality Wikipedia | wikimedia-enterprise-client.ts |
Cannot:
- Access paid-only APIs without configured keys
- Serve as primary fact source (evidence only)
All clients live in `packages/ai/src/`.
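A hedged sketch of how Phase 4 might fan a claim out across these clients and tally verdicts; the shared `EvidenceClient` interface is an assumption (the real clients are per-API and asynchronous), shown synchronously for brevity:

```typescript
// Assumed common interface; the actual clients in packages/ai/src/ may differ.
interface EvidenceClient {
  name: string;
  corroborate(claim: string): "supports" | "contradicts" | "no_data";
}

// Query every available client and count supporting vs contradicting sources.
function gatherEvidence(claim: string, clients: EvidenceClient[]) {
  const verdicts = clients.map((c) => ({ source: c.name, verdict: c.corroborate(claim) }));
  return {
    verdicts,
    supports: verdicts.filter((v) => v.verdict === "supports").length,
    contradicts: verdicts.filter((v) => v.verdict === "contradicts").length,
  };
}
```
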
### 6. Image Resolution Tool
Purpose: Resolve and cache images for fact cards and challenges.
Can:
- Search stock photo APIs (Unsplash, Pexels) for topic-relevant images
- Cache resolved images via Cloudflare R2
- Resolve anti-spoiler images for challenges where the fact image would reveal the answer
Cannot:
- Generate images
- Use copyrighted/non-stock images
Invocation: RESOLVE_IMAGE and RESOLVE_CHALLENGE_IMAGE queue messages
### 7. Challenge Generation Tool
Purpose: Pre-compute interactive quiz content for validated facts.
Can:
- Generate challenge content across 6 styles (multiple_choice, true_false, fill_blank, free_text, timeline, ranking)
- Scale difficulty from C1 (easiest, 1 point) to C5 (hardest, 5 points)
- Generate challenge titles for display
- Map facts to 8 named challenge formats (big_fan_of, know_a_lot_about, etc.)
Cannot:
- Generate challenges for unvalidated facts
- Reveal answers in challenge metadata or URLs
Invocation: GENERATE_CHALLENGE_CONTENT queue messages
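A sketch of the difficulty-to-points mapping and the validated-only guard; the type names are illustrative, only the styles, C1-C5 scale, and point values come from this document:

```typescript
// The six documented challenge styles.
type ChallengeStyle =
  | "multiple_choice" | "true_false" | "fill_blank"
  | "free_text" | "timeline" | "ranking";

type Difficulty = "C1" | "C2" | "C3" | "C4" | "C5";

// C1 is worth 1 point, scaling up to 5 points for C5.
function pointsFor(difficulty: Difficulty): number {
  return Number(difficulty.slice(1));
}

// Challenges are only generated for facts that cleared validation.
function canGenerateChallenge(fact: { status: string }): boolean {
  return fact.status === "validated";
}
```
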
### 8. AI Model Router
Purpose: Select the appropriate AI model for each task based on tier, topic, and budget.
Tiers:
| Tier | Use Case | Default Model |
|---|---|---|
| Default | Routine extraction, classification | gpt-5.4-nano |
| Mid | Accuracy-critical tasks | gpt-5.4-mini |
| High | Complex validation, dispute resolution | gemini-2.5-pro |
Features:
- 27 models across 6 providers (OpenAI, Anthropic, Google, MiniMax, xAI)
- Topic-based overrides (e.g., sports topics → gemini-3-flash-preview)
- Daily spend caps per provider (`ANTHROPIC_DAILY_SPEND_CAP_USD`)
- Per-call cost tracking via the `ai_cost_tracking` table
- Configurable via the `ai_model_tier_config` DB table
Cannot:
- Exceed daily spend caps (hard enforcement)
- Escalate to expensive models without explicit opt-in
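The routing rules above can be sketched as a small resolver; the model names come from the tier table, while the option shape and override lookup are assumptions:

```typescript
type Tier = "default" | "mid" | "high";

// Default model per tier, from the table above.
const TIER_MODELS: Record<Tier, string> = {
  default: "gpt-5.4-nano",
  mid: "gpt-5.4-mini",
  high: "gemini-2.5-pro",
};

function selectModel(opts: {
  tier: Tier;
  topic?: string;
  topicOverrides: Record<string, string>; // e.g. loaded from ai_model_tier_config
  dailySpendUsd: number;
  dailyCapUsd: number;
}): string {
  if (opts.dailySpendUsd >= opts.dailyCapUsd) {
    // Hard enforcement: no call is made once the cap is hit.
    throw new Error("daily spend cap reached");
  }
  const override = opts.topic ? opts.topicOverrides[opts.topic] : undefined;
  return override ?? TIER_MODELS[opts.tier];
}
```
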
### 9. Observability Tool
Purpose: Structured logging, error tracking, and pipeline monitoring.
Components:
| Component | Description |
|---|---|
| Structured JSON logging | @eko/observability — component-scoped loggers |
| Error tracking | Sentry integration across all apps and workers |
| AI cost tracking | ai_cost_tracking table with daily aggregation |
| Pipeline monitoring | ingestion_runs table tracking each cron invocation |
| Content operations | content_operations_log for pipeline debugging |
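Component-scoped structured logging might look like the following sketch; the factory name and output shape are assumptions, not the actual `@eko/observability` API:

```typescript
// Each pipeline component gets its own logger; every line is one JSON object,
// tagged with the component name for filtering.
function createLogger(component: string) {
  const emit = (level: string, message: string, fields: Record<string, unknown> = {}) =>
    JSON.stringify({ ts: new Date().toISOString(), level, component, message, ...fields });
  return {
    info: (msg: string, fields?: Record<string, unknown>) => emit("info", msg, fields),
    error: (msg: string, fields?: Record<string, unknown>) => emit("error", msg, fields),
  };
}
```
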
## Explicit Non-Tools
The following are not tools in Eko V2:
- Site crawlers or web scrapers
- Real-time streaming or WebSocket listeners
- User-generated content ingestion
- Social media scrapers
- Image generation (AI or otherwise)
- Predictive or anticipatory modeling
Requests requiring these must trigger an ADR revision.
## Change Policy
- New tools require: defined purpose, clear constraints, ADR alignment
- Deprecated tools must be explicitly marked
- All AI tools must include cost tracking
## Summary
Eko's tools process news → facts → challenges through a pipeline of:
- Ingest — Fetch and cluster articles
- Extract — Schema-constrained AI extraction
- Validate — 4-phase verification with 20+ evidence APIs
- Generate — Challenge content for interactive learning
- Observe — Cost tracking, logging, monitoring
Each tool is intentionally scoped and cost-bounded.