Tools Reference

Scope: Runtime capabilities & constraints for the V2 fact engine
Audience: Engineers, AI agents
Relationship: Constrained by Architecture Decisions; implemented via STACK.md


Purpose

This document defines the tools Eko has access to at runtime and the hard limits of those tools.

It exists to prevent:

  • Assumed capabilities
  • Tool-based hallucination
  • Silent scope creep

Tooling Principles

All tools must:

  • Produce schema-validated output — facts conform to fact_record_schemas.fact_keys (ADR-015)
  • Operate within cost bounds — daily spend caps, per-call tracking (ADR-016)
  • Flow through multi-pipeline convergence — all fact sources produce fact_records (ADR-014)
  • Ensure verification before publication — no fact reaches users without validation

If a capability violates an existing ADR, it is not a valid tool.


Tool Classes

1. News Ingestion Tool

Purpose: Fetch articles from external news APIs and deduplicate against existing records.

Can:

  • Fetch from Newsdata.io and Event Registry (active providers)
  • Deduplicate articles by URL and content hash against news_sources
  • Resolve hero images from article metadata

Cannot:

  • Crawl websites or follow links
  • Access paywalled content
  • Fetch from deprecated providers (NewsAPI, GNews, TheNewsAPI)

Invocation: Scheduled via INGEST_NEWS queue messages dispatched by cron
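The URL and content-hash deduplication above can be sketched as follows. This is a minimal sketch under stated assumptions: the Article shape and all function names are hypothetical, only utm_* query parameters are treated as tracking noise, content is hashed with SHA-256 after lowercasing and collapsing whitespace, and in-memory sets stand in for the lookup against news_sources.

```typescript
import { createHash } from "node:crypto";

// Hypothetical article shape; the real news_sources columns may differ.
interface Article {
  url: string;
  content: string;
}

// Canonicalize a URL: drop the fragment, utm_* tracking params, and trailing slashes.
function normalizeUrl(raw: string): string {
  const u = new URL(raw);
  const kept = [...u.searchParams].filter(([key]) => !key.startsWith("utm_"));
  const query = kept.length > 0 ? `?${new URLSearchParams(kept).toString()}` : "";
  const path = u.pathname.replace(/\/+$/, "") || "/";
  return `${u.protocol}//${u.host}${path}${query}`;
}

// Hash whitespace-collapsed, lowercased text so trivial formatting changes still collide.
function contentHash(text: string): string {
  const canonical = text.toLowerCase().replace(/\s+/g, " ").trim();
  return createHash("sha256").update(canonical).digest("hex");
}

// Keep only articles whose URL and content hash are both unseen; records new keys in the sets.
function dedupe(batch: Article[], seenUrls: Set<string>, seenHashes: Set<string>): Article[] {
  const fresh: Article[] = [];
  for (const article of batch) {
    const url = normalizeUrl(article.url);
    const hash = contentHash(article.content);
    if (seenUrls.has(url) || seenHashes.has(hash)) continue;
    seenUrls.add(url);
    seenHashes.add(hash);
    fresh.push(article);
  }
  return fresh;
}
```

Checking both keys matters because syndicated copies of a story often share text but not URLs, while the same URL can be re-fetched with different tracking parameters.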


2. Story Clustering Tool

Purpose: Group related news articles into story clusters for batch extraction.

Can:

  • Cluster articles using TF-IDF cosine similarity
  • Merge clusters when similarity exceeds threshold
  • Batch unclustered sources older than 1 hour

Cannot:

  • Create cross-topic clusters
  • Modify article content

Invocation: CLUSTER_STORIES queue messages dispatched by cron/cluster-sweep
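TF-IDF cosine clustering with threshold-based merging can be sketched as below. It is a minimal illustration, not the production clusterer: the tokenizer, the smoothed IDF weight log(1 + N/df), the 0.3 threshold, and single-linkage merging via union-find are all assumptions, and the function names are hypothetical. It also omits the per-topic partitioning that would keep clusters from crossing topics.

```typescript
// Hypothetical threshold; this document does not state the production value.
const SIMILARITY_THRESHOLD = 0.3;

type Vector = Map<string, number>;

function tokenize(text: string): string[] {
  return text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
}

// TF-IDF with smoothed IDF, log(1 + N / df), so terms shared by every document keep some weight.
function tfidf(docs: string[]): Vector[] {
  const tokenized = docs.map(tokenize);
  const df = new Map<string, number>();
  for (const tokens of tokenized) {
    for (const term of new Set(tokens)) df.set(term, (df.get(term) ?? 0) + 1);
  }
  return tokenized.map((tokens) => {
    const vec: Vector = new Map();
    for (const term of tokens) vec.set(term, (vec.get(term) ?? 0) + 1);
    for (const [term, tf] of vec) vec.set(term, tf * Math.log(1 + docs.length / (df.get(term) ?? 1)));
    return vec;
  });
}

function cosine(a: Vector, b: Vector): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (const [term, w] of a) {
    normA += w * w;
    dot += w * (b.get(term) ?? 0);
  }
  for (const w of b.values()) normB += w * w;
  return normA > 0 && normB > 0 ? dot / Math.sqrt(normA * normB) : 0;
}

// Merge any two articles whose similarity exceeds the threshold (single-linkage via union-find).
function clusterArticles(docs: string[], threshold = SIMILARITY_THRESHOLD): number[][] {
  const vecs = tfidf(docs);
  const parent = docs.map((_, i) => i);
  const find = (i: number): number => (parent[i] === i ? i : (parent[i] = find(parent[i])));
  for (let i = 0; i < docs.length; i++) {
    for (let j = i + 1; j < docs.length; j++) {
      if (cosine(vecs[i], vecs[j]) > threshold) parent[find(i)] = find(j);
    }
  }
  const groups = new Map<number, number[]>();
  docs.forEach((_, i) => {
    const root = find(i);
    groups.set(root, [...(groups.get(root) ?? []), i]);
  });
  return [...groups.values()];
}
```

Single-linkage lets two articles join the same cluster through a chain of similar neighbours even if they are not directly similar; a centroid-based rule is a common alternative if chaining proves too aggressive.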


3. AI Fact Extraction Tool

Purpose: Extract structured facts from story clusters using schema-constrained AI.

Can:

  • Extract key-value facts constrained by per-topic fact_record_schemas
  • Score notability (0.0-1.0) with minimum threshold of 0.6
  • Assign topic categories from the taxonomy tree
  • Generate evergreen (timeless) facts from topic prompts
  • Explode seed entries into many structured facts with spinoff discovery

Cannot:

  • Extract facts without a matching schema definition
  • Reproduce full article content (non-substitutive)
  • Bypass notability threshold

Invocation: EXTRACT_FACTS, GENERATE_EVERGREEN, EXPLODE_CATEGORY_ENTRY queue messages

Constraints: Schema-driven (ADR-015), cost-bounded (ADR-016)
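The gate every extracted fact must pass can be sketched like this. It is a minimal sketch assuming hypothetical FactSchema and ExtractedFact shapes (the document only says schemas live in fact_record_schemas.fact_keys); it encodes the refusals listed above: no matching schema, keys the schema does not declare, and notability below 0.6.

```typescript
// Hypothetical shapes; the real fact_record_schemas rows carry per-topic fact_keys.
interface FactSchema {
  topic: string;
  factKeys: string[];
}

interface ExtractedFact {
  topic: string;
  facts: Record<string, string>;
  notability: number; // 0.0-1.0
}

const NOTABILITY_THRESHOLD = 0.6; // minimum stated in this document

type Verdict = { ok: true } | { ok: false; reason: string };

// Gate an AI-extracted fact: a schema must exist, every key must be declared, notability must clear the bar.
function admitFact(fact: ExtractedFact, schemas: Map<string, FactSchema>): Verdict {
  const schema = schemas.get(fact.topic);
  if (!schema) return { ok: false, reason: `no schema for topic "${fact.topic}"` };

  const allowed = new Set(schema.factKeys);
  const unknown = Object.keys(fact.facts).filter((key) => !allowed.has(key));
  if (unknown.length > 0) return { ok: false, reason: `undeclared keys: ${unknown.join(", ")}` };

  if (!(fact.notability >= 0 && fact.notability <= 1)) {
    return { ok: false, reason: "notability outside 0.0-1.0" };
  }
  if (fact.notability < NOTABILITY_THRESHOLD) {
    return { ok: false, reason: `notability ${fact.notability} below ${NOTABILITY_THRESHOLD}` };
  }
  return { ok: true };
}
```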


4. Fact Validation Tool

Purpose: Verify extracted facts through a 4-phase pipeline before publication.

Phases:

Phase            Method                  Description
1. Structural    Rule-based              Schema conformance, required fields, format checks
2. Consistency   Rule-based              Cross-field logic, temporal plausibility, value ranges
3. Cross-model   AI (Gemini 2.5 Flash)   Independent AI verification against a different model
4. Evidence      APIs + AI               Corroboration via external authoritative sources

Can:

  • Mark facts as validated, rejected, or needs_review
  • Record validation evidence and confidence scores per phase
  • Retry stuck validations (>4h without update)

Cannot:

  • Modify fact content (only status and validation metadata)
  • Skip phases

Invocation: VALIDATE_FACT queue messages (automatic post-extraction + retry cron)
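The four phases run in a fixed order, which can be sketched as an async loop over a phase table. Assumptions, all hypothetical: each phase returns a pass flag plus a confidence score, an `uncertain` failure maps to needs_review while a definite failure maps to rejected, and a failed phase stops the run. Because phases are supplied as a record keyed by name and executed from a fixed order list, a caller cannot reorder or omit one.

```typescript
type Status = "validated" | "rejected" | "needs_review";
type PhaseName = "structural" | "consistency" | "cross_model" | "evidence";

interface PhaseOutcome {
  passed: boolean;
  confidence: number;
  uncertain?: boolean; // hypothetical flag: failed, but not conclusively
}
interface PhaseRecord extends PhaseOutcome {
  phase: PhaseName;
}
type Phase = (factId: string) => Promise<PhaseOutcome>;

// Fixed order: callers supply every phase and cannot reorder or omit one.
const PHASE_ORDER: PhaseName[] = ["structural", "consistency", "cross_model", "evidence"];
const STUCK_AFTER_MS = 4 * 60 * 60 * 1000; // ">4h without update"

async function validateFact(
  factId: string,
  phases: Record<PhaseName, Phase>,
): Promise<{ status: Status; records: PhaseRecord[] }> {
  const records: PhaseRecord[] = [];
  for (const phase of PHASE_ORDER) {
    const outcome = await phases[phase](factId);
    records.push({ phase, ...outcome });
    // Assumption: a failed phase ends the run, so later (costlier) phases only see survivors.
    if (!outcome.passed) {
      return { status: outcome.uncertain ? "needs_review" : "rejected", records };
    }
  }
  return { status: "validated", records };
}

// A validation counts as stuck once it has gone more than 4 hours without an update.
function isStuck(lastUpdated: Date, now: Date = new Date()): boolean {
  return now.getTime() - lastUpdated.getTime() > STUCK_AFTER_MS;
}
```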


5. Evidence API Tool

Purpose: Query external authoritative sources for fact corroboration during validation Phase 4.

Available APIs (20+):

API                      Domain                  Client File
Wikipedia                General knowledge       wikipedia-client.ts
Wikidata                 Structured data         wikidata-client.ts
TheSportsDB              Sports facts            sportsdb-client.ts
MusicBrainz              Music metadata          musicbrainz-client.ts
Nominatim                Geographic data         nominatim-client.ts
Open Library             Book/author data        openlibrary-client.ts
OMDb                     Film/TV metadata        omdb-client.ts
Met Museum (ARTIC)       Art & exhibits          metmuseum-client.ts
NASA                     Space/astronomy         nasa-client.ts
USDA FoodData            Nutrition facts         usda-client.ts
REST Countries           Country data            restcountries-client.ts
IUCN Red List            Endangered species      iucn-client.ts
RAWG                     Video game data         rawg-client.ts
OpenAlex                 Academic papers         openalex-client.ts
Nobel Prize              Nobel laureates         nobelprize-client.ts
Financial Modeling Prep  Financial data          fmp-client.ts
Finnhub                  Stock market data       finnhub-client.ts
FRED                     Economic data           fred-client.ts
World Bank               Development indicators  worldbank-client.ts
Open-Meteo               Weather/climate         openmeteo-client.ts
TheMealDB                Recipe/food data        themealdb-client.ts
Wikimedia Enterprise     High-quality Wikipedia  wikimedia-enterprise-client.ts

Cannot:

  • Access paid-only APIs without configured keys
  • Serve as primary fact source (evidence only)

All clients live in packages/ai/src/.
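Phase 4 fans a query out to the evidence clients and keeps whatever comes back as corroboration only. The sketch below assumes a hypothetical EvidenceClient interface (name, requiresKey, lookup) that the real *-client.ts modules in packages/ai/src/ may not share; keyed APIs are skipped when no key is configured, and one failing API does not sink the rest.

```typescript
// Hypothetical common interface; the real clients may expose different methods.
interface EvidenceClient {
  name: string;
  requiresKey: boolean;
  lookup(query: string): Promise<string[]>; // supporting snippets, possibly empty
}

interface Corroboration {
  source: string;
  snippets: string[];
}

// Query every usable client in parallel; evidence is returned for scoring, never stored as a fact.
async function gatherEvidence(
  query: string,
  clients: EvidenceClient[],
  configuredKeys: Set<string>,
): Promise<Corroboration[]> {
  const usable = clients.filter((c) => !c.requiresKey || configuredKeys.has(c.name));
  const settled = await Promise.allSettled(usable.map((c) => c.lookup(query)));
  const found: Corroboration[] = [];
  settled.forEach((result, i) => {
    // A timeout or error from one API is ignored rather than failing the whole phase.
    if (result.status === "fulfilled" && result.value.length > 0) {
      found.push({ source: usable[i].name, snippets: result.value });
    }
  });
  return found;
}
```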


6. Image Resolution Tool

Purpose: Resolve and cache images for fact cards and challenges.

Can:

  • Search stock photo APIs (Unsplash, Pexels) for topic-relevant images
  • Cache resolved images via Cloudflare R2
  • Resolve anti-spoiler images for challenges where the fact image would reveal the answer

Cannot:

  • Generate images
  • Use copyrighted/non-stock images

Invocation: RESOLVE_IMAGE and RESOLVE_CHALLENGE_IMAGE queue messages
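Anti-spoiler selection can be approximated by rejecting any candidate whose URL or description names the answer. This is a deliberately simple heuristic assumed for illustration: the ImageCandidate shape and the function name are hypothetical, and the real resolver may rely on a different signal entirely.

```typescript
// Hypothetical candidate shape, e.g. a normalized Unsplash or Pexels search result.
interface ImageCandidate {
  url: string;
  description: string;
}

// Return the first candidate whose URL and description never mention the answer.
// Heuristic assumption: plain substring matching on the answer and its hyphenated slug form.
function pickAntiSpoilerImage(
  candidates: ImageCandidate[],
  answer: string,
): ImageCandidate | undefined {
  const needle = answer.trim().toLowerCase();
  if (needle === "") return candidates[0];
  const variants = [needle, needle.replace(/\s+/g, "-")];
  return candidates.find((c) => {
    const haystack = `${c.url} ${c.description}`.toLowerCase();
    return !variants.some((v) => haystack.includes(v));
  });
}
```

An undefined result means every candidate gives the answer away, so a caller would need a fallback such as a broader topic query.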


7. Challenge Generation Tool

Purpose: Pre-compute interactive quiz content for validated facts.

Can:

  • Generate challenge content across 6 styles (multiple_choice, true_false, fill_blank, free_text, timeline, ranking)
  • Scale difficulty from C1 (easiest, 1 point) to C5 (hardest, 5 points)
  • Generate challenge titles for display
  • Map facts to 8 named challenge formats (big_fan_of, know_a_lot_about, etc.)

Cannot:

  • Generate challenges for unvalidated facts
  • Reveal answers in challenge metadata or URLs

Invocation: GENERATE_CHALLENGE_CONTENT queue messages
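The validated-only rule and the C1 to C5 point scale can be sketched as a small planning step that runs before any content is generated. The Fact and ChallengePlan shapes and the function names are hypothetical; the style names and point values come from the list above.

```typescript
const CHALLENGE_STYLES = ["multiple_choice", "true_false", "fill_blank", "free_text", "timeline", "ranking"] as const;
type ChallengeStyle = (typeof CHALLENGE_STYLES)[number];
type Difficulty = "C1" | "C2" | "C3" | "C4" | "C5";

// Hypothetical minimal fact shape; only the validation status matters here.
interface Fact {
  id: string;
  status: string;
}

interface ChallengePlan {
  factId: string;
  style: ChallengeStyle;
  difficulty: Difficulty;
  points: number;
}

// C1 = 1 point up to C5 = 5 points.
function pointsFor(difficulty: Difficulty): number {
  return Number(difficulty.slice(1));
}

// Refuse to plan a challenge for any fact that has not passed validation.
function planChallenge(fact: Fact, style: ChallengeStyle, difficulty: Difficulty): ChallengePlan {
  if (fact.status !== "validated") {
    throw new Error(`fact ${fact.id} is ${fact.status}, not validated`);
  }
  return { factId: fact.id, style, difficulty, points: pointsFor(difficulty) };
}
```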


8. AI Model Router

Purpose: Select the appropriate AI model for each task based on tier, topic, and budget.

Tiers:

Tier      Use Case                                 Default Model
Default   Routine extraction, classification       gpt-5.4-nano
Mid       Accuracy-critical tasks                  gpt-5.4-mini
High      Complex validation, dispute resolution   gemini-2.5-pro

Features:

  • 27 models across 6 providers (including OpenAI, Anthropic, Google, MiniMax, and xAI)
  • Topic-based overrides (e.g., sports topics → gemini-3-flash-preview)
  • Daily spend caps per provider (e.g., ANTHROPIC_DAILY_SPEND_CAP_USD)
  • Per-call cost tracking via ai_cost_tracking table
  • Configurable via ai_model_tier_config DB table

Cannot:

  • Exceed daily spend caps (hard enforcement)
  • Escalate to expensive models without explicit opt-in
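Routing combines tier defaults, topic overrides, an opt-in check, and per-provider spend caps. A minimal sketch, assuming in-memory maps in place of the ai_model_tier_config and ai_cost_tracking tables, a hypothetical prefix-based provider lookup, and that topic overrides apply only at the default tier (the document does not say).

```typescript
type Tier = "default" | "mid" | "high";

// Tier defaults from the tier table; in production these come from ai_model_tier_config.
const TIER_DEFAULTS: Record<Tier, string> = {
  default: "gpt-5.4-nano",
  mid: "gpt-5.4-mini",
  high: "gemini-2.5-pro",
};

// Topic overrides, e.g. sports -> gemini-3-flash-preview.
const TOPIC_OVERRIDES = new Map<string, string>([["sports", "gemini-3-flash-preview"]]);

// Hypothetical prefix-based provider lookup.
function providerOf(model: string): string {
  if (model.startsWith("gpt-")) return "openai";
  if (model.startsWith("gemini-")) return "google";
  if (model.startsWith("claude-")) return "anthropic";
  return "unknown";
}

interface RouteRequest {
  tier: Tier;
  topic?: string;
  optIn?: boolean; // explicit opt-in required for mid/high tiers
}

function routeModel(
  req: RouteRequest,
  spentTodayUsd: Map<string, number>,
  dailyCapUsd: Map<string, number>,
): string {
  if (req.tier !== "default" && !req.optIn) {
    throw new Error(`tier "${req.tier}" requires explicit opt-in`);
  }
  // Assumption: topic overrides replace only the default-tier model.
  const override = req.tier === "default" && req.topic !== undefined ? TOPIC_OVERRIDES.get(req.topic) : undefined;
  const model = override ?? TIER_DEFAULTS[req.tier];

  const provider = providerOf(model);
  const cap = dailyCapUsd.get(provider);
  if (cap !== undefined && (spentTodayUsd.get(provider) ?? 0) >= cap) {
    throw new Error(`daily spend cap reached for ${provider}`); // hard stop, no silent fallback
  }
  return model;
}
```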

9. Observability Tool

Purpose: Structured logging, error tracking, and pipeline monitoring.

Components:

Component                 Description
Structured JSON logging   @eko/observability (component-scoped loggers)
Error tracking            Sentry integration across all apps and workers
AI cost tracking          ai_cost_tracking table with daily aggregation
Pipeline monitoring       ingestion_runs table tracking each cron invocation
Content operations        content_operations_log for pipeline debugging
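Component-scoped structured logging amounts to emitting one JSON object per line with a fixed component field. This is a hypothetical sketch of the pattern, not the @eko/observability API, whose function names and record fields this document does not specify.

```typescript
type Level = "info" | "warn" | "error";
type Fields = Record<string, unknown>;

interface Logger {
  info(msg: string, fields?: Fields): void;
  warn(msg: string, fields?: Fields): void;
  error(msg: string, fields?: Fields): void;
}

// Hypothetical factory: every line is one JSON object tagged with its component.
function createLogger(component: string, sink: (line: string) => void = console.log): Logger {
  const emit = (level: Level, msg: string, fields: Fields = {}): void => {
    // Caller fields go first so they cannot overwrite ts, level, component, or msg.
    sink(JSON.stringify({ ...fields, ts: new Date().toISOString(), level, component, msg }));
  };
  return {
    info: (msg, fields) => emit("info", msg, fields),
    warn: (msg, fields) => emit("warn", msg, fields),
    error: (msg, fields) => emit("error", msg, fields),
  };
}
```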

Explicit Non-Tools

The following are not tools in Eko V2:

  • Site crawlers or web scrapers
  • Real-time streaming or WebSocket listeners
  • User-generated content ingestion
  • Social media scrapers
  • Image generation (AI or otherwise)
  • Predictive or anticipatory modeling

Requests requiring these must trigger an ADR revision.


Change Policy

  • New tools require: defined purpose, clear constraints, ADR alignment
  • Deprecated tools must be explicitly marked
  • All AI tools must include cost tracking
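The change policy can be checked mechanically when a tool is registered. The ToolDescriptor shape, the ADR-### identifier format check, and the function name below are assumptions made for illustration.

```typescript
// Hypothetical registration record mirroring the Can / Cannot structure used in this document.
interface ToolDescriptor {
  name: string;
  purpose: string;
  can: string[];
  cannot: string[];
  adrs: string[]; // e.g. ["ADR-014", "ADR-015"]
  usesAI: boolean;
  costTracked: boolean;
  deprecated?: boolean; // deprecated tools must carry an explicit marker
}

// Returns every change-policy violation; an empty list means the descriptor is acceptable.
function policyViolations(tool: ToolDescriptor): string[] {
  const problems: string[] = [];
  if (tool.purpose.trim() === "") problems.push("missing purpose");
  if (tool.cannot.length === 0) problems.push("no constraints declared");
  if (tool.adrs.length === 0 || !tool.adrs.every((id) => /^ADR-\d{3}$/.test(id))) {
    problems.push("missing or malformed ADR alignment");
  }
  if (tool.usesAI && !tool.costTracked) problems.push("AI tool without cost tracking");
  return problems;
}
```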

Summary

Eko's tools process news → facts → challenges through a pipeline of:

  1. Ingest — Fetch and cluster articles
  2. Extract — Schema-constrained AI extraction
  3. Validate — 4-phase verification with 20+ evidence APIs
  4. Generate — Challenge content for interactive learning
  5. Observe — Cost tracking, logging, monitoring

Each tool is intentionally scoped and cost-bounded.
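The queue messages named in this document can be grouped under the summary's stages to make the flow explicit. This is an illustrative sketch: the stage assignments follow the tool sections, RESOLVE_IMAGE and RESOLVE_CHALLENGE_IMAGE are left out because the summary does not place them, Observe is omitted because it spans every stage, and flowsForward is a hypothetical helper.

```typescript
// Ordered stages from the summary; Observe runs alongside all of them and is not a stage here.
const STAGES = ["ingest", "extract", "validate", "generate"] as const;
type Stage = (typeof STAGES)[number];

type MessageType =
  | "INGEST_NEWS"
  | "CLUSTER_STORIES"
  | "EXTRACT_FACTS"
  | "GENERATE_EVERGREEN"
  | "EXPLODE_CATEGORY_ENTRY"
  | "VALIDATE_FACT"
  | "GENERATE_CHALLENGE_CONTENT";

const STAGE_OF: Record<MessageType, Stage> = {
  INGEST_NEWS: "ingest",
  CLUSTER_STORIES: "ingest",
  EXTRACT_FACTS: "extract",
  GENERATE_EVERGREEN: "extract",
  EXPLODE_CATEGORY_ENTRY: "extract",
  VALIDATE_FACT: "validate",
  GENERATE_CHALLENGE_CONTENT: "generate",
};

// True when `next` belongs to the same stage as `current` or a later one.
function flowsForward(current: MessageType, next: MessageType): boolean {
  return STAGES.indexOf(STAGE_OF[next]) >= STAGES.indexOf(STAGE_OF[current]);
}
```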