OpenAlex for Academic & Scientific Fact Evidence

Motivation

Science, technology, medicine, and history facts frequently claim specific discoveries, publication dates, institutional affiliations, and researcher attribution. The evidence pipeline currently verifies these against Wikipedia text summaries and sparse Wikidata triples — neither provides structured scholarly data.

OpenAlex is an open knowledge graph of 286M+ scholarly works, tens of millions of authors, and 110K+ institutions. It provides structured, queryable data for the exact claims our fact engine generates: who discovered what, when it was published, where researchers worked, and how significant the work was (citation counts).

Service Overview

| Entity | Endpoint | Count | Key Fields |
| --- | --- | --- | --- |
| Works | /works | 286M+ | title, publication_date, authors, topics, cited_by_count, doi |
| Authors | /authors | tens of millions | display_name, works_count, cited_by_count, affiliations, orcid |
| Institutions | /institutions | 110K+ | display_name, type, country, founded_year, works_count |
| Sources | /sources | 250K+ | display_name, issn, type, works_count |
| Topics | /topics | ~4,500 | display_name, domain, field, subfield (4-level hierarchy) |
| Funders | /funders | 35K+ | display_name, country, grants_count |

Base URL: https://api.openalex.org
Auth: API key optional (free; add as ?api_key=KEY for higher rate limits). Requests work without a key.
Rate limits: Generous free tier. A key is recommended for production use.
Format: JSON, 25 results per page by default, 100 max.
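
A minimal sketch of how a request URL could be assembled from these settings. The helper name buildOpenAlexUrl and its options object are hypothetical, introduced only for illustration. The per-page parameter name is the one OpenAlex documents for page size, and the cap of 100 simply mirrors the limit stated above; both should be confirmed against the current API docs.

```typescript
// Hypothetical helper (not part of the existing codebase): builds an OpenAlex request URL.
const OPENALEX_BASE_URL = 'https://api.openalex.org'
const MAX_PER_PAGE = 100 // assumption: mirrors the page-size cap stated in this plan

interface OpenAlexQuery {
  search?: string
  filter?: string
  sort?: string
  perPage?: number
  apiKey?: string // optional; requests work without it
}

function buildOpenAlexUrl(endpoint: string, query: OpenAlexQuery = {}): string {
  const params: Array<[string, string]> = []
  if (query.search) params.push(['search', query.search])
  if (query.filter) params.push(['filter', query.filter])
  if (query.sort) params.push(['sort', query.sort])
  if (query.perPage !== undefined) {
    // Clamp the page size into the 1..MAX_PER_PAGE range
    const clamped = Math.min(Math.max(1, Math.floor(query.perPage)), MAX_PER_PAGE)
    params.push(['per-page', String(clamped)])
  }
  if (query.apiKey) params.push(['api_key', query.apiKey])

  const queryString = params
    .map(([key, value]) => `${key}=${encodeURIComponent(value)}`)
    .join('&')
  return queryString
    ? `${OPENALEX_BASE_URL}${endpoint}?${queryString}`
    : `${OPENALEX_BASE_URL}${endpoint}`
}
```

For example, buildOpenAlexUrl('/authors', { search: 'Marie Curie' }) yields https://api.openalex.org/authors?search=Marie%20Curie. Passing the key as an argument rather than reading the environment inside the helper keeps it easy to test.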

What It Verifies

| Fact Claim Type | Example | OpenAlex Query | Evidence Returned |
| --- | --- | --- | --- |
| Discovery attribution | "Curie discovered radium" | /authors?search=Marie+Curie | works list, institution, topics |
| Publication date | "CRISPR first used in 2012" | /works?search=CRISPR&sort=publication_date | actual first paper + date |
| Researcher output | "Einstein published 300+ papers" | /authors?search=Albert+Einstein | exact works_count |
| Institution founding | "Harvard founded 1636" | /institutions?search=Harvard | founded_year, type, country |
| Scientific impact | "Most cited physics paper" | /works?filter=topics.id:T...&sort=cited_by_count:desc | citation ranking |
| Affiliation | "Hawking worked at Cambridge" | /authors?search=Stephen+Hawking | affiliations array |

Implementation

Challenge 1: OpenAlex Client

File: packages/ai/src/openalex-client.ts (new)

  • Base URL: https://api.openalex.org
  • Optional API key via OPENALEX_API_KEY env var (add to config, not required)
  • Polite header: User-Agent: Eko/1.0 (mailto:team@eko.day) (OpenAlex requests this)
  • In-memory cache: 24h TTL, 5K max entries
  • 10s timeout, abort controller
  • Metrics: openalex.api_calls, openalex.cache_hit, openalex.entity_found

Key methods:

searchAuthor(name: string): Promise<OpenAlexAuthor | null>
getAuthorWorks(authorId: string, limit?: number): Promise<OpenAlexWork[]>
searchWork(title: string): Promise<OpenAlexWork | null>
searchInstitution(name: string): Promise<OpenAlexInstitution | null>
searchTopic(query: string): Promise<OpenAlexTopic | null>
formatOpenAlexContext(entity: OpenAlexAuthor | OpenAlexInstitution | OpenAlexWork): string

Acceptance: Can look up "Marie Curie" → author with works_count, cited_by_count, affiliations. Can look up "On the Origin of Species" → work with publication_date, author, topics.
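
The in-memory cache described above (24h TTL, 5K max entries) could be sketched as follows. The class name TtlCache and the injectable clock are assumptions made for illustration, and eviction here is by oldest insertion rather than true LRU, which is one possible policy and not necessarily the one the client will use.

```typescript
// Hypothetical sketch of a bounded TTL cache; names and eviction policy are illustrative.
class TtlCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>()
  private readonly ttlMs: number
  private readonly maxEntries: number
  private readonly now: () => number

  // `now` is injectable so tests can control time without real waiting
  constructor(ttlMs: number, maxEntries: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs
    this.maxEntries = maxEntries
    this.now = now
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key)
    if (!entry) return undefined
    if (entry.expiresAt <= this.now()) {
      // Expired: drop the entry and report a miss
      this.entries.delete(key)
      return undefined
    }
    return entry.value
  }

  set(key: string, value: V): void {
    // Deleting first moves a refreshed key to the end of the Map's insertion order
    this.entries.delete(key)
    if (this.entries.size >= this.maxEntries) {
      // Map iterates in insertion order, so the first key is the oldest inserted
      const oldest = this.entries.keys().next().value
      if (oldest !== undefined) this.entries.delete(oldest)
    }
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs })
  }

  get size(): number {
    return this.entries.size
  }
}

// Settings from the plan: 24h TTL, 5K max entries
const openAlexCache = new TtlCache<unknown>(24 * 60 * 60 * 1000, 5000)
```

Storing null lookups in the same cache would also avoid repeated requests for entities OpenAlex does not know, if that behavior is wanted.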

Challenge 2: Evidence Pipeline Integration

File: packages/ai/src/validation/evidence.ts

Wire into Phase 4b as a science/academia enrichment source. Topic-gated to relevant domains:

// Academic enrichment — science, technology, medicine, history
const academicTopics = ['science', 'technology', 'medicine', 'health', 'history']
if (academicTopics.some(t => topicPath.includes(t))) {
  const author = await searchAuthor(entityName)
  if (author) {
    findings.push(`OpenAlex: ${author.display_name}, ${author.works_count} works, ${author.cited_by_count} citations`)
    // If fact mentions a specific work/discovery, look it up
    if (hasPublicationClaim(factTitle, factContext)) {
      const work = await searchWork(extractWorkTitle(factContext))
      if (work) {
        findings.push(`OpenAlex work: "${work.title}" (${work.publication_date}), ${work.cited_by_count} citations`)
      }
    }
  }
}

Also check institution claims regardless of topic:

if (hasInstitutionClaim(factContext)) {
  const inst = await searchInstitution(extractInstitutionName(factContext))
  if (inst) {
    findings.push(`OpenAlex institution: ${inst.display_name}, founded ${inst.founded_year}, ${inst.country}`)
  }
}

Acceptance: Evidence pipeline includes OpenAlex data in reasoner prompt for science/tech/medicine/history facts. Einstein facts get verified with actual works_count and institutional affiliation.
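
One possible refinement of the topic gate, sketched under the assumption that topicPath is a slash-delimited string such as "science/physics" (the plan does not state its format). Matching whole path segments instead of substrings avoids accidental hits inside longer slugs. The helper name isAcademicTopic is hypothetical.

```typescript
// Hypothetical helper: gates academic enrichment on whole path segments.
// Assumption: topicPath looks like "science/physics"; adjust the delimiter if it differs.
const ACADEMIC_TOPICS = ['science', 'technology', 'medicine', 'health', 'history']

function isAcademicTopic(topicPath: string): boolean {
  const segments = topicPath
    .toLowerCase()
    .split('/')
    .filter(segment => segment.length > 0)
  return segments.some(segment => ACADEMIC_TOPICS.includes(segment))
}
```

With this gate, "science/physics" qualifies while a slug like "tech/browser-history" does not, since no segment equals "history" exactly. Whether stricter matching is desirable depends on how topic paths are actually named.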

Challenge 3: Nobel Prize API Client

File: packages/ai/src/nobelprize-client.ts (new)

The Nobel Prize API provides authoritative data for prize attribution, motivation, year, and affiliation — a common source of LLM hallucination (e.g., attributing the wrong discovery to the right laureate).

Base URL: https://api.nobelprize.org/2.1
Auth: None required.
Rate limits: None documented.
Cost: Free.

Endpoints:

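  • GET /laureates?name={name}: laureate profile with all prizes
  • GET /nobelPrizes?nobelPrizeYear={year}&nobelPrizeCategory={cat}: prizes by year/category (adding yearTo turns nobelPrizeYear into the start of a range; category takes short codes such as phy, che, med)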

Response fields: awardYear, dateAwarded, category, motivation (specific discovery cited), affiliations (institution at time of award), portion (shared vs solo), prizeAmount, prizeAmountAdjusted, birth/death dates and places, Wikipedia/Wikidata cross-links.

Key methods:

searchLaureate(name: string): Promise<NobelLaureate | null>
searchPrize(category: string, year: number): Promise<NobelPrize | null>
formatNobelContext(laureate: NobelLaureate): string

  • In-memory cache: 24h TTL, 1K max entries (small dataset: ~1,000 laureates total)
  • Metrics: nobelprize.api_calls, nobelprize.cache_hit, nobelprize.laureate_found

What it catches:

  • Wrong discovery attributed: "Curie won for discovering radioactivity" vs actual motivation: "radiation phenomena"
  • Wrong year: Einstein's Physics prize is for award year 1921 but was announced in 1922 (awardYear vs dateAwarded)
  • Wrong category: "Einstein won the Nobel Prize in Mathematics" → no Mathematics category exists
  • Shared vs solo: "Curie was the sole winner" of the 1903 Physics prize vs portion: "1/4" (shared with Pierre Curie and Henri Becquerel)

Acceptance: Can look up "Marie Curie" → returns both prizes (Physics 1903, Chemistry 1911) with motivations, co-laureates, affiliations.
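
A minimal sketch of how one prize could be rendered for the reasoner prompt so that the awardYear/dateAwarded and shared/solo distinctions stay visible. The NobelPrizeSketch type and describePrize function are hypothetical. The sketch assumes the client has already flattened the API's localized fields (category and motivation are believed to arrive as per-language objects) into plain English strings.

```typescript
// Hypothetical, flattened view of one prize; the real client type may differ.
interface NobelPrizeSketch {
  awardYear: string // e.g. "1921"
  dateAwarded?: string // ISO date string, may be absent
  category: string // assumed already reduced to the English label
  motivation: string // assumed already reduced to the English text
  portion: string // "1" for a sole laureate, otherwise a fraction such as "1/2"
}

function describePrize(prize: NobelPrizeSketch): string {
  const share = prize.portion === '1' ? 'sole laureate' : `shared, portion ${prize.portion}`
  // Surface the year the prize was actually announced when it differs from awardYear
  const announcedYear = prize.dateAwarded ? prize.dateAwarded.slice(0, 4) : undefined
  const yearNote =
    announcedYear !== undefined && announcedYear !== prize.awardYear
      ? ` (awarded ${announcedYear})`
      : ''
  return `${prize.category} ${prize.awardYear}${yearNote}: "${prize.motivation}" [${share}]`
}
```

Keeping the portion and both years in the formatted line gives the reasoner the raw material to catch "sole winner" and wrong-year claims without a second lookup.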

Challenge 4: Wire Nobel Prize Into Evidence Pipeline

File: packages/ai/src/validation/evidence.ts

Nobel lookup triggers when fact context mentions Nobel-related terms:

const nobelTerms = ['nobel', 'nobel prize', 'laureate', 'nobel memorial']
if (nobelTerms.some(t => factContext.toLowerCase().includes(t))) {
  const laureate = await searchLaureate(entityName)
  if (laureate) {
    for (const prize of laureate.nobelPrizes) {
      findings.push(`Nobel API: ${prize.categoryFullName} ${prize.awardYear} — "${prize.motivation}"`)
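      // Not every prize carries an affiliation; skip it rather than emit "undefined"
      const affiliation = prize.affiliations?.[0]?.name
      if (affiliation) findings.push(`Nobel API: affiliation at award: ${affiliation}`)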
    }
  }
}

Confidence impact:

  • Nobel laureate found with matching category + year → boost apiConfidence by 0.15 (authoritative)
  • Nobel motivation contradicts fact claim → flag as critical (authoritative contradiction)
  • No Nobel results → fall through (no penalty)

Acceptance: "Einstein won Nobel Prize for photoelectric effect in 1921" gets verified against actual API data. "Einstein won Nobel Prize for relativity" gets flagged — motivation says "photoelectric effect."

Challenge 5: NASA API Client

File: packages/ai/src/nasa-client.ts (new)

Authoritative source for space and astronomy facts — exoplanet counts, asteroid data, Mars rover data, solar weather events.

Base URL: https://api.nasa.gov
Auth: ?api_key=KEY (NASA_API_KEY env var, already in .env.local)
Free tier: 1,000 requests/hour with personal key
Cost: Free

Key APIs for fact verification:

| API | Endpoint | Verifies |
| --- | --- | --- |
| Exoplanet Archive | exoplanetarchive.ipac.caltech.edu/TAP | Exoplanet counts, discovery dates, star data |
| Asteroids NeoWs | /neo/rest/v1/feed | Asteroid close approaches, sizes, dates |
| Mars Rover Photos | /mars-photos/api/v1/rovers/{rover} | Rover landing dates, photo counts, mission status |
| DONKI | /DONKI/CME, /DONKI/GST | Solar storm events, geomagnetic storms by date |

Key methods:

// Exoplanets
getExoplanetCount(): Promise<number>
searchExoplanet(name: string): Promise<NasaExoplanet | null>

// Asteroids
getAsteroidApproach(startDate: string, endDate: string): Promise<NasaAsteroid[]>
searchAsteroid(name: string): Promise<NasaAsteroid | null>

// Mars rovers
getRoverInfo(rover: 'curiosity' | 'opportunity' | 'spirit' | 'perseverance'): Promise<NasaRover | null>

// Solar weather
getSolarEvents(startDate: string, endDate: string): Promise<NasaSolarEvent[]>

formatNasaContext(data: NasaExoplanet | NasaRover | NasaAsteroid): string

  • In-memory cache: 24h TTL, 1K max entries (low volume: space facts are ~1-2% of total)
  • Metrics: nasa.api_calls, nasa.cache_hit, nasa.entity_found

Acceptance: Can query exoplanet count → actual number. Can look up "Curiosity rover" → landing date, photo count, mission status.
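
One detail worth planning for in getAsteroidApproach: the NeoWs feed endpoint is documented as accepting only a short date range per request (7 days, to the best of our knowledge), so longer ranges need to be split into windows. The sketch below shows one way to do that. The helper name splitIntoFeedWindows is hypothetical, and the 7-day constant should be checked against the NASA API docs.

```typescript
// Hypothetical helper: splits an inclusive date range into windows of at most 7 days.
const MAX_FEED_DAYS = 7 // assumption: NeoWs feed limit per request
const DAY_MS = 24 * 60 * 60 * 1000

function toIsoDate(date: Date): string {
  return date.toISOString().slice(0, 10)
}

function splitIntoFeedWindows(startDate: string, endDate: string): Array<[string, string]> {
  // Parse as UTC midnight so daylight-saving shifts cannot skew the day arithmetic
  const start = Date.parse(`${startDate}T00:00:00Z`)
  const end = Date.parse(`${endDate}T00:00:00Z`)
  if (Number.isNaN(start) || Number.isNaN(end) || start > end) return []

  const windows: Array<[string, string]> = []
  for (let cursor = start; cursor <= end; cursor += MAX_FEED_DAYS * DAY_MS) {
    // Each window covers up to 7 calendar days, inclusive of both endpoints
    const windowEnd = Math.min(cursor + (MAX_FEED_DAYS - 1) * DAY_MS, end)
    windows.push([toIsoDate(new Date(cursor)), toIsoDate(new Date(windowEnd))])
  }
  return windows
}
```

A full calendar year splits into 53 windows, so year-wide asteroid queries cost dozens of requests against the hourly quota. Caching each window, or narrowing the range to the dates the fact actually mentions, keeps that manageable.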

Challenge 6: NASA Evidence Integration

File: packages/ai/src/validation/evidence.ts

Topic-gated to space/astronomy:

const spaceTopics = ['space', 'astronomy', 'aerospace', 'planetary']
if (spaceTopics.some(t => topicPath.includes(t)) && nasaApiKey) {
  // Exoplanet claims
  if (hasExoplanetClaim(factContext)) {
    const count = await getExoplanetCount()
    findings.push(`NASA Exoplanet Archive: ${count} confirmed exoplanets`)
  }
  // Mars rover claims
  if (hasRoverClaim(factContext)) {
    const rover = detectRover(factContext) // 'curiosity' | 'perseverance' etc.
    if (rover) {
      const info = await getRoverInfo(rover)
      if (info) findings.push(`NASA: ${info.name} landed ${info.landing_date}, ${info.total_photos} photos, status: ${info.status}`)
    }
  }
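  // Asteroid close approach claims
  if (hasAsteroidClaim(factContext)) {
    const year = extractYear(factContext)
    // Skip when no year can be extracted. Note: the NeoWs feed caps each request at a
    // short date range (7 days), so getAsteroidApproach must split the year internally.
    if (year) {
      const asteroids = await getAsteroidApproach(`${year}-01-01`, `${year}-12-31`)
      if (asteroids.length) findings.push(`NASA NeoWs: ${asteroids.length} close approaches in ${year}`)
    }
  }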
}

Confidence:

  • NASA data matches claim → apiConfidence boost 0.15 (authoritative primary source)
  • NASA data contradicts claim → flag as critical
  • No NASA results → fall through (no penalty)

Acceptance: "Over 5,000 exoplanets confirmed" → NASA Exoplanet Archive → actual count → verified or refuted.
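
To make the "verified or refuted" step concrete, here is a minimal sketch of comparing a counted claim against the number returned by the archive. Both parseCountClaim and checkCountClaim are hypothetical helpers. The phrase list is deliberately small and would need extending for real fact text.

```typescript
// Hypothetical helpers: parse a counted claim and compare it with an actual count.
type Comparator = 'over' | 'under' | 'exactly'

interface CountClaim {
  comparator: Comparator
  value: number
}

function parseCountClaim(text: string): CountClaim | null {
  // Recognizes a few phrasings, e.g. "over 5,000" or "fewer than 100"
  const match = text.match(/\b(over|more than|under|fewer than|exactly)\s+(\d[\d,]*)/i)
  if (!match) return null
  const value = Number((match[2] ?? '').replace(/,/g, ''))
  if (!Number.isFinite(value)) return null
  const word = (match[1] ?? '').toLowerCase()
  const comparator: Comparator =
    word === 'over' || word === 'more than'
      ? 'over'
      : word === 'under' || word === 'fewer than'
        ? 'under'
        : 'exactly'
  return { comparator, value }
}

function checkCountClaim(claim: CountClaim, actual: number): 'verified' | 'refuted' {
  const holds =
    claim.comparator === 'over'
      ? actual > claim.value
      : claim.comparator === 'under'
        ? actual < claim.value
        : actual === claim.value
  return holds ? 'verified' : 'refuted'
}
```

A claim with no recognizable count returns null from the parser, which fits the "fall through, no penalty" rule rather than forcing a verdict.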

Challenge 7: Tests

Files:

  • packages/ai/src/__tests__/openalex-client.test.ts (new)
  • packages/ai/src/__tests__/nobelprize-client.test.ts (new)
  • packages/ai/src/__tests__/nasa-client.test.ts (new)

OpenAlex tests:

  • Response parsing for authors, works, institutions
  • Cache behavior tests
  • formatOpenAlexContext output tests
  • Topic-gating logic tests
  • Graceful failure when entity not found

Nobel Prize tests:

  • Response parsing for laureates with multiple prizes (e.g., Curie)
  • awardYear vs dateAwarded distinction
  • formatNobelContext output
  • Nobel term detection in fact context
  • Cache behavior

NASA tests:

  • Exoplanet count and search parsing
  • Rover info response parsing
  • Asteroid approach data parsing
  • Space topic detection in fact context
  • Cache behavior

Acceptance: bun run test passes.

Evidence Confidence Impact

  • OpenAlex author found with matching works/affiliations → boost apiConfidence by 0.1
  • OpenAlex work found with matching publication date → boost apiConfidence by 0.1
  • OpenAlex institution found with matching founded_year → boost apiConfidence by 0.05
  • Nobel laureate found with matching category + year → boost apiConfidence by 0.15
  • Nobel motivation contradicts fact claim → flag as critical
  • NASA data matches space/astronomy claim → boost apiConfidence by 0.15
  • NASA data contradicts claim → flag as critical
  • No results from any source → fall through to existing sources (no penalty)
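
The rules above could be encoded as a small lookup plus two pure functions, sketched below. All names (EvidenceScore, applyMatch, applyContradiction) are hypothetical, and capping apiConfidence at 1 is an assumption the plan does not state. The boost values are the ones listed above.

```typescript
// Hypothetical sketch of the confidence rules; names and the cap at 1 are assumptions.
type MatchOutcome =
  | 'openalex_author'
  | 'openalex_work'
  | 'openalex_institution'
  | 'nobel_laureate'
  | 'nasa_data'

type ContradictionSource = 'nobel_motivation' | 'nasa_data'

// Boost values taken from the list above
const CONFIDENCE_BOOST: Record<MatchOutcome, number> = {
  openalex_author: 0.1,
  openalex_work: 0.1,
  openalex_institution: 0.05,
  nobel_laureate: 0.15,
  nasa_data: 0.15,
}

interface EvidenceScore {
  apiConfidence: number
  criticalFlags: string[]
}

function applyMatch(score: EvidenceScore, outcome: MatchOutcome): EvidenceScore {
  // Returns a new object; the input score is left untouched
  const boosted = score.apiConfidence + CONFIDENCE_BOOST[outcome]
  return { ...score, apiConfidence: Math.min(1, boosted) }
}

function applyContradiction(score: EvidenceScore, source: ContradictionSource): EvidenceScore {
  // Contradictions from authoritative sources are flagged, not scored
  return { ...score, criticalFlags: [...score.criticalFlags, source] }
}
```

A source that returns no results calls neither function, which keeps the "no penalty" rule implicit in the control flow.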

Cost

All three APIs are free. The NASA key is already in .env.local, and the OpenAlex key is optional.

Dependencies

  • NASA_API_KEY in .env.local (already done)
  • Optional OPENALEX_API_KEY env var
  • Add both to packages/config/src/index.ts env schema and .env.example
  • User-Agent header with contact email for OpenAlex (community norm, not enforced)

Relationship to Other Evidence Plans

| Plan | Domain | Data |
| --- | --- | --- |
| API-Sports | Sports | Match results, player stats, game data |
| OpenAlex + Nobel Prize + NASA | Science, tech, medicine, academia, space | Authors, papers, institutions, prize attribution, exoplanets, rovers, asteroids |
| Alpha Vantage | Finance (primary) | Company fundamentals, stock prices, basic economic data |
| FRED + Finnhub + FMP | Finance (expansion) | Authoritative economic data, ESG, congressional trading, capacity backup |
| DBpedia | General-purpose fallback | Structured Wikipedia infobox properties |