OpenAlex for Academic & Scientific Fact Evidence
Motivation
Science, technology, medicine, and history facts frequently claim specific discoveries, publication dates, institutional affiliations, and researcher attribution. The evidence pipeline currently verifies these against Wikipedia text summaries and sparse Wikidata triples — neither provides structured scholarly data.
OpenAlex is an open knowledge graph of 286M+ scholarly works, tens of millions of authors, and 110K+ institutions. It provides structured, queryable data for the exact claims our fact engine generates: who discovered what, when it was published, where researchers worked, and how significant the work was (citation counts).
Service Overview
| Entity | Endpoint | Count | Key Fields |
|---|---|---|---|
| Works | /works | 286M+ | title, publication_date, authors, topics, cited_by_count, doi |
| Authors | /authors | tens of millions | display_name, works_count, cited_by_count, affiliations, orcid |
| Institutions | /institutions | 110K+ | display_name, type, country, founded_year, works_count |
| Sources | /sources | 250K+ | display_name, issn, type, works_count |
| Topics | /topics | ~4,500 | display_name, domain, field, subfield (4-level hierarchy) |
| Funders | /funders | 35K+ | display_name, country, grants_count |
Base URL: https://api.openalex.org
Auth: API key optional (free, add as ?api_key=KEY for higher rate limits). Works without key.
Rate limits: Generous free tier. Key recommended for production use.
Format: JSON, 25 results/page default, 100 max.
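The request shape above can be sketched as a small URL builder; `buildOpenAlexUrl` is an illustrative name, not an existing helper, and the `per-page` and `api_key` query parameters follow the conventions described in this section:

```typescript
// Sketch: building an OpenAlex request URL. Default page size is 25;
// per-page caps at 100. The api_key parameter is optional.
function buildOpenAlexUrl(
  path: string,
  params: Record<string, string>,
  apiKey?: string,
): string {
  const url = new URL(path, 'https://api.openalex.org')
  for (const [k, v] of Object.entries(params)) url.searchParams.set(k, v)
  url.searchParams.set('per-page', '100')
  if (apiKey) url.searchParams.set('api_key', apiKey)
  return url.toString()
}
```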
What It Verifies
| Fact Claim Type | Example | OpenAlex Query | Evidence Returned |
|---|---|---|---|
| Discovery attribution | "Curie discovered radium" | /authors?search=Marie+Curie | works list, institution, topics |
| Publication date | "CRISPR first used in 2012" | /works?search=CRISPR&sort=publication_date | actual first paper + date |
| Researcher output | "Einstein published 300+ papers" | /authors?search=Albert+Einstein | exact works_count |
| Institution founding | "Harvard founded 1636" | /institutions?search=Harvard | founded_year, type, country |
| Scientific impact | "Most cited physics paper" | /works?filter=topics.id:T...&sort=cited_by_count:desc | citation ranking |
| Affiliation | "Hawking worked at Cambridge" | /authors?search=Stephen+Hawking | affiliations array |
Implementation
Challenge 1: OpenAlex Client
File: packages/ai/src/openalex-client.ts (new)
- Base URL: https://api.openalex.org
- Optional API key via OPENALEX_API_KEY env var (add to config, not required)
- Polite header: User-Agent: Eko/1.0 (mailto:team@eko.day) (OpenAlex requests this)
- In-memory cache: 24h TTL, 5K max entries
- 10s timeout, abort controller
- Metrics: openalex.api_calls, openalex.cache_hit, openalex.entity_found
Key methods:
```ts
searchAuthor(name: string): Promise<OpenAlexAuthor | null>
getAuthorWorks(authorId: string, limit?: number): Promise<OpenAlexWork[]>
searchWork(title: string): Promise<OpenAlexWork | null>
searchInstitution(name: string): Promise<OpenAlexInstitution | null>
searchTopic(query: string): Promise<OpenAlexTopic | null>
formatOpenAlexContext(entity: OpenAlexAuthor | OpenAlexInstitution | OpenAlexWork): string
```
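The client internals described above (24h TTL cache with a size cap, 10s abort, polite User-Agent) might be sketched as follows; `TtlCache` and `fetchOpenAlex` are illustrative names, not final API:

```typescript
// Sketch of the client core: a TTL-bounded in-memory cache plus a
// fetch wrapper with a 10s abort controller.
type CacheEntry<T> = { value: T; expires: number }

class TtlCache<T> {
  private map = new Map<string, CacheEntry<T>>()
  constructor(private ttlMs: number, private maxEntries: number) {}

  get(key: string): T | undefined {
    const hit = this.map.get(key)
    if (!hit) return undefined
    if (Date.now() > hit.expires) {
      this.map.delete(key)
      return undefined
    }
    return hit.value
  }

  set(key: string, value: T): void {
    if (this.map.size >= this.maxEntries) {
      // Evict the oldest insertion (Map preserves insertion order)
      const oldest = this.map.keys().next().value
      if (oldest !== undefined) this.map.delete(oldest)
    }
    this.map.set(key, { value, expires: Date.now() + this.ttlMs })
  }
}

async function fetchOpenAlex(url: string, timeoutMs = 10_000): Promise<unknown> {
  const controller = new AbortController()
  const timer = setTimeout(() => controller.abort(), timeoutMs)
  try {
    const res = await fetch(url, {
      headers: { 'User-Agent': 'Eko/1.0 (mailto:team@eko.day)' },
      signal: controller.signal,
    })
    if (!res.ok) return null
    return res.json()
  } finally {
    clearTimeout(timer)
  }
}
```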
Acceptance: Can look up "Marie Curie" → author with works_count, cited_by_count, affiliations. Can look up "On the Origin of Species" → work with publication_date, author, topics.
Challenge 2: Evidence Pipeline Integration
File: packages/ai/src/validation/evidence.ts
Wire into Phase 4b as a science/academia enrichment source. Topic-gated to relevant domains:
```ts
// Academic enrichment — science, technology, medicine, history
const academicTopics = ['science', 'technology', 'medicine', 'health', 'history']
if (academicTopics.some(t => topicPath.includes(t))) {
  const author = await searchAuthor(entityName)
  if (author) {
    findings.push(`OpenAlex: ${author.display_name}, ${author.works_count} works, ${author.cited_by_count} citations`)
    // If fact mentions a specific work/discovery, look it up
    if (hasPublicationClaim(factTitle, factContext)) {
      const work = await searchWork(extractWorkTitle(factContext))
      if (work) {
        findings.push(`OpenAlex work: "${work.title}" (${work.publication_date}), ${work.cited_by_count} citations`)
      }
    }
  }
}
```
Also check institution claims regardless of topic:
```ts
if (hasInstitutionClaim(factContext)) {
  const inst = await searchInstitution(extractInstitutionName(factContext))
  if (inst) {
    findings.push(`OpenAlex institution: ${inst.display_name}, founded ${inst.founded_year}, ${inst.country}`)
  }
}
```
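The snippet above assumes `hasInstitutionClaim` and `extractInstitutionName` helpers. One hypothetical first-cut implementation is a regex pass over the fact context; real extraction may need NER:

```typescript
// Hypothetical heuristics for the helpers referenced above.
const INSTITUTION_CUES =
  /\b(universit\w+|institute|college|laboratory|academy|school of)\b/i

function hasInstitutionClaim(context: string): boolean {
  return INSTITUTION_CUES.test(context)
}

function extractInstitutionName(context: string): string | null {
  // Grab a capitalized phrase ending in a cue word, e.g. "Harvard University"
  const m = context.match(
    /([A-Z][\w.]*(?:\s+[A-Z&][\w.]*)*\s+(?:University|Institute|College|Academy))/,
  )
  return m ? m[1] : null
}
```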
Acceptance: Evidence pipeline includes OpenAlex data in reasoner prompt for science/tech/medicine/history facts. Einstein facts get verified with actual works_count and institutional affiliation.
Challenge 3: Nobel Prize API Client
File: packages/ai/src/nobelprize-client.ts (new)
The Nobel Prize API provides authoritative data for prize attribution, motivation, year, and affiliation — a common source of LLM hallucination (e.g., attributing the wrong discovery to the right laureate).
Base URL: https://api.nobelprize.org/2.1
Auth: None required. Rate limits: None documented. Cost: Free.
Endpoints:
- GET /laureates?name={name} — laureate profile with all prizes
- GET /nobelPrizes?yearTo={year}&category={cat} — prizes by year/category
Response fields: awardYear, dateAwarded, category, motivation (specific discovery cited), affiliations (institution at time of award), portion (shared vs solo), prizeAmount, prizeAmountAdjusted, birth/death dates and places, Wikipedia/Wikidata cross-links.
Key methods:
```ts
searchLaureate(name: string): Promise<NobelLaureate | null>
searchPrize(category: string, year: number): Promise<NobelPrize | null>
formatNobelContext(laureate: NobelLaureate): string
```
- In-memory cache: 24h TTL, 1K max entries (small dataset — ~1,000 laureates total)
- Metrics: nobelprize.api_calls, nobelprize.cache_hit, nobelprize.laureate_found
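A minimal sketch of `searchLaureate` against the documented endpoint. The types cover only the fields the pipeline reads (the real response is larger), and the `laureates` top-level array is my reading of the v2.1 response shape, worth confirming against the API docs:

```typescript
// Illustrative partial types for the Nobel Prize API v2.1 response.
interface NobelPrizeEntry {
  awardYear: string
  category: { en: string }
  motivation?: { en: string }
  portion?: string
}
interface NobelLaureate {
  knownName?: { en: string }
  nobelPrizes: NobelPrizeEntry[]
}

// Pure parse step, separated out so it can be tested without network.
function firstLaureate(body: { laureates?: NobelLaureate[] }): NobelLaureate | null {
  return body.laureates?.[0] ?? null
}

async function searchLaureate(name: string): Promise<NobelLaureate | null> {
  const url = new URL('https://api.nobelprize.org/2.1/laureates')
  url.searchParams.set('name', name)
  const res = await fetch(url)
  if (!res.ok) return null
  return firstLaureate(await res.json())
}
```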
What it catches:
- Wrong discovery attributed: "Curie won for discovering radioactivity" vs actual motivation: "radiation phenomena"
- Wrong year: Einstein won in 1921 but received it in 1922 — awardYear vs dateAwarded
- Wrong category: "Einstein won the Nobel Prize in Mathematics" → no Mathematics category exists
- Shared vs solo: "Curie was the sole winner" vs portion: "1/2"
Acceptance: Can look up "Marie Curie" → returns both prizes (Physics 1903, Chemistry 1911) with motivations, co-laureates, affiliations.
Challenge 4: Wire Nobel Prize Into Evidence Pipeline
File: packages/ai/src/validation/evidence.ts
Nobel lookup triggers when fact context mentions Nobel-related terms:
```ts
const nobelTerms = ['nobel', 'nobel prize', 'laureate', 'nobel memorial']
if (nobelTerms.some(t => factContext.toLowerCase().includes(t))) {
  const laureate = await searchLaureate(entityName)
  if (laureate) {
    for (const prize of laureate.nobelPrizes) {
      findings.push(`Nobel API: ${prize.categoryFullName} ${prize.awardYear} — "${prize.motivation}"`)
      findings.push(`Nobel API: affiliation at award: ${prize.affiliations?.[0]?.name}`)
    }
  }
}
```
Confidence impact:
- Nobel laureate found with matching category + year → boost apiConfidence by 0.15 (authoritative)
- Nobel motivation contradicts fact claim → flag as critical (authoritative contradiction)
- No Nobel results → fall through (no penalty)
Acceptance: "Einstein won Nobel Prize for photoelectric effect in 1921" gets verified against actual API data. "Einstein won Nobel Prize for relativity" gets flagged — motivation says "photoelectric effect."
Challenge 5: NASA API Client
File: packages/ai/src/nasa-client.ts (new)
Authoritative source for space and astronomy facts — exoplanet counts, asteroid data, Mars rover data, solar weather events.
Base URL: https://api.nasa.gov
Auth: ?api_key=KEY (NASA_API_KEY env var, already in .env.local)
Free tier: 1,000 requests/hour with personal key
Cost: Free
Key APIs for fact verification:
| API | Endpoint | Verifies |
|---|---|---|
| Exoplanet Archive | exoplanetarchive.ipac.caltech.edu/TAP | Exoplanet counts, discovery dates, star data |
| Asteroids NeoWs | /neo/rest/v1/feed | Asteroid close approaches, sizes, dates |
| Mars Rover Photos | /mars-photos/api/v1/rovers/{rover} | Rover landing dates, photo counts, mission status |
| DONKI | /DONKI/CME, /DONKI/GST | Solar storm events, geomagnetic storms by date |
Key methods:
```ts
// Exoplanets
getExoplanetCount(): Promise<number>
searchExoplanet(name: string): Promise<NasaExoplanet | null>
// Asteroids
getAsteroidApproach(startDate: string, endDate: string): Promise<NasaAsteroid[]>
searchAsteroid(name: string): Promise<NasaAsteroid | null>
// Mars rovers
getRoverInfo(rover: 'curiosity' | 'opportunity' | 'spirit' | 'perseverance'): Promise<NasaRover | null>
// Solar weather
getSolarEvents(startDate: string, endDate: string): Promise<NasaSolarEvent[]>
formatNasaContext(data: NasaExoplanet | NasaRover | NasaAsteroid): string
```
- In-memory cache: 24h TTL, 1K max entries (low volume — space facts are ~1-2% of total)
- Metrics: nasa.api_calls, nasa.cache_hit, nasa.entity_found
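One way `getExoplanetCount` could query the TAP endpoint listed above. The `ps` table and `default_flag` column come from the Exoplanet Archive's planetary systems schema, but the exact ADQL and the JSON row shape (array of objects keyed by column alias) are assumptions to verify against the archive's TAP docs:

```typescript
// Sketch of getExoplanetCount via the Exoplanet Archive TAP service.
const TAP_URL = 'https://exoplanetarchive.ipac.caltech.edu/TAP/sync'
const COUNT_QUERY = 'select count(*) as n from ps where default_flag=1'

// Pure parse step, testable without network.
function parseCount(rows: Array<Record<string, number>>): number {
  return rows[0]?.n ?? 0
}

async function getExoplanetCount(): Promise<number> {
  const url = new URL(TAP_URL)
  url.searchParams.set('query', COUNT_QUERY)
  url.searchParams.set('format', 'json')
  const res = await fetch(url)
  if (!res.ok) throw new Error(`TAP query failed: ${res.status}`)
  return parseCount(await res.json())
}
```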
Acceptance: Can query exoplanet count → actual number. Can look up "Curiosity rover" → landing date, photo count, mission status.
Challenge 6: NASA Evidence Integration
File: packages/ai/src/validation/evidence.ts
Topic-gated to space/astronomy:
```ts
const spaceTopics = ['space', 'astronomy', 'aerospace', 'planetary']
if (spaceTopics.some(t => topicPath.includes(t)) && nasaApiKey) {
  // Exoplanet claims
  if (hasExoplanetClaim(factContext)) {
    const count = await getExoplanetCount()
    findings.push(`NASA Exoplanet Archive: ${count} confirmed exoplanets`)
  }
  // Mars rover claims
  if (hasRoverClaim(factContext)) {
    const rover = detectRover(factContext) // 'curiosity' | 'perseverance' etc.
    if (rover) {
      const info = await getRoverInfo(rover)
      if (info) findings.push(`NASA: ${info.name} landed ${info.landing_date}, ${info.total_photos} photos, status: ${info.status}`)
    }
  }
  // Asteroid close approach claims
  if (hasAsteroidClaim(factContext)) {
    const year = extractYear(factContext)
    const asteroids = await getAsteroidApproach(`${year}-01-01`, `${year}-12-31`)
    if (asteroids.length) findings.push(`NASA NeoWs: ${asteroids.length} close approaches in ${year}`)
  }
}
```
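The `detectRover` helper referenced above could be as simple as a name scan over the fact context; this is a hypothetical sketch, not the final implementation:

```typescript
// Hypothetical detectRover: return the first rover name mentioned
// in the fact context, or null if none appears.
type Rover = 'curiosity' | 'opportunity' | 'spirit' | 'perseverance'
const ROVERS: Rover[] = ['curiosity', 'opportunity', 'spirit', 'perseverance']

function detectRover(context: string): Rover | null {
  const lower = context.toLowerCase()
  return ROVERS.find((r) => lower.includes(r)) ?? null
}
```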
Confidence:
- NASA data matches claim → apiConfidence boost 0.15 (authoritative primary source)
- NASA data contradicts claim → flag as critical
- No NASA results → fall through (no penalty)
Acceptance: "Over 5,000 exoplanets confirmed" → NASA Exoplanet Archive → actual count → verified or refuted.
Challenge 7: Tests
Files:
- packages/ai/src/__tests__/openalex-client.test.ts (new)
- packages/ai/src/__tests__/nobelprize-client.test.ts (new)
- packages/ai/src/__tests__/nasa-client.test.ts (new)
OpenAlex tests:
- Response parsing for authors, works, institutions
- Cache behavior tests
- formatOpenAlexContext output tests
- Topic-gating logic tests
- Graceful failure when entity not found
Nobel Prize tests:
- Response parsing for laureates with multiple prizes (e.g., Curie)
- awardYear vs dateAwarded distinction
- formatNobelContext output
- Nobel term detection in fact context
- Cache behavior
NASA tests:
- Exoplanet count and search parsing
- Rover info response parsing
- Asteroid approach data parsing
- Space topic detection in fact context
- Cache behavior
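The cache-behavior tests listed above can avoid the network entirely by stubbing fetch and counting calls. A framework-agnostic sketch (the real tests would use the repo's runner; `cachedLookup` stands in for the clients' cached search methods):

```typescript
// Stub the network layer and assert the second lookup is served
// from cache rather than making a second call.
const cache = new Map<string, unknown>()
let fetchCalls = 0

async function fakeFetch(_url: string): Promise<{ json(): Promise<unknown> }> {
  fetchCalls++
  return { json: async () => ({ display_name: 'Marie Curie' }) }
}

async function cachedLookup(key: string): Promise<unknown> {
  if (cache.has(key)) return cache.get(key)
  const body = await (await fakeFetch(key)).json()
  cache.set(key, body)
  return body
}
```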
Acceptance: bun run test passes.
Evidence Confidence Impact
- OpenAlex author found with matching works/affiliations → boost apiConfidence by 0.1
- OpenAlex work found with matching publication date → boost apiConfidence by 0.1
- OpenAlex institution found with matching founded_year → boost apiConfidence by 0.05
- Nobel laureate found with matching category + year → boost apiConfidence by 0.15
- Nobel motivation contradicts fact claim → flag as critical
- NASA data matches space/astronomy claim → boost apiConfidence by 0.15
- NASA data contradicts claim → flag as critical
- No results from any source → fall through to existing sources (no penalty)
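Since several boosts above can stack on one fact, the update step should clamp the result. A minimal sketch, assuming an `apiConfidence` field in [0, 1] and a string flag list (illustrative names; the real evidence types live in evidence.ts):

```typescript
// Sketch of applying boosts and critical flags from authoritative sources.
interface ConfidenceUpdate {
  apiConfidence: number
  criticalFlags: string[]
}

function applyBoost(c: ConfidenceUpdate, delta: number): ConfidenceUpdate {
  // Clamp so stacked boosts (0.1 + 0.15 + ...) cannot exceed 1.0
  return { ...c, apiConfidence: Math.min(1, c.apiConfidence + delta) }
}

function flagContradiction(c: ConfidenceUpdate, source: string): ConfidenceUpdate {
  return { ...c, criticalFlags: [...c.criticalFlags, `${source}: authoritative contradiction`] }
}
```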
Cost
All three APIs free. NASA key already in .env.local. OpenAlex key optional.
Dependencies
- NASA_API_KEY in .env.local (already done)
- Optional OPENALEX_API_KEY env var
- Add both to packages/config/src/index.ts env schema and .env.example
- User-Agent header with contact email for OpenAlex (community norm, not enforced)
Relationship to Other Evidence Plans
| Plan | Domain | Data |
|---|---|---|
| API-Sports | Sports | Match results, player stats, game data |
| OpenAlex + Nobel Prize + NASA | Science, tech, medicine, academia, space | Authors, papers, institutions, prize attribution, exoplanets, rovers, asteroids |
| Alpha Vantage | Finance (primary) | Company fundamentals, stock prices, basic economic data |
| FRED + Finnhub + FMP | Finance (expansion) | Authoritative economic data, ESG, congressional trading, capacity backup |
| DBpedia | General-purpose fallback | Structured Wikipedia infobox properties |