Entity Linking Layer
How Eko connects entities in the seed pipeline to build a knowledge graph of relationships between people, places, organizations, and events.
What It Does
The entity linking layer adds many-to-many relationships between seed entries, supplementing the one-directional parentEntryId tree (which still drives spinoff depth control) with a bidirectional graph. It also connects fact records directly to their originating seed entries and resolves super fact links to concrete entity IDs.
Three problems this solves:
- Spinoff relationships were one-directional trees -- `parentEntryId` only tracks parent-child lineage for depth control. When a spinoff target already exists (upsert), the relationship was silently dropped.
- Facts had no direct link to entries -- `fact_records.external_source_id` stored `"seed:name:idx"` as a string pattern, not a queryable FK.
- Super fact links were name-only -- `super_fact_links.linked_entry_name` stored text but never resolved to an actual entry ID.
Data Model
seed_entry_queue          seed_entry_queue
   (Entity A)                (Entity B)
       |                         |
       +--- seed_entry_links ----+
       |      entry_id_a < entry_id_b
       |      connection_type
       |      relationship_a_to_b
       |      relationship_b_to_a
       |      strength (0..1)
       |
       +--- fact_records
       |      seed_entry_id (FK)
       |
       +--- super_fact_links
              linked_entry_id (FK)
seed_entry_links
The junction table for entity-to-entity relationships. Sits alongside parentEntryId (which continues to drive spinoff depth control).
| Column | Type | Purpose |
|---|---|---|
| `entry_id_a` | uuid FK | First entity (always the lexicographically smaller UUID) |
| `entry_id_b` | uuid FK | Second entity (always the larger UUID) |
| `connection_type` | text | Relationship category |
| `relationship_a_to_b` | text | A's role relative to B (e.g., "played for") |
| `relationship_b_to_a` | text | B's role relative to A (e.g., "employed") |
| `discovered_by` | text | How the link was found |
| `strength` | real | Confidence/relevance score (0..1) |
Constraints:
- `entry_id_a < entry_id_b` -- canonical ordering prevents duplicate pairs
- `entry_id_a != entry_id_b` -- no self-links
- Unique on `(entry_id_a, entry_id_b, connection_type)` -- one link per type per pair
Connection Types
| Type | Meaning | Example |
|---|---|---|
| `spinoff` | Discovery lineage | "Michael Jordan" -> "Chicago Bulls" |
| `shared_event` | Both participated in same event | Two athletes at the same Olympics |
| `rivalry` | Competitive relationship | "Coca-Cola" <-> "Pepsi" |
| `collaboration` | Worked together | Band members |
| `temporal` | Same time period | Historical figures in same era |
| `geographic` | Same location | Companies headquartered in same city |
| `causal` | Cause-effect relationship | Invention and its consequences |
| `related` | General connection (default) | Catch-all |
Discovery Sources
| Source | When |
|---|---|
| `spinoff_discovery` | AI suggests a spinoff during EXPLODE_CATEGORY_ENTRY |
| `super_fact_discovery` | FIND_SUPER_FACTS finds cross-entry correlations |
| `manual` | Manually curated relationship |
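The connection types and discovery sources listed above map naturally onto string-literal unions. The following is a minimal sketch, not the project's actual type definitions: the camelCase field names, the `SeedEntryLink` interface, and the `toConnectionType` helper are all assumptions made for illustration.

```typescript
// Hypothetical TypeScript mirror of the tables above; the real definitions may differ.
type ConnectionType =
  | "spinoff"
  | "shared_event"
  | "rivalry"
  | "collaboration"
  | "temporal"
  | "geographic"
  | "causal"
  | "related";

type DiscoverySource = "spinoff_discovery" | "super_fact_discovery" | "manual";

// Field names are assumed camelCase equivalents of the seed_entry_links columns.
interface SeedEntryLink {
  entryIdA: string; // smaller UUID of the pair
  entryIdB: string; // larger UUID of the pair
  connectionType: ConnectionType;
  relationshipAToB: string;
  relationshipBToA: string;
  discoveredBy: DiscoverySource;
  strength: number; // 0..1
}

const CONNECTION_TYPES: readonly ConnectionType[] = [
  "spinoff",
  "shared_event",
  "rivalry",
  "collaboration",
  "temporal",
  "geographic",
  "causal",
  "related",
];

// Falls back to "related", the documented default, for missing or unknown input.
function toConnectionType(value: string | undefined): ConnectionType {
  if (value === undefined) return "related";
  const index = (CONNECTION_TYPES as readonly string[]).indexOf(value);
  return index === -1 ? "related" : (value as ConnectionType);
}

const exampleLink: SeedEntryLink = {
  entryIdA: "aaaaaaaa-0000-4000-8000-000000000000",
  entryIdB: "bbbbbbbb-0000-4000-8000-000000000000",
  connectionType: toConnectionType("spinoff"),
  relationshipAToB: "played for",
  relationshipBToA: "employed",
  discoveredBy: "spinoff_discovery",
  strength: 0.8,
};
```

Normalizing free-form AI output through a helper like `toConnectionType` is one way to keep unexpected values from reaching the table; whether the pipeline does this is not stated here.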
Canonical UUID Ordering
The entry_id_a < entry_id_b constraint ensures each entity pair is stored exactly once, regardless of which direction the relationship was discovered.
Discover: "Michael Jordan" -> "Chicago Bulls"
UUIDs: jordan=aaa..., bulls=bbb...
aaa < bbb, so: entry_id_a=jordan, entry_id_b=bulls
relationship_a_to_b = "played for"
relationship_b_to_a = "employed"
Discover: "Chicago Bulls" -> "Michael Jordan"
UUIDs: bulls=bbb..., jordan=aaa...
bbb > aaa, swap! entry_id_a=jordan, entry_id_b=bulls
relationship_a_to_b = "played for" (caller's b-to-a text, swapped)
relationship_b_to_a = "employed" (caller's a-to-b text, swapped)
Result: Same row. Upsert takes max(strength).
The insertEntityLink() function handles this swap transparently. Callers pass IDs in any order.
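The swap and the max-strength merge can be sketched in memory as follows. This is an illustration under stated assumptions, not the code of `insertEntityLink()`: the `LinkInput` shape, the `canonicalize` and `upsertLink` names, and the `Map`-backed store are hypothetical, UUIDs are assumed to be lowercase strings so that string comparison matches the database ordering, and keeping the existing relationship text on conflict is a guess.

```typescript
// Caller-facing shape: IDs and descriptions in discovery order (hypothetical).
interface LinkInput {
  fromId: string;
  toId: string;
  connectionType: string;
  fromToRelationship: string;
  toFromRelationship: string;
  strength: number;
}

// Stored shape: entryIdA is always the smaller UUID.
interface CanonicalLink {
  entryIdA: string;
  entryIdB: string;
  connectionType: string;
  relationshipAToB: string;
  relationshipBToA: string;
  strength: number;
}

function canonicalize(input: LinkInput): CanonicalLink {
  if (input.fromId === input.toId) {
    throw new Error("self-links are not allowed");
  }
  // When the IDs arrive in descending order, swap both IDs and descriptions.
  const swap = input.fromId > input.toId;
  return {
    entryIdA: swap ? input.toId : input.fromId,
    entryIdB: swap ? input.fromId : input.toId,
    connectionType: input.connectionType,
    relationshipAToB: swap ? input.toFromRelationship : input.fromToRelationship,
    relationshipBToA: swap ? input.fromToRelationship : input.toFromRelationship,
    strength: input.strength,
  };
}

// In-memory stand-in for the unique (entry_id_a, entry_id_b, connection_type) upsert.
function upsertLink(store: Map<string, CanonicalLink>, input: LinkInput): CanonicalLink {
  const link = canonicalize(input);
  const key = `${link.entryIdA}|${link.entryIdB}|${link.connectionType}`;
  const existing = store.get(key);
  if (existing !== undefined) {
    existing.strength = Math.max(existing.strength, link.strength);
    return existing;
  }
  store.set(key, link);
  return link;
}
```

Discovering the Jordan/Bulls pair from either side produces the same key, so the second call only raises the stored strength.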
Direct FK: fact_records.seed_entry_id
Each fact generated from the seed pipeline now carries a direct UUID reference to its originating seed entry. This replaces the old string-pattern matching on external_source_id LIKE 'seed:%'.
Benefits:
- Queryable with standard FK joins (no string parsing)
- `getFactsForSeedEntry(entryId)` returns all facts for an entity
- Enables future features like "show all facts about this entity"
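To illustrate what the direct reference buys, here is a small in-memory sketch of an FK-style lookup with an optional status filter and limit. It is not the real `getFactsForSeedEntry()`: the `FactRecord` and `FactQueryOptions` shapes, the option names, and the status values are assumptions.

```typescript
// Hypothetical, trimmed-down fact record; only the fields needed for the lookup.
interface FactRecord {
  id: string;
  seedEntryId: string | null; // null for facts that did not come from the seed pipeline
  status: string;
}

// Option names are assumed; the source only says "optional status filter and limit".
interface FactQueryOptions {
  status?: string;
  limit?: number;
}

function factsForSeedEntry(
  facts: FactRecord[],
  entryId: string,
  opts: FactQueryOptions = {},
): FactRecord[] {
  // Plain equality on the ID replaces any parsing of "seed:name:idx" strings.
  const matches = facts.filter(
    (fact) =>
      fact.seedEntryId === entryId &&
      (opts.status === undefined || fact.status === opts.status),
  );
  return opts.limit === undefined ? matches : matches.slice(0, opts.limit);
}
```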
Query Functions
| Function | File | Purpose |
|---|---|---|
| `insertEntityLink(link)` | seed-queries.ts | Upsert a link with canonical ordering. Swaps descriptions if UUIDs reorder. Takes max strength on conflict. |
| `getLinkedEntities(entryId)` | seed-queries.ts | Bidirectional query returning all linked entities with correct directional relationships. |
| `findEntryByNameOrAlias(name, topicPath?)` | seed-queries.ts | Exact name match first, then ANY(aliases) fallback. Returns {id, name} or null. |
| `getFactsForSeedEntry(entryId, opts?)` | seed-queries.ts | Facts for a seed entry with optional status filter and limit. |
| `insertSuperFactLinks(links)` | seed-queries.ts | Now auto-resolves linkedEntryId from name when not provided. |
| `insertFactRecord(data)` | fact-engine-queries.ts | Now accepts optional seedEntryId parameter. |
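The "correct directional relationships" part of the bidirectional query is the subtle bit: the queried entity may sit in either column, and the two description fields have to be read accordingly. A minimal in-memory sketch of that logic follows; the `LinkRow` and `LinkedEntity` shapes and the `linkedEntitiesFor` name are assumptions, and the real `getLinkedEntities()` presumably does this in SQL.

```typescript
// Assumed camelCase view of a seed_entry_links row.
interface LinkRow {
  entryIdA: string;
  entryIdB: string;
  connectionType: string;
  relationshipAToB: string;
  relationshipBToA: string;
  strength: number;
}

// Result as seen from the queried entity's point of view (hypothetical shape).
interface LinkedEntity {
  entryId: string; // the entity on the other end of the link
  connectionType: string;
  relationship: string; // queried entity's role relative to the other entity
  inverseRelationship: string; // other entity's role relative to the queried one
  strength: number;
}

function linkedEntitiesFor(rows: LinkRow[], entryId: string): LinkedEntity[] {
  const result: LinkedEntity[] = [];
  for (const row of rows) {
    if (row.entryIdA === entryId) {
      // Queried entity is A: read the descriptions as stored.
      result.push({
        entryId: row.entryIdB,
        connectionType: row.connectionType,
        relationship: row.relationshipAToB,
        inverseRelationship: row.relationshipBToA,
        strength: row.strength,
      });
    } else if (row.entryIdB === entryId) {
      // Queried entity is B: flip the descriptions.
      result.push({
        entryId: row.entryIdA,
        connectionType: row.connectionType,
        relationship: row.relationshipBToA,
        inverseRelationship: row.relationshipAToB,
        strength: row.strength,
      });
    }
  }
  return result;
}
```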
Pipeline Integration
Spinoff Discovery (explode-entry)
When the AI discovers spinoff entities during seed explosion:
EXPLODE_CATEGORY_ENTRY
|
+-- AI generates facts + spinoffs
|
+-- For each spinoff:
| insertSeedEntry() -> returns spinoffId (new or upserted)
| insertEntityLink(parentId, spinoffId, 'spinoff')
|
+-- IMPORT_FACTS messages include seed_entry_id
Previously, if a spinoff already existed (upsert), no relationship was recorded. Now insertEntityLink always fires, creating the link regardless.
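The behavioral change can be sketched with in-memory stand-ins. Everything named here is hypothetical (`upsertEntryByName`, `recordSpinoffs`, the generated ID format); the sketch only shows that a link is recorded whether the spinoff entry was newly created or already existed.

```typescript
interface SpinoffLink {
  parentId: string;
  spinoffId: string;
  connectionType: "spinoff";
  discoveredBy: "spinoff_discovery";
}

// Stand-in for an upserting insertSeedEntry(): returns the existing ID when the name is known.
function upsertEntryByName(
  entryIdsByName: Map<string, string>,
  name: string,
): { id: string; created: boolean } {
  const existing = entryIdsByName.get(name);
  if (existing !== undefined) return { id: existing, created: false };
  const id = `entry-${entryIdsByName.size + 1}`; // illustrative ID scheme only
  entryIdsByName.set(name, id);
  return { id, created: true };
}

function recordSpinoffs(
  entryIdsByName: Map<string, string>,
  links: SpinoffLink[],
  parentId: string,
  spinoffNames: string[],
): void {
  for (const name of spinoffNames) {
    const { id } = upsertEntryByName(entryIdsByName, name);
    if (id === parentId) continue; // self-links are disallowed
    // The link is recorded regardless of whether the entry was just created.
    links.push({
      parentId,
      spinoffId: id,
      connectionType: "spinoff",
      discoveredBy: "spinoff_discovery",
    });
  }
}
```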
Super Fact Discovery (find-super-facts)
When cross-entry correlations are found:
FIND_SUPER_FACTS
|
+-- AI finds correlations across batch entries
|
+-- For each super fact:
| insertFactRecord() + insertSuperFactLinks()
| Resolve entry IDs via findEntryByNameOrAlias()
| Create pairwise entity links between all resolved entities
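The last two steps, resolving names and linking every resolved pair, might look like the sketch below. It works on an in-memory list rather than the database, ignores the optional `topicPath` argument, and uses hypothetical names (`SeedEntry`, `findByNameOrAlias`, `pairwiseLinks`); names that fail to resolve are assumed to be skipped.

```typescript
interface SeedEntry {
  id: string;
  name: string;
  aliases: string[];
}

// Exact name match first, then alias fallback, mirroring the documented lookup order.
function findByNameOrAlias(entries: SeedEntry[], name: string): SeedEntry | null {
  for (const entry of entries) {
    if (entry.name === name) return entry;
  }
  for (const entry of entries) {
    if (entry.aliases.indexOf(name) !== -1) return entry;
  }
  return null;
}

// Returns every unordered pair of resolved entry IDs, smaller ID first.
function pairwiseLinks(entries: SeedEntry[], names: string[]): Array<[string, string]> {
  const ids: string[] = [];
  for (const name of names) {
    const entry = findByNameOrAlias(entries, name);
    // Skip unresolved names and entities mentioned more than once.
    if (entry !== null && ids.indexOf(entry.id) === -1) ids.push(entry.id);
  }
  ids.sort(); // sorted input makes each pair satisfy entry_id_a < entry_id_b
  const pairs: Array<[string, string]> = [];
  ids.forEach((a, i) => {
    ids.slice(i + 1).forEach((b) => {
      pairs.push([a, b]);
    });
  });
  return pairs;
}
```

A super fact that resolves n distinct entities yields n(n-1)/2 pairs, so three resolved entities produce three links.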
Fact Import (import-facts)
The seed_entry_id flows through the full pipeline:
explode-entry
sets seed_entry_id in IMPORT_FACTS payload
-> import-facts
passes seedEntryId to insertFactRecord()
-> fact_records.seed_entry_id column populated
Migration (0161)
The migration handles schema changes and backfills:
- Creates `seed_entry_links` table with constraints and RLS
- Adds `seed_entry_id` column to `fact_records`
- Adds `linked_entry_id` column to `super_fact_links`
- Backfills `seed_entry_id` by matching `external_source_id` patterns to entry names
- Backfills `linked_entry_id` by matching `linked_entry_name` to entry names
- Backfills `seed_entry_links` from existing `parent_entry_id` relationships
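The `seed_entry_id` backfill amounts to recovering an entry name from the legacy source string and looking up its ID. The migration itself presumably does this in SQL; the TypeScript below only illustrates the matching logic. It assumes the `"seed:name:idx"` layout described earlier, with the index as the final colon-separated segment, and the function names are hypothetical.

```typescript
interface SeedSourceRef {
  name: string;
  index: number;
}

// Splits on the last colon so that entry names containing colons still parse.
function parseSeedSourceId(externalSourceId: string): SeedSourceRef | null {
  const prefix = "seed:";
  if (externalSourceId.indexOf(prefix) !== 0) return null;
  const rest = externalSourceId.slice(prefix.length);
  const cut = rest.lastIndexOf(":");
  if (cut <= 0) return null; // missing index segment or empty name
  const indexText = rest.slice(cut + 1);
  if (!/^\d+$/.test(indexText)) return null;
  return { name: rest.slice(0, cut), index: Number(indexText) };
}

// Returns the entry ID to write into seed_entry_id, or null to leave the column unset.
function backfillSeedEntryId(
  externalSourceId: string,
  entryIdsByName: Map<string, string>,
): string | null {
  const ref = parseSeedSourceId(externalSourceId);
  if (ref === null) return null;
  const id = entryIdsByName.get(ref.name);
  return id === undefined ? null : id;
}
```

Rows whose source string does not match, or whose name has no corresponding entry, are left without a `seed_entry_id` in this sketch.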
Future: Degrees of Separation
The existing getConnectionPath(entryA, entryB) function in seed-queries.ts finds fact records linking two entities via super_fact_links. Combined with getLinkedEntities(), this enables the "Degrees of Separation" challenge format -- a quiz that tests whether users can identify how two seemingly unrelated entities connect.
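One way such a challenge could compute a connection over the link graph is a breadth-first search, sketched below. This is not the implementation of `getConnectionPath()`, which works through `super_fact_links`; the `Edge` shape and `shortestLinkPath` name are assumptions, and the edges stand in for rows that `getLinkedEntities()` would return.

```typescript
// Minimal view of a seed_entry_links row: only the two endpoints matter here.
interface Edge {
  entryIdA: string;
  entryIdB: string;
}

// Breadth-first search; returns the entry IDs along a shortest path, or null if unconnected.
function shortestLinkPath(edges: Edge[], start: string, goal: string): string[] | null {
  // Build an undirected adjacency list, since links are bidirectional.
  const neighbors = new Map<string, string[]>();
  const connect = (from: string, to: string): void => {
    const list = neighbors.get(from);
    if (list !== undefined) list.push(to);
    else neighbors.set(from, [to]);
  };
  edges.forEach((edge) => {
    connect(edge.entryIdA, edge.entryIdB);
    connect(edge.entryIdB, edge.entryIdA);
  });

  // previous doubles as the visited set and the back-pointer table.
  const previous = new Map<string, string | null>();
  previous.set(start, null);
  const queue: string[] = [start];

  while (queue.length > 0) {
    const current = queue.shift() as string;
    if (current === goal) {
      const path: string[] = [];
      let node: string | null = current;
      while (node !== null) {
        path.unshift(node);
        node = previous.get(node) ?? null;
      }
      return path;
    }
    for (const next of neighbors.get(current) ?? []) {
      if (!previous.has(next)) {
        previous.set(next, current);
        queue.push(next);
      }
    }
  }
  return null;
}
```

The number of degrees between two entities is the path length minus one; a `null` result means no chain of links connects them.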
The entity graph also enables:
- "Related entities" suggestions on fact cards
- Topic-spanning discovery feeds
- Entity-centric browsing ("everything about Michael Jordan")