Entity Linking Layer

How Eko connects entities in the seed pipeline to build a knowledge graph of relationships between people, places, organizations, and events.

What It Does

The entity linking layer adds many-to-many relationships between seed entries, supplementing the one-directional parentEntryId tree with a bidirectional graph. It also connects fact records directly to their originating seed entries and resolves super fact links to concrete entity IDs.

Three problems this solves:

  1. Spinoff relationships were one-directional trees -- parentEntryId only tracks parent-child lineage for depth control. When a spinoff target already existed (upsert), the relationship was silently dropped.
  2. Facts had no direct link to entries -- fact_records.external_source_id stored "seed:name:idx" as a string pattern, not a queryable FK.
  3. Super fact links were name-only -- super_fact_links.linked_entry_name stored text but never resolved to an actual entry ID.

Data Model

seed_entry_queue          seed_entry_queue
 (Entity A)                (Entity B)
     |                          |
     +--- seed_entry_links -----+
     |    entry_id_a < entry_id_b
     |    connection_type
     |    relationship_a_to_b
     |    relationship_b_to_a
     |    strength (0..1)
     |
     +--- fact_records
     |    seed_entry_id (FK)
     |
     +--- super_fact_links
          linked_entry_id (FK)

seed_entry_links

The junction table for entity-to-entity relationships. Sits alongside parentEntryId (which continues to drive spinoff depth control).

Column              | Type    | Purpose
--------------------+---------+---------------------------------------------------------
entry_id_a          | uuid FK | First entity (always the lexicographically smaller UUID)
entry_id_b          | uuid FK | Second entity (always the larger UUID)
connection_type     | text    | Relationship category
relationship_a_to_b | text    | A's role relative to B (e.g., "played for")
relationship_b_to_a | text    | B's role relative to A (e.g., "employed")
discovered_by       | text    | How the link was found
strength            | real    | Confidence/relevance score (0..1)

Constraints:

  • entry_id_a < entry_id_b -- canonical ordering prevents duplicate pairs
  • entry_id_a != entry_id_b -- no self-links
  • Unique on (entry_id_a, entry_id_b, connection_type) -- one link per type per pair

Connection Types

Type          | Meaning                         | Example
--------------+---------------------------------+-------------------------------------
spinoff       | Discovery lineage               | "Michael Jordan" -> "Chicago Bulls"
shared_event  | Both participated in same event | Two athletes at the same Olympics
rivalry       | Competitive relationship        | "Coca-Cola" <-> "Pepsi"
collaboration | Worked together                 | Band members
temporal      | Same time period                | Historical figures in same era
geographic    | Same location                   | Companies headquartered in same city
causal        | Cause-effect relationship       | Invention and its consequences
related       | General connection (default)    | Catch-all

Discovery Sources

Source               | When
---------------------+----------------------------------------------------
spinoff_discovery    | AI suggests a spinoff during EXPLODE_CATEGORY_ENTRY
super_fact_discovery | FIND_SUPER_FACTS finds cross-entry correlations
manual               | Manually curated relationship
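
The columns, connection types, and discovery sources above can be modeled as a typed row. The following is a minimal sketch, not the project's actual types: the camelCase property names, the isConnectionType guard, and the findViolations helper are all hypothetical, and the uniqueness constraint is not checked because it needs the whole table.

```typescript
// Hypothetical TypeScript model of a seed_entry_links row.
// Property names are illustrative camelCase versions of the columns above.
const CONNECTION_TYPES = [
  "spinoff", "shared_event", "rivalry", "collaboration",
  "temporal", "geographic", "causal", "related",
] as const;
type ConnectionType = (typeof CONNECTION_TYPES)[number];

const DISCOVERY_SOURCES = ["spinoff_discovery", "super_fact_discovery", "manual"] as const;
type DiscoverySource = (typeof DISCOVERY_SOURCES)[number];

interface SeedEntryLinkRow {
  entryIdA: string;
  entryIdB: string;
  connectionType: ConnectionType;
  relationshipAToB: string;
  relationshipBToA: string;
  discoveredBy: DiscoverySource;
  strength: number;
}

// Runtime guard for values that arrive as plain strings.
function isConnectionType(value: string): value is ConnectionType {
  return (CONNECTION_TYPES as readonly string[]).indexOf(value) !== -1;
}

// Mirrors the per-row constraints: no self-links, canonical order, strength in 0..1.
function findViolations(row: SeedEntryLinkRow): string[] {
  const problems: string[] = [];
  if (row.entryIdA === row.entryIdB) {
    problems.push("self-link");
  } else if (!(row.entryIdA < row.entryIdB)) {
    problems.push("ids not in canonical order");
  }
  if (row.strength < 0 || row.strength > 1) {
    problems.push("strength outside 0..1");
  }
  return problems;
}
```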

Canonical UUID Ordering

The entry_id_a < entry_id_b constraint ensures each entity pair is stored exactly once, regardless of which direction the relationship was discovered.

Discover: "Michael Jordan" -> "Chicago Bulls"
  UUIDs: jordan=aaa..., bulls=bbb...
  aaa < bbb, so: entry_id_a=jordan, entry_id_b=bulls
  relationship_a_to_b = "played for"
  relationship_b_to_a = "employed"

Discover: "Chicago Bulls" -> "Michael Jordan"
  UUIDs: bulls=bbb..., jordan=aaa...
  bbb > aaa, swap! entry_id_a=jordan, entry_id_b=bulls
  relationship_a_to_b = "employed" (swapped)
  relationship_b_to_a = "played for" (swapped)

Result: Same row. Upsert takes max(strength).

The insertEntityLink() function handles this swap transparently. Callers pass IDs in any order.
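
The swap-and-upsert behavior can be sketched against an in-memory map. This is an illustration under stated assumptions, not the real insertEntityLink(): the LinkInput and StoredLink shapes and the canonicalize and upsertLink names are hypothetical, IDs are assumed to be lowercase UUID strings (so string comparison gives a consistent order), and on conflict the sketch keeps the existing descriptions and only raises strength.

```typescript
// Hypothetical input shape: the caller passes IDs in discovery order.
interface LinkInput {
  fromId: string;
  toId: string;
  connectionType: string;
  relationshipFromTo: string;
  relationshipToFrom: string;
  strength: number;
}

// Hypothetical stored shape, mirroring the seed_entry_links columns.
interface StoredLink {
  entryIdA: string;
  entryIdB: string;
  connectionType: string;
  relationshipAToB: string;
  relationshipBToA: string;
  strength: number;
}

// Put the smaller ID in slot A and swap the descriptions with it.
function canonicalize(input: LinkInput): StoredLink {
  if (input.fromId === input.toId) {
    throw new Error("self-links are not allowed");
  }
  const swap = input.fromId > input.toId;
  return {
    entryIdA: swap ? input.toId : input.fromId,
    entryIdB: swap ? input.fromId : input.toId,
    connectionType: input.connectionType,
    relationshipAToB: swap ? input.relationshipToFrom : input.relationshipFromTo,
    relationshipBToA: swap ? input.relationshipFromTo : input.relationshipToFrom,
    strength: input.strength,
  };
}

// Upsert keyed on (entry_id_a, entry_id_b, connection_type); keeps max strength.
function upsertLink(store: Map<string, StoredLink>, input: LinkInput): StoredLink {
  const row = canonicalize(input);
  const key = `${row.entryIdA}|${row.entryIdB}|${row.connectionType}`;
  const existing = store.get(key);
  if (existing !== undefined) {
    existing.strength = Math.max(existing.strength, row.strength);
    return existing;
  }
  store.set(key, row);
  return row;
}
```

Discovering the same pair from either side lands on the same key, which is why callers do not need to know which UUID is smaller.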

Direct FK: fact_records.seed_entry_id

Each fact generated from the seed pipeline now carries a direct UUID reference to its originating seed entry. This replaces the old string-pattern matching on external_source_id LIKE 'seed:%'.

Benefits:

  • Queryable with standard FK joins (no string parsing)
  • getFactsForSeedEntry(entryId) returns all facts for an entity
  • Enables future features like "show all facts about this entity"
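
To make the contrast concrete, here is a small sketch comparing the two lookups over in-memory rows. It assumes the "seed:name:idx" layout described earlier; the FactRow shape, the status values, and both function names are hypothetical and stand in for the real query code.

```typescript
// Hypothetical in-memory fact row with both the old and new references.
interface FactRow {
  id: string;
  externalSourceId: string | null;
  seedEntryId: string | null;
  status: string;
}

// Old approach: recover the entry name from "seed:name:idx".
// The name itself may contain colons, so only the last segment is the index.
function parseSeedSourceId(sourceId: string): { entryName: string; idx: number } | null {
  const match = /^seed:(.+):(\d+)$/.exec(sourceId);
  if (match === null) return null;
  const entryName = match[1];
  const idxText = match[2];
  if (entryName === undefined || idxText === undefined) return null;
  return { entryName, idx: Number(idxText) };
}

// New approach: a plain equality filter on the FK, with optional status and limit.
function factsForSeedEntry(
  facts: FactRow[],
  entryId: string,
  opts: { status?: string; limit?: number } = {},
): FactRow[] {
  const matches = facts.filter(
    (fact) =>
      fact.seedEntryId === entryId &&
      (opts.status === undefined || fact.status === opts.status),
  );
  return opts.limit === undefined ? matches : matches.slice(0, opts.limit);
}
```

The string route yields a name that still has to be matched back to an entry, while the FK route identifies the entry directly.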

Query Functions

Function                                 | File                   | Purpose
-----------------------------------------+------------------------+--------
insertEntityLink(link)                   | seed-queries.ts        | Upsert a link with canonical ordering. Swaps descriptions if UUIDs reorder. Takes max strength on conflict.
getLinkedEntities(entryId)               | seed-queries.ts        | Bidirectional query returning all linked entities with correct directional relationships.
findEntryByNameOrAlias(name, topicPath?) | seed-queries.ts        | Exact name match first, then ANY(aliases) fallback. Returns {id, name} or null.
getFactsForSeedEntry(entryId, opts?)     | seed-queries.ts        | Facts for a seed entry with optional status filter and limit.
insertSuperFactLinks(links)              | seed-queries.ts        | Now auto-resolves linkedEntryId from name when not provided.
insertFactRecord(data)                   | fact-engine-queries.ts | Now accepts optional seedEntryId parameter.
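
The "correct directional relationships" part of getLinkedEntities() is the least obvious piece, so here is a sketch of the idea over in-memory rows. The LinkRow and LinkedEntity shapes and the linkedEntitiesFor name are hypothetical; the real function presumably runs a query against seed_entry_links rather than scanning an array.

```typescript
// Hypothetical row shape mirroring seed_entry_links.
interface LinkRow {
  entryIdA: string;
  entryIdB: string;
  connectionType: string;
  relationshipAToB: string;
  relationshipBToA: string;
  strength: number;
}

// Hypothetical result shape, expressed from the queried entry's point of view.
interface LinkedEntity {
  entryId: string;
  connectionType: string;
  relationshipToOther: string;   // queried entry's role relative to the other
  relationshipFromOther: string; // other entry's role relative to the queried one
  strength: number;
}

// Match the entry in either slot and flip the descriptions when it sits in slot B.
function linkedEntitiesFor(links: LinkRow[], entryId: string): LinkedEntity[] {
  const result: LinkedEntity[] = [];
  for (const link of links) {
    if (link.entryIdA === entryId) {
      result.push({
        entryId: link.entryIdB,
        connectionType: link.connectionType,
        relationshipToOther: link.relationshipAToB,
        relationshipFromOther: link.relationshipBToA,
        strength: link.strength,
      });
    } else if (link.entryIdB === entryId) {
      result.push({
        entryId: link.entryIdA,
        connectionType: link.connectionType,
        relationshipToOther: link.relationshipBToA,
        relationshipFromOther: link.relationshipAToB,
        strength: link.strength,
      });
    }
  }
  return result;
}
```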

Pipeline Integration

Spinoff Discovery (explode-entry)

When the AI discovers spinoff entities during seed explosion:

EXPLODE_CATEGORY_ENTRY
  |
  +-- AI generates facts + spinoffs
  |
  +-- For each spinoff:
  |     insertSeedEntry() -> returns spinoffId (new or upserted)
  |     insertEntityLink(parentId, spinoffId, 'spinoff')
  |
  +-- IMPORT_FACTS messages include seed_entry_id

Previously, if a spinoff already existed (upsert), no relationship was recorded. Now insertEntityLink runs for every spinoff, so the link is recorded whether the entry was newly created or already present.
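
A toy version of this flow shows why the link survives an upsert. Everything here is a stand-in: InMemorySeedStore, its methods, the generated IDs, and recordSpinoffs are hypothetical and only imitate the roles of insertSeedEntry() and insertEntityLink().

```typescript
// Hypothetical in-memory stand-in for the seed queue and link table.
class InMemorySeedStore {
  private idsByName = new Map<string, string>();
  private nextNumber = 1000;
  readonly linkKeys = new Set<string>();

  // Returns the existing ID when the entry is already present (upsert).
  upsertEntry(entryName: string): string {
    const existing = this.idsByName.get(entryName);
    if (existing !== undefined) return existing;
    const id = `entry-${this.nextNumber++}`;
    this.idsByName.set(entryName, id);
    return id;
  }

  // Records a spinoff link in canonical order; repeated calls are idempotent.
  linkSpinoff(parentId: string, spinoffId: string): void {
    if (parentId === spinoffId) return;
    const a = parentId < spinoffId ? parentId : spinoffId;
    const b = parentId < spinoffId ? spinoffId : parentId;
    this.linkKeys.add(`${a}|${b}|spinoff`);
  }
}

// For each spinoff: upsert the entry, then always record the link.
function recordSpinoffs(
  store: InMemorySeedStore,
  parentId: string,
  spinoffNames: string[],
): string[] {
  return spinoffNames.map((spinoffName) => {
    const spinoffId = store.upsertEntry(spinoffName);
    store.linkSpinoff(parentId, spinoffId);
    return spinoffId;
  });
}
```

If two different parents both name "Chicago Bulls" as a spinoff, the second call reuses the existing entry but still adds its own link.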

Super Fact Discovery (find-super-facts)

When cross-entry correlations are found:

FIND_SUPER_FACTS
  |
  +-- AI finds correlations across batch entries
  |
  +-- For each super fact:
  |     insertFactRecord() + insertSuperFactLinks()
  |     Resolve entry IDs via findEntryByNameOrAlias()
  |     Create pairwise entity links between all resolved entities
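
The resolve-then-pair step can be sketched as follows. This is an approximation under assumptions: SeedEntry, resolveEntry, and pairwiseLinks are hypothetical names, matching is assumed to be case-sensitive, and the optional topicPath filter of findEntryByNameOrAlias() is left out.

```typescript
// Hypothetical entry shape with a name and a list of aliases.
interface SeedEntry {
  id: string;
  name: string;
  aliases: string[];
}

// Exact name match first, then fall back to any alias; null when nothing matches.
function resolveEntry(
  entries: SeedEntry[],
  query: string,
): { id: string; name: string } | null {
  const exact = entries.find((entry) => entry.name === query);
  const match = exact ?? entries.find((entry) => entry.aliases.indexOf(query) !== -1);
  return match === undefined ? null : { id: match.id, name: match.name };
}

// Every unordered pair of distinct IDs, each already in canonical (a < b) order.
function pairwiseLinks(ids: string[]): Array<[string, string]> {
  const unique = Array.from(new Set(ids)).sort();
  const pairs: Array<[string, string]> = [];
  unique.forEach((a, i) => {
    for (const b of unique.slice(i + 1)) {
      pairs.push([a, b]);
    }
  });
  return pairs;
}
```

A super fact that resolves to n distinct entities produces n(n-1)/2 links, so a fact touching four entities yields six pairs.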

Fact Import (import-facts)

The seed_entry_id flows through the full pipeline:

explode-entry
  sets seed_entry_id in IMPORT_FACTS payload
    -> import-facts
       passes seedEntryId to insertFactRecord()
         -> fact_records.seed_entry_id column populated
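
As a rough illustration of the hand-off, the sketch below maps a message payload to insert arguments. The payload and record shapes are assumptions made for the example (only the seed_entry_id and seedEntryId names come from the text above), and toFactRecordInputs is a hypothetical helper.

```typescript
// Hypothetical IMPORT_FACTS payload; only seed_entry_id is taken from the doc.
interface ImportFactsPayload {
  seed_entry_id?: string;
  facts: Array<{ title: string }>;
}

// Hypothetical argument shape for insertFactRecord(); seedEntryId is optional.
interface FactRecordInput {
  title: string;
  seedEntryId?: string;
}

// Carry the entry ID from the message onto every fact it contains.
// Payloads without a seed_entry_id (non-seed sources) produce inputs without one.
function toFactRecordInputs(payload: ImportFactsPayload): FactRecordInput[] {
  const seedEntryId = payload.seed_entry_id;
  return payload.facts.map((fact) =>
    seedEntryId === undefined
      ? { title: fact.title }
      : { title: fact.title, seedEntryId },
  );
}
```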

Migration (0161)

The migration handles schema changes and backfills:

  1. Creates seed_entry_links table with constraints and RLS
  2. Adds seed_entry_id column to fact_records
  3. Adds linked_entry_id column to super_fact_links
  4. Backfills seed_entry_id by matching external_source_id patterns to entry names
  5. Backfills linked_entry_id by matching linked_entry_name to entry names
  6. Backfills seed_entry_links from existing parent_entry_id relationships
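
Step 6 can be pictured with a small in-memory version. The actual migration is SQL and is not reproduced here; this TypeScript sketch only shows the shape of the transformation, and it assumes the backfilled rows use the 'spinoff' connection type, which the text does not state. QueueEntry, BackfilledLink, and backfillLinksFromParents are hypothetical names.

```typescript
// Hypothetical view of a seed_entry_queue row: only the columns the backfill needs.
interface QueueEntry {
  id: string;
  parentEntryId: string | null;
}

interface BackfilledLink {
  entryIdA: string;
  entryIdB: string;
  connectionType: "spinoff"; // assumption: parent-child rows become spinoff links
}

// Turn each parent-child pointer into one canonical, de-duplicated link.
function backfillLinksFromParents(entries: QueueEntry[]): BackfilledLink[] {
  const seen = new Set<string>();
  const links: BackfilledLink[] = [];
  for (const entry of entries) {
    const parentId = entry.parentEntryId;
    if (parentId === null || parentId === entry.id) continue; // roots and self-references
    const entryIdA = parentId < entry.id ? parentId : entry.id;
    const entryIdB = parentId < entry.id ? entry.id : parentId;
    const key = `${entryIdA}|${entryIdB}`;
    if (seen.has(key)) continue;
    seen.add(key);
    links.push({ entryIdA, entryIdB, connectionType: "spinoff" });
  }
  return links;
}
```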

Future: Degrees of Separation

The existing getConnectionPath(entryA, entryB) function in seed-queries.ts finds fact records linking two entities via super_fact_links. Combined with getLinkedEntities(), this enables the "Degrees of Separation" challenge format -- a quiz that tests whether users can identify how two seemingly unrelated entities connect.
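
One way such a challenge could measure distance is a breadth-first search over seed_entry_links. The sketch below is a hypothetical illustration, not the behavior of getConnectionPath(): the degreesOfSeparation name and the in-memory link list are assumptions, and every link is treated as an undirected edge of equal weight.

```typescript
// Breadth-first search over undirected links.
// Returns the number of hops between two entries, or null if they are not connected.
function degreesOfSeparation(
  links: Array<{ entryIdA: string; entryIdB: string }>,
  startId: string,
  goalId: string,
): number | null {
  if (startId === goalId) return 0;

  // Build an adjacency list; each stored row is walkable in both directions.
  const neighbors = new Map<string, string[]>();
  const addEdge = (from: string, to: string): void => {
    const list = neighbors.get(from);
    if (list !== undefined) list.push(to);
    else neighbors.set(from, [to]);
  };
  for (const link of links) {
    addEdge(link.entryIdA, link.entryIdB);
    addEdge(link.entryIdB, link.entryIdA);
  }

  const visited = new Set<string>([startId]);
  let frontier: string[] = [startId];
  let depth = 0;
  while (frontier.length > 0) {
    depth += 1;
    const next: string[] = [];
    for (const node of frontier) {
      for (const neighbor of neighbors.get(node) ?? []) {
        if (neighbor === goalId) return depth;
        if (!visited.has(neighbor)) {
          visited.add(neighbor);
          next.push(neighbor);
        }
      }
    }
    frontier = next;
  }
  return null;
}
```

Because the search expands one hop at a time, the first time the goal is reached is also the shortest path.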

The entity graph also enables:

  • "Related entities" suggestions on fact cards
  • Topic-spanning discovery feeds
  • Entity-centric browsing ("everything about Michael Jordan")