AI Safety Policy

Purpose

Eko uses LLMs only to interpret detected change. This policy defines guardrails against hallucinations, over-quoting, and tone drift, and specifies how prompt failures are handled.

When AI Is Allowed

LLMs may run only after the system has:

  • detected a change event (diff-first), and
  • prepared a bounded, non-substitutive input (delta + minimal context).

LLMs must not be used to:

  • fetch, crawl, or “fill in” missing page content,
  • infer paywalled/blocked text,
  • summarize entire pages without a delta.
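
The two preconditions above can be expressed as a gate that runs before any model call. The following is a minimal sketch, not Eko's actual implementation: the names LlmInput and llm_allowed and the MAX_INPUT_CHARS limit are assumptions made for illustration, since the policy says "bounded" without giving a number.

```python
from dataclasses import dataclass

# Hypothetical size limit; the policy requires a bounded input but sets no figure.
MAX_INPUT_CHARS = 8_000


@dataclass
class LlmInput:
    change_detected: bool  # set by the diff stage, never by the model
    delta: str             # the changed text only
    context: str           # minimal surrounding context


def llm_allowed(inp: LlmInput, max_chars: int = MAX_INPUT_CHARS) -> bool:
    """Return True only when both preconditions hold."""
    if not inp.change_detected:
        return False  # diff-first: no change event, no model call
    if not inp.delta.strip():
        return False  # no delta means no summary, even if context is present
    # Bounded input: the delta plus context must fit within the limit.
    return len(inp.delta) + len(inp.context) <= max_chars
```

An input rejected here for size would map naturally onto the input_too_large failure reason listed under Prompt-Failure Handling.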

Output Requirements

All summaries must:

  • Be delta-first (what changed → why it matters → confidence).
  • Avoid reconstructing source content.
  • Use plain language and label uncertainty.
  • Include a clear confidence rating (High/Medium/Low).
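
One way to make the delta-first order hard to violate is to give the summary a fixed shape. The sketch below is illustrative only; DeltaSummary, Confidence, and the rendered labels are hypothetical names, not part of an existing Eko schema.

```python
from dataclasses import dataclass
from enum import Enum


class Confidence(Enum):
    HIGH = "High"
    MEDIUM = "Medium"
    LOW = "Low"


@dataclass(frozen=True)
class DeltaSummary:
    what_changed: str
    why_it_matters: str
    confidence: Confidence

    def render(self) -> str:
        # Fixed order: what changed, then why it matters, then confidence.
        return "\n".join([
            f"What changed: {self.what_changed}",
            f"Why it matters: {self.why_it_matters}",
            f"Confidence: {self.confidence.value}",
        ])
```

Because the confidence field is an enum, a summary without one of the three ratings cannot be constructed in the first place.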

Hallucination Controls

Hard constraints

  • The model may only reference facts present in the provided diff/context.
  • If information is missing, the model must say so explicitly.

Heuristics that trigger a downgrade

  • Claims about motives/intent (“they changed this because…”) without evidence.
  • New numbers, dates, or names not present in the input.
  • Overly specific operational guidance that isn’t supported by the delta.
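
The "new numbers" heuristic is the easiest of the three to check mechanically. The sketch below assumes a simple token comparison and hypothetical helper names (unsupported_numbers, downgrade); it does not cover names, motives, or operational guidance, which need a different kind of validator.

```python
import re

# Digit runs, optionally joined by "." or "," (e.g. 1,200 or 3.5).
_NUMBER = re.compile(r"\d+(?:[.,]\d+)*")

_LEVELS = ["High", "Medium", "Low"]


def unsupported_numbers(summary: str, model_input: str) -> set[str]:
    """Numeric tokens in the summary that never appear in the input."""
    seen = set(_NUMBER.findall(model_input))
    return {n for n in _NUMBER.findall(summary) if n not in seen}


def downgrade(confidence: str) -> str:
    """Move one step down the High/Medium/Low scale; Low stays Low."""
    idx = _LEVELS.index(confidence)
    return _LEVELS[min(idx + 1, len(_LEVELS) - 1)]
```

A caller would downgrade the rating whenever unsupported_numbers returns a non-empty set, for example when the summary mentions a date that the diff never contained.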

Required phrasing when uncertain

  • “The page now suggests…”
  • “No explicit announcement accompanies this change.”
  • “This appears to be a wording/structure change; impact is unclear.”

Over-Quoting Controls

  • Prefer paraphrase.
  • Quotes are permitted only when strictly necessary to pinpoint a change.
  • Never include long passages or multiple paragraphs from the source.
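
An output validator for these rules could count quoted spans and measure their length. This is a rough sketch under assumed thresholds: MAX_QUOTE_WORDS and MAX_QUOTES are hypothetical values, since the policy defines no numeric limits.

```python
import re

# Text between straight or curly double quotes.
_QUOTED = re.compile(r'["“]([^"”]+)["”]')

# Hypothetical limits chosen for illustration only.
MAX_QUOTE_WORDS = 12
MAX_QUOTES = 2


def excessive_quotes(summary: str) -> bool:
    """True when the summary quotes too often or at too great a length."""
    quotes = _QUOTED.findall(summary)
    if len(quotes) > MAX_QUOTES:
        return True
    return any(len(q.split()) > MAX_QUOTE_WORDS for q in quotes)
```

A True result would correspond to the excessive_quotes failure reason.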

Prompt-Failure Handling

If the model fails any guardrail (hallucination, substitution, excessive quoting, tone drift):

  1. Suppress the summary or truncate to safe minimal bullets.

  2. Fall back to a deterministic change report (structure + counts + timestamps).

  3. Log the failure with a categorized reason:

    • hallucination_risk
    • substitutive_output_risk
    • excessive_quotes
    • tone_drift
    • input_too_large
    • ambiguous_diff
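
The suppress, fall back, and log steps above might be wired together as follows. This is a sketch under assumptions: DiffStats, deterministic_report, finalize, and the logger name are hypothetical, and only the six reason strings come from this policy.

```python
import logging
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum
from typing import Optional

logger = logging.getLogger("ai_safety")  # hypothetical logger name


class FailureReason(Enum):
    HALLUCINATION_RISK = "hallucination_risk"
    SUBSTITUTIVE_OUTPUT_RISK = "substitutive_output_risk"
    EXCESSIVE_QUOTES = "excessive_quotes"
    TONE_DRIFT = "tone_drift"
    INPUT_TOO_LARGE = "input_too_large"
    AMBIGUOUS_DIFF = "ambiguous_diff"


@dataclass
class DiffStats:
    added: int
    removed: int
    sections_changed: int
    detected_at: datetime


def deterministic_report(stats: DiffStats) -> str:
    """Structure + counts + timestamp, with no model-generated text."""
    return (
        f"{stats.sections_changed} section(s) changed: "
        f"+{stats.added} / -{stats.removed} lines, "
        f"detected {stats.detected_at.isoformat()}"
    )


def finalize(summary: str, failure: Optional[FailureReason], stats: DiffStats) -> str:
    """Return the summary if it passed, else log and fall back."""
    if failure is None:
        return summary
    logger.warning("summary suppressed: reason=%s", failure.value)
    return deterministic_report(stats)
```

This sketch always suppresses the failed summary; the "truncate to safe minimal bullets" option in step 1 is not modeled here.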

Safety Tiers

  • Tier 0 (Deterministic): structural delta only (always safe)
  • Tier 1 (Constrained summary): delta + minimal context; no quotes
  • Tier 2 (Quoted pinpoints): very short quotes to disambiguate a change

The default is Tier 1. Tier 2 requires an explicit need (ambiguity), and its quotes must remain minimal.
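
Tier selection reduces to a small decision function. The sketch below assumes two boolean inputs that the policy does not name: whether the LLM gate passed, and whether the change cannot be pinpointed without a quote.

```python
from enum import IntEnum


class Tier(IntEnum):
    DETERMINISTIC = 0        # structural delta only
    CONSTRAINED_SUMMARY = 1  # delta + minimal context; no quotes
    QUOTED_PINPOINTS = 2     # very short quotes to disambiguate


def select_tier(llm_permitted: bool, ambiguous_without_quote: bool) -> Tier:
    """Tier 1 by default; Tier 2 only for ambiguity; Tier 0 if no LLM."""
    if not llm_permitted:
        return Tier.DETERMINISTIC
    if ambiguous_without_quote:
        return Tier.QUOTED_PINPOINTS
    return Tier.CONSTRAINED_SUMMARY
```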

Human Review Triggers

Flag for review when:

  • The page is legal/policy/ToS and the change is non-trivial.
  • The diff suggests price, eligibility, security, or compliance impact.
  • The model's confidence is Low but the change score is high.
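
The three triggers can be combined into a single predicate. This is an illustrative sketch: the ChangeEvent fields, the page-type and impact labels, and the HIGH_CHANGE_SCORE threshold are all assumptions, since the policy does not define how the change score is scaled.

```python
from dataclasses import dataclass

SENSITIVE_PAGE_TYPES = {"legal", "policy", "tos"}
SENSITIVE_IMPACTS = {"price", "eligibility", "security", "compliance"}
HIGH_CHANGE_SCORE = 0.7  # hypothetical threshold on an assumed 0..1 scale


@dataclass
class ChangeEvent:
    page_type: str
    is_trivial: bool
    impacts: frozenset  # impact labels inferred from the diff
    confidence: str     # "High", "Medium", or "Low"
    change_score: float


def needs_human_review(ev: ChangeEvent) -> bool:
    """True if any one of the three review triggers fires."""
    if ev.page_type in SENSITIVE_PAGE_TYPES and not ev.is_trivial:
        return True
    if ev.impacts & SENSITIVE_IMPACTS:
        return True
    return ev.confidence == "Low" and ev.change_score >= HIGH_CHANGE_SCORE
```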

Compliance

This policy complements /docs/policies/fair-use-and-non-substitutive.md and is enforced by prompt templates, output validators, and fallback behavior.