Hallucination

Hallucination is the term for an LLM output that sounds confident but is unsupported by the provided context or factually wrong. It is the defining failure mode of generative models and the reason high-stakes applications layer retrieval, verification, or human review on top.

Common types

  • Intrinsic hallucination. The output contradicts the input or provided context, for example a summary that states a fact the source document never contains.
  • Extrinsic hallucination. The output adds information that was not in the input and cannot be verified against the source, whether or not it happens to be true.
  • Citation hallucination. The model fabricates references, URLs, or author names that look plausible.
  • Code hallucination. The model calls a function, library, or API that does not exist (a detection sketch follows this list).
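Code hallucination in particular is cheap to catch mechanically. The sketch below checks whether a module and attribute the model referenced actually exist in the current environment; the names pandas, read_csv, and make_dataframe are purely illustrative.

```python
# Minimal sketch: flag code hallucination by checking whether a module and
# attribute referenced by the model exist in the current environment.
# The example names ("pandas", "read_csv", "make_dataframe") are illustrative.
import importlib
import importlib.util


def symbol_exists(module_name: str, attr_name: str) -> bool:
    """Return True if module_name is installed and exposes attr_name."""
    if importlib.util.find_spec(module_name) is None:
        return False  # module is not installed / does not exist
    module = importlib.import_module(module_name)
    return hasattr(module, attr_name)


print(symbol_exists("pandas", "read_csv"))        # True if pandas is installed
print(symbol_exists("pandas", "make_dataframe"))  # False: hallucinated API
```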

Common mitigations

  • Retrieval-augmented generation (RAG). Grounding answers in retrieved evidence and requiring citations back to the source passages.
  • Constrained decoding. Forcing output to match a schema, regex, or restricted vocabulary.
  • Verification step. A second LLM or rule-based check that validates the answer against the source (see the sketch after this list).
  • Lower temperature. Reducing sampling temperature lessens but does not eliminate hallucination.
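
As one concrete flavour of the verification step, the sketch below is a rule-based check that accepts an answer only if every quoted citation in it appears verbatim in the retrieved source passage. The citation format [n] "quote" and the helper name citations_supported are assumptions for illustration, not a standard.

```python
# Minimal rule-based verification sketch: accept an answer only if every
# numbered citation it makes quotes text that actually appears in the
# retrieved source passages. All names here are illustrative, not a real API.
import re


def citations_supported(answer: str, sources: dict[int, str]) -> bool:
    """Check that each '[n] "quote"' in answer appears verbatim in sources[n]."""
    for match in re.finditer(r'\[(\d+)\]\s*"([^"]+)"', answer):
        source_id, quote = int(match.group(1)), match.group(2)
        passage = sources.get(source_id, "")
        if quote not in passage:
            return False  # quote not found: possible hallucinated citation
    return True


sources = {1: "The launch was delayed to March 2021 due to supply issues."}
good = 'The launch slipped, per [1] "delayed to March 2021".'
bad = 'The launch slipped, per [1] "delayed to June 2022".'
print(citations_supported(good, sources))  # True
print(citations_supported(bad, sources))   # False
```

A second-LLM judge would replace the simple substring test with a model call; the rule-based version trades recall for determinism and zero extra cost.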