Hallucination is a Consequence of Space-Optimality: A Rate-Distortion Theorem for Membership Testing
Anxin Guo, Jingwei Li

TL;DR
This paper presents a rate-distortion-theoretic framework explaining why large language models hallucinate: under limited memory capacity, the information-theoretically optimal way to store facts assigns high confidence to some non-facts.
Contribution
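The paper formalizes the memorization of random facts as a membership testing problem, unifying the discrete error metrics of Bloom filters with the continuous log-loss of LLMs, and shows that hallucination persists even under optimal training and perfect data because it is a consequence of space-optimal memory.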
Findings
Hallucinations are a natural result of lossy compression of facts under limited model capacity.
The optimal strategy under limited capacity is not to abstain or forget, but to assign high confidence to some non-facts.
Empirical validation on synthetic data supports the theory.
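The findings above can be made concrete with the classic Bloom filter, the discrete membership tester the abstract refers to. The following is a minimal sketch and not the paper's construction: the class `BloomFilter`, the sizes, and the `fact-`/`claim-` strings are all illustrative assumptions. It stores 1,000 "facts" in a fixed bit budget and then queries 10,000 plausible "claims" that were never stored. Every stored fact is recognized, while a fraction of the non-facts is also accepted with full confidence.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter (illustrative; not the construction from the paper)."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One byte per logical bit keeps the code short; the memory budget
        # being modeled is num_bits bits.
        self.bits = bytearray(num_bits)

    def _positions(self, item):
        # Derive num_hashes positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        return all(self.bits[pos] for pos in self._positions(item))


# Hypothetical data: a sparse set of facts inside a larger universe of claims.
facts = [f"fact-{i}" for i in range(1000)]
non_facts = [f"claim-{i}" for i in range(10000)]

bf = BloomFilter(num_bits=4000, num_hashes=3)  # about 4 bits per fact
for fact in facts:
    bf.add(fact)

false_negatives = sum(fact not in bf for fact in facts)
false_positives = sum(claim in bf for claim in non_facts)
fp_rate = false_positives / len(non_facts)

print(f"facts rejected: {false_negatives}")
print(f"non-facts accepted: {false_positives} ({fp_rate:.1%})")
```

Raising `num_bits` lowers the rate of accepted non-facts, and lowering it raises that rate. The filter never rejects a stored fact; the cost of the limited budget shows up entirely as confident acceptance of some non-facts.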
Abstract
Large language models often hallucinate with high confidence on "random facts" that lack inferable patterns. We formalize the memorization of such facts as a membership testing problem, unifying the discrete error metrics of Bloom filters with the continuous log-loss of LLMs. By analyzing this problem in the regime where facts are sparse in the universe of plausible claims, we establish a rate-distortion theorem: the optimal memory efficiency is characterized by the minimum KL divergence between score distributions on facts and non-facts. This theoretical framework provides a distinctive explanation for hallucination: even with optimal training, perfect data, and a simplified "closed world" setting, the information-theoretically optimal strategy under limited capacity is not to abstain or forget, but to assign high confidence to some non-facts, resulting in hallucination. We validate…
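To illustrate what a KL divergence between score distributions looks like in the simplest case, the sketch below assumes binary scores: every fact is accepted with probability 1, and a non-fact is accepted with probability `eps`. These assumptions, the function name `kl_bernoulli_bits`, and the chosen `eps` values are illustrative and not taken from the paper. Under them, the divergence from the fact distribution to the non-fact distribution is log2(1/eps) bits, which matches the well-known per-item space lower bound for approximate membership filters with false positive rate `eps`.

```python
import math


def kl_bernoulli_bits(p, q):
    """KL(Bernoulli(p) || Bernoulli(q)) in bits, using the convention 0*log(0/x) = 0."""
    total = 0.0
    for a, b in ((p, q), (1.0 - p, 1.0 - q)):
        if a > 0:
            if b == 0:
                return math.inf  # mass where q has none: divergence is infinite
            total += a * math.log2(a / b)
    return total


# Assumed setting: facts always score 1; non-facts score 1 with probability eps.
for eps in (0.5, 0.1, 0.01):
    bits_per_fact = kl_bernoulli_bits(1.0, eps)
    print(f"eps = {eps}: {bits_per_fact:.3f} bits per stored fact")
```

Read this way, driving the acceptance rate on non-facts toward zero makes the divergence, and with it the memory needed per fact, grow without bound, so a fixed budget forces a nonzero rate of confidently accepted non-facts.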
