TL;DR
This paper introduces APORIA, a geometric framework for analyzing and detecting hallucinations in small language models. It examines how repeated responses to the same prompt cluster in embedding space, which enables efficient classification of responses as genuine or hallucinated.
Contribution
It presents a novel geometric perspective on hallucinations, introduces APORIA-LP for response classification, and releases a large labeled dataset for further research.
Findings
Genuine responses cluster more tightly than hallucinated ones in embedding space.
Fisher projection makes response classes consistently separable.
APORIA-LP achieves an F1 score above 90% with minimal annotations.
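To make the Fisher projection finding concrete, here is a minimal sketch of a two-class Fisher linear discriminant in NumPy. It is not the authors' implementation: the synthetic "embeddings", the class spreads, the mean shift, the `fisher_direction` helper, and the midpoint threshold are all assumptions chosen for illustration.

```python
import numpy as np

def fisher_direction(X, y, reg=1e-6):
    """Unit vector maximizing between-class over within-class scatter (two classes)."""
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # Within-class scatter matrix, summed over both classes.
    Sw = (X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)
    # Small ridge term keeps the solve stable if Sw is near-singular.
    Sw += reg * np.eye(X.shape[1])
    w = np.linalg.solve(Sw, mu1 - mu0)
    return w / np.linalg.norm(w)

# Synthetic stand-in for sentence embeddings (assumption, not paper data):
# label 0 = "genuine" (tight cluster), label 1 = "hallucinated" (loose, shifted).
rng = np.random.default_rng(0)
dim = 16
genuine = rng.normal(0.0, 0.3, size=(200, dim))
hallucinated = rng.normal(1.0, 1.0, size=(200, dim))
X = np.vstack([genuine, hallucinated])
y = np.concatenate([np.zeros(200, dtype=int), np.ones(200, dtype=int)])

w = fisher_direction(X, y)
z = X @ w  # 1-D Fisher projection of every response

# Illustrative decision rule: threshold at the midpoint of the projected class means.
threshold = 0.5 * (z[y == 0].mean() + z[y == 1].mean())
pred = (z > threshold).astype(int)
accuracy = float((pred == y).mean())
print(f"accuracy on synthetic data: {accuracy:.3f}")
```

On real data, `X` would hold the embeddings of model responses and `y` the genuine/hallucinated annotations; the projection direction would be fit on the labeled subset only.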
Abstract
Hallucinations -- plausible but factually incorrect responses -- pose a major challenge to the reliability of Large Language Models (LLMs), especially in multi-step or agentic settings. Existing work largely frames hallucinations as a consequence of missing knowledge; we show instead that, even when the relevant factual knowledge is present, models still produce hallucinated answers, pointing to retrieval instability rather than knowledge gaps. Building on this observation, we introduce APORIA (Aggregate Prompt-wise Observation Retrieving Instability via Asymmetry -- the state of puzzlement-in-contradiction that hallucinations embody), a geometric framework that studies repeated responses to the same prompt in sentence-embedding space. Our central hypothesis is that genuine responses cluster more tightly than hallucinated ones; we empirically validate this and show that, after…
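The clustering hypothesis can be illustrated with a simple dispersion measure over repeated responses to one prompt. The sketch below uses mean pairwise cosine distance, which is an assumed stand-in and not necessarily the metric used in the paper; the embeddings are synthetic and the function name `mean_pairwise_cosine_distance` is hypothetical.

```python
import numpy as np

def mean_pairwise_cosine_distance(embeddings):
    """Average cosine distance over all unordered pairs of response embeddings."""
    E = np.asarray(embeddings, dtype=float)
    # Normalize rows so that dot products are cosine similarities.
    E = E / np.linalg.norm(E, axis=1, keepdims=True)
    sims = E @ E.T
    # Upper triangle without the diagonal: each pair counted once.
    iu = np.triu_indices(len(E), k=1)
    return float(np.mean(1.0 - sims[iu]))

# Synthetic embeddings of 10 repeated responses to one prompt (assumption):
# a tight cluster stands in for genuine answers, a loose one for hallucinations.
rng = np.random.default_rng(1)
center = rng.normal(size=32)
tight_responses = center + 0.05 * rng.normal(size=(10, 32))
loose_responses = center + 1.0 * rng.normal(size=(10, 32))

tight_score = mean_pairwise_cosine_distance(tight_responses)
loose_score = mean_pairwise_cosine_distance(loose_responses)
print(f"tight cluster dispersion: {tight_score:.4f}")
print(f"loose cluster dispersion: {loose_score:.4f}")
```

Under the hypothesis, a prompt whose sampled responses yield a low dispersion score would be more likely to have a genuine answer than one whose responses scatter widely.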
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Mental Health via Writing
