On Large Language Models' Hallucination with Regard to Known Facts
Che Jiang, Biqing Qi, Xiangyu Hong, Dayuan Fu, Yang Cheng, Fandong Meng, Mo Yu, Bowen Zhou, Jie Zhou

TL;DR
This paper investigates the inference dynamics behind large language model hallucinations, revealing patterns and developing a classifier that predicts hallucinations with 88% accuracy based on token probability dynamics.
Contribution
The paper analyzes hallucinations on known facts from the perspective of inference dynamics and builds a classifier that predicts hallucinations from those dynamics.
Findings
Hallucinations are linked to a lack of abrupt increases in the output token's probability across layers.
A classifier using dynamic probability curves achieves 88% accuracy in detecting hallucinations.
Analysis reveals different layer-wise token probability behaviors in hallucinated vs. correct outputs.
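The classifier finding above can be illustrated with a small sketch. It is not the authors' classifier: this summary does not state the model type or features, so the code uses logistic regression as a stand-in, two hand-picked features (largest single-layer probability jump and final-layer probability), and fully synthetic curves. All function names are hypothetical.

```python
import numpy as np

def make_curves(n, n_layers, abrupt, rng):
    """Synthetic per-layer probability curves (an assumption, not real data).
    abrupt=True: sharp rise in later layers; abrupt=False: slow, flat drift."""
    layers = np.arange(n_layers)
    if abrupt:
        centers = rng.uniform(0.6, 0.8, size=(n, 1)) * n_layers
        base = 0.9 / (1.0 + np.exp(-2.0 * (layers - centers)))
    else:
        base = np.linspace(0.0, 0.2, n_layers) * np.ones((n, 1))
    noise = rng.normal(scale=0.03, size=(n, n_layers))
    return np.clip(base + noise, 0.0, 1.0)

def curve_features(curves):
    """Two illustrative features: largest layer-to-layer jump, final probability."""
    jumps = np.diff(curves, axis=1)
    return np.stack([jumps.max(axis=1), curves[:, -1]], axis=1)

def fit_logistic(X, y, lr=0.5, steps=2000):
    """Plain gradient-descent logistic regression with a bias term."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(Xb @ w)))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w

def predict(w, X):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return (Xb @ w > 0).astype(int)

rng = np.random.default_rng(0)
correct = make_curves(200, 16, abrupt=True, rng=rng)
hallucinated = make_curves(200, 16, abrupt=False, rng=rng)

X = curve_features(np.vstack([correct, hallucinated]))
y = np.concatenate([np.zeros(200), np.ones(200)])  # 1 = hallucinated

idx = rng.permutation(len(y))
train, test = idx[:300], idx[300:]
w = fit_logistic(X[train], y[train])
accuracy = (predict(w, X[test]) == y[test]).mean()
print(f"held-out accuracy on synthetic curves: {accuracy:.2f}")
```

The synthetic classes are deliberately easy to separate, so the accuracy printed here says nothing about the 88% figure reported for real model outputs.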
Abstract
Large language models are successful in answering factoid questions but are also prone to hallucination. We investigate the phenomenon of LLMs possessing correct answer knowledge yet still hallucinating from the perspective of inference dynamics, an area not previously covered in studies on hallucinations. We are able to conduct this analysis via two key ideas. First, we identify the factual questions that query the same triplet knowledge but result in different answers. The difference between the model behaviors on the correct and incorrect outputs hence suggests the patterns when hallucinations happen. Second, to measure the pattern, we utilize mappings from the residual streams to vocabulary space. We reveal the different dynamics of the output token probabilities along the depths of layers between the correct and hallucinated cases. In hallucinated cases, the output token's…
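As a minimal sketch of the second idea, mapping residual streams to vocabulary space, the code below projects each layer's hidden state through an unembedding matrix and reads off the probability of one output token per layer. The direct projection (no final layer norm), the random arrays standing in for real activations, and the name token_probability_curve are all assumptions for illustration, not the paper's exact procedure.

```python
import numpy as np

def token_probability_curve(hidden_states, unembed, token_id):
    """Return the probability of token_id at every layer.

    hidden_states: (n_layers, d_model) residual-stream vectors at one position
    unembed:       (d_model, vocab_size) unembedding matrix
    """
    logits = hidden_states @ unembed                      # (n_layers, vocab_size)
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=-1, keepdims=True)            # softmax per layer
    return probs[:, token_id]

# Random stand-ins for real model activations and weights (assumption).
rng = np.random.default_rng(0)
n_layers, d_model, vocab_size = 12, 16, 50
hidden_states = rng.normal(size=(n_layers, d_model))
unembed = rng.normal(size=(d_model, vocab_size))

curve = token_probability_curve(hidden_states, unembed, token_id=7)
print(curve.shape)  # (12,)
```

Plotting such a curve against layer depth for a correct answer and for a hallucinated answer to the same triplet query is the kind of comparison the abstract describes.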
Taxonomy
Topics: Big Data and Digital Economy · Advanced Data Processing Techniques · Machine Learning in Healthcare
