What are Models Thinking about? Understanding Large Language Model Hallucinations "Psychology" through Model Inner State Analysis

Peiran Wang; Yang Liu; Yunfei Lu; Jue Hong; Ye Wu

arXiv:2502.13490 · cs.CL · February 20, 2025

TL;DR

This paper investigates the internal states of large language models during inference to understand and detect hallucinations without relying on external sources, providing insights into model behavior and improving detection methods.

Contribution

It introduces a systematic analysis of LLM internal states during different inference stages and evaluates their effectiveness in hallucination detection, enhancing interpretability.

Findings

01

Internal states reveal key features associated with hallucinations.

02

Analyzing inference stages helps explain why hallucinations occur.

03

Internal state-based detection has advantages and limitations.

Abstract

Large language model (LLM) systems suffer from the models' unstable ability to generate valid and factual content, which results in hallucinations. Current hallucination detection methods rely heavily on out-of-model information sources, such as retrieval-augmented generation (RAG), which adds substantial latency. Recently, the internal states of LLMs during inference have been used in numerous research works, for example in prompt injection detection. Considering that LLM internal states are interpretable and do not require external information sources, we introduce them into LLM hallucination detection. In this paper, we systematically analyze the features revealed by different internal states during the inference forward pass and comprehensively evaluate their ability to detect hallucinations. Specifically, we cut the forward process of a large language model into…
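The abstract proposes using a model's internal states as hallucination-detection features without consulting external sources. One common generic way to turn such states into a detector is a linear probe. The sketch below is a plain logistic-regression probe, not the paper's method. It assumes hidden-state vectors have already been extracted from the model (for example, one layer's activation per generated answer) and labeled as hallucinated or faithful. The names `train_probe` and `hallucination_score`, and the synthetic clusters used as stand-in data, are all hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    # Numerically stable logistic function (avoids overflow for large |z|)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_probe(features: Sequence[Sequence[float]], labels: Sequence[int],
                lr: float = 0.1, epochs: int = 200) -> Tuple[List[float], float]:
    """Fit a logistic-regression probe with full-batch gradient descent.

    features: one hidden-state vector per answer (hypothetical, pre-extracted).
    labels:   1 = hallucinated, 0 = faithful (hypothetical annotation).
    """
    dim = len(features[0])
    w = [0.0] * dim
    b = 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(features, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y  # derivative of the log-loss with respect to the logit
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        # Average the gradient over the batch and take one descent step
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def hallucination_score(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    # Probe's estimated probability that the answer behind state x is hallucinated
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


if __name__ == "__main__":
    # Synthetic stand-in for extracted hidden states: two Gaussian clusters
    rng = random.Random(0)

    def sample(center: Sequence[float]) -> List[float]:
        return [c + rng.gauss(0.0, 0.5) for c in center]

    X = [sample([1.0, -1.0, 0.5]) for _ in range(50)] + \
        [sample([-1.0, 1.0, -0.5]) for _ in range(50)]
    y = [1] * 50 + [0] * 50
    w, b = train_probe(X, y)
    acc = sum((hallucination_score(w, b, x) > 0.5) == bool(t)
              for x, t in zip(X, y)) / len(y)
    print(f"training accuracy: {acc:.2f}")
```

A linear probe adds almost no latency at inference time, since scoring is a single dot product over a vector the model already computes, and each weight maps to one hidden dimension, which keeps the detector inspectable. Comparing probes trained on states from different layers or stages is one way to ask where hallucination-related signal appears.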

Taxonomy

Topics: Cognitive Science and Education Research · Mental Health Research Topics · Opinion Dynamics and Social Influence

Methods: Attention Is All You Need · Byte Pair Encoding · Adam · Softmax · Dropout · Weight Decay · BART · Linear Layer · WordPiece