TPA: Next Token Probability Attribution for Detecting Hallucinations in RAG
Pengqian Lu, Jie Lu, Anjin Liu, and Guangquan Zhang

TL;DR
TPA detects hallucinations in RAG by attributing each next-token probability to seven sources within the model, rather than only to FFN knowledge versus retrieved context, which the authors report improves detection accuracy.
Contribution
This work presents an attribution technique that decomposes each token's probability into seven sources (Query, RAG Context, Past Token, Self Token, FFN, Final LayerNorm, and Initial Embedding) and applies it to hallucination detection in retrieval-augmented generation.
Findings
TPA achieves state-of-the-art hallucination detection performance.
Attribution scores show how much each model component contributes to tokens of each linguistic category.
POS-based aggregation helps identify anomalies indicating hallucinations.
Abstract
Detecting hallucinations in Retrieval-Augmented Generation (RAG) remains a challenge. Prior approaches attribute hallucinations to a binary conflict between internal knowledge stored in FFNs and the retrieved context. However, this perspective is incomplete: it fails to account for the impact of other components of the LLM, such as the user query, previously generated tokens, the self token, and the final LayerNorm adjustment. To comprehensively capture the impact of these components on hallucination detection, we propose TPA, which mathematically attributes each token's probability to seven distinct sources: Query, RAG Context, Past Token, Self Token, FFN, Final LayerNorm, and Initial Embedding. This attribution quantifies how each source contributes to the generation of the next token. Specifically, we aggregate these attribution scores by Part-of-Speech (POS) tags to quantify the…
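The POS-based aggregation step described in the abstract can be illustrated with a minimal sketch. It assumes the per-token attribution scores over the seven sources have already been computed, and it uses a plain per-tag mean as the aggregate. The abstract is truncated here and does not state which statistic the authors use, so the mean, the function name `aggregate_by_pos`, and the toy numbers are all assumptions made for illustration, not the paper's implementation.

```python
from collections import defaultdict

# The seven attribution sources named in the abstract.
SOURCES = [
    "Query", "RAG Context", "Past Token", "Self Token",
    "FFN", "Final LayerNorm", "Initial Embedding",
]

def aggregate_by_pos(pos_tags, attributions):
    """Average per-token attribution scores within each POS tag.

    pos_tags:     one POS tag per generated token, e.g. ["NOUN", "VERB"].
    attributions: one list of seven scores per token, ordered as SOURCES.
    Returns {pos_tag: {source: mean_score}}.
    The mean is an assumed aggregate; the paper may use a different one.
    """
    if len(pos_tags) != len(attributions):
        raise ValueError("pos_tags and attributions must have equal length")

    sums = defaultdict(lambda: [0.0] * len(SOURCES))
    counts = defaultdict(int)
    for tag, scores in zip(pos_tags, attributions):
        if len(scores) != len(SOURCES):
            raise ValueError("each token needs one score per source")
        for i, score in enumerate(scores):
            sums[tag][i] += score
        counts[tag] += 1

    return {
        tag: {src: total / counts[tag] for src, total in zip(SOURCES, sums[tag])}
        for tag in sums
    }

# Toy input: three tokens with made-up attribution scores.
toy_tags = ["NOUN", "VERB", "NOUN"]
toy_scores = [
    [0.5, 0.25, 0.0, 0.0, 0.25, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.0, 0.5, 0.0, 0.0],
    [0.25, 0.75, 0.0, 0.0, 0.0, 0.0, 0.0],
]
profile = aggregate_by_pos(toy_tags, toy_scores)
print(profile["NOUN"]["RAG Context"])
```

The resulting per-tag profile is a fixed-length feature table (POS tags by sources) regardless of response length, which is one plausible reason to aggregate by tag before feeding a detector. How the paper turns these aggregates into a hallucination decision is not covered by the text available here.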
