AggTruth: Contextual Hallucination Detection using Aggregated Attention Scores in LLMs

Piotr Matys; Jan Eliasz; Konrad Kiełczyński; Mikołaj Langner; Teddy Ferdinan; Jan Kocoń; Przemysław Kazienko

arXiv:2506.18628·cs.AI·June 24, 2025


TL;DR

AggTruth detects contextual hallucinations in large language models online by aggregating the internal attention scores assigned to the provided context, and it outperforms the current SOTA in multiple scenarios.

Contribution

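This paper introduces AggTruth, an approach for online detection of contextual hallucinations in LLMs based on aggregated attention scores. It proposes four variants that differ in the aggregation technique, and it analyzes feature selection and the effect of the number of selected attention heads.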

Findings

01

AggTruth outperforms state-of-the-art methods in multiple scenarios.

02

Careful selection of attention heads enhances detection performance.

03

Performance is stable across all examined LLMs in both same-task and cross-task setups.

Abstract

In real-world applications, Large Language Models (LLMs) often hallucinate, even in Retrieval-Augmented Generation (RAG) settings, which poses a significant challenge to their deployment. In this paper, we introduce AggTruth, a method for online detection of contextual hallucinations by analyzing the distribution of internal attention scores in the provided context (passage). Specifically, we propose four different variants of the method, each varying in the aggregation technique used to calculate attention scores. Across all LLMs examined, AggTruth demonstrated stable performance in both same-task and cross-task setups, outperforming the current SOTA in multiple scenarios. Furthermore, we conducted an in-depth analysis of feature selection techniques and examined how the number of selected attention heads impacts detection performance, demonstrating that careful selection of heads is…
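The core idea in the abstract, turning the attention a model places on the passage into per-head features, can be illustrated with a minimal sketch. The abstract does not name the four aggregation techniques, so the two aggregations below (attention sum and entropy) are illustrative assumptions, as are the function names and the `attn[h][j]` layout (attention from one generated token to position `j` for head `h`, with passage tokens first). This is not the authors' implementation.

```python
import math
from typing import Callable, List, Sequence


def aggregate_sum(weights: Sequence[float]) -> float:
    """Total attention mass a head places on the passage tokens (assumed aggregation)."""
    return sum(weights)


def aggregate_entropy(weights: Sequence[float]) -> float:
    """Shannon entropy of passage attention after renormalising it to sum to 1
    (assumed aggregation). Returns 0.0 when the head ignores the passage."""
    total = sum(weights)
    if total <= 0.0:
        return 0.0
    probs = [w / total for w in weights]
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def head_features(
    attn: Sequence[Sequence[float]],
    passage_len: int,
    aggregate: Callable[[Sequence[float]], float],
) -> List[float]:
    """Return one aggregated feature per attention head.

    attn[h][j] is the attention weight from a generated token to position j
    for head h; the first `passage_len` positions are assumed to be the passage.
    """
    return [aggregate(row[:passage_len]) for row in attn]


# Toy example: two heads, four positions, the first two belong to the passage.
attn = [
    [0.4, 0.4, 0.1, 0.1],    # head 0 attends mostly to the passage
    [0.05, 0.05, 0.5, 0.4],  # head 1 attends mostly elsewhere
]
sum_features = head_features(attn, passage_len=2, aggregate=aggregate_sum)
entropy_features = head_features(attn, passage_len=2, aggregate=aggregate_entropy)
```

In a detector of this kind, such per-head features would feed a lightweight classifier, and a feature selection step would keep only a subset of heads. Which classifier and which selection technique to use are not stated in the abstract, so they are left out of the sketch.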
