INTRYGUE: Induction-Aware Entropy Gating for Reliable RAG Uncertainty Estimation
Alexandra Bazarova, Andrei Volodichev, Daria Kotova, Alexey Zaytsev

TL;DR
This paper introduces INTRYGUE, an entropy gating method that improves uncertainty estimation in retrieval-augmented generation models by leveraging induction head activations, making hallucination detection more reliable.
Contribution
We propose INTRYGUE, a mechanistically grounded entropy gating technique that incorporates induction head signals to improve uncertainty quantification in RAG models.
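As a rough illustration of what gating predictive entropy on an induction signal could look like, the sketch below down-weights each token's entropy by a per-token induction score. The gating rule, the score range [0, 1], and the names `predictive_entropy` and `gated_entropy` are all assumptions made for this example; this summary does not specify the paper's actual gating function or how induction head activations are aggregated.

```python
import math


def predictive_entropy(probs):
    """Shannon entropy (in nats) of a single next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def gated_entropy(token_probs, induction_scores):
    """Mean per-token entropy, down-weighted where induction activity is high.

    ASSUMPTION: induction_scores[t] is a hypothetical value in [0, 1]
    summarizing induction head activity at token t. The soft gate
    (1 - score) is an illustrative choice, not the paper's formula.
    """
    if len(token_probs) != len(induction_scores):
        raise ValueError("need one induction score per token distribution")
    if not token_probs:
        return 0.0
    gated = [
        (1.0 - score) * predictive_entropy(probs)
        for probs, score in zip(token_probs, induction_scores)
    ]
    return sum(gated) / len(gated)


# Two tokens with identical, maximally uncertain distributions.
uniform = [0.25, 0.25, 0.25, 0.25]
plain = gated_entropy([uniform, uniform], [0.0, 0.0])   # no gating applied
copied = gated_entropy([uniform, uniform], [0.0, 1.0])  # second token fully gated
```

In this toy setup, the second token is treated as copied from the context, so its entropy is suppressed and the sequence-level score is halved relative to the ungated case.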
Findings
INTRYGUE outperforms existing UQ baselines across multiple benchmarks.
Combining internal signals with predictive entropy enhances hallucination detection.
The method is effective across LLM sizes from 4B to 13B parameters.
Abstract
While retrieval-augmented generation (RAG) significantly improves the factual reliability of LLMs, it does not eliminate hallucinations, so robust uncertainty quantification (UQ) remains essential. In this paper, we reveal that standard entropy-based UQ methods often fail in RAG settings due to a mechanistic paradox. An internal "tug-of-war" inherent to context utilization appears: while induction heads promote grounded responses by copying the correct answer, they collaterally trigger the previously established "entropy neurons". This interaction inflates predictive entropy, causing the model to signal false uncertainty on accurate outputs. To address this, we propose INTRYGUE (Induction-Aware Entropy Gating for Uncertainty Estimation), a mechanistically grounded method that gates predictive entropy based on the activation patterns of induction heads. Evaluated across four RAG…
Taxonomy
Topics: Topic Modeling · Memory Processes and Influences · Generative Adversarial Networks and Image Synthesis
