Enhancing Hallucination Detection through Noise Injection
Litian Liu, Reza Pourreza, Sunny Panchal, Apratim Bhattacharyya, Yubing Jian, Yao Qin, Roland Memisevic

TL;DR
This paper introduces a simple, training-free method that enhances hallucination detection in large language models by injecting noise into model parameters, leveraging Bayesian uncertainty to improve accuracy across various datasets and architectures.
Contribution
The paper proposes a novel noise injection technique during sampling that significantly improves hallucination detection without additional training.
Findings
Improved hallucination detection accuracy across multiple datasets.
Enhanced detection performance by perturbing model parameters during sampling.
Method is effective across diverse model architectures and uncertainty metrics.
Abstract
Large Language Models (LLMs) are prone to generating plausible yet incorrect responses, known as hallucinations. Effectively detecting hallucinations is therefore crucial for the safe deployment of LLMs. Recent research has linked hallucinations to model uncertainty, suggesting that hallucinations can be detected by measuring dispersion over answer distributions obtained from multiple samples drawn from a model. While drawing from the distribution over tokens defined by the model is a natural way to obtain samples, in this work, we argue that it is suboptimal for the purpose of detecting hallucinations. We show that detection can be improved significantly by taking into account model uncertainty in the Bayesian sense. To this end, we propose a very simple, training-free approach based on perturbing an appropriate subset of model parameters, or equivalently hidden unit activations,…
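As a rough illustration of the idea in the abstract, the sketch below draws several answers from a toy softmax "model" while adding fresh noise to its hidden activations on each draw, then measures dispersion as the entropy of the empirical answer distribution. Everything here is an assumption made for illustration: the three-unit toy model, the function names (`perturb`, `sample_answer`, `sample_with_noise`, `answer_entropy`), and the values of `alpha` and `temperature`. The uniform U(0, alpha) noise follows the description in the peer reviews; the paper's actual choice of layers, noise magnitude, and uncertainty metric may differ.

```python
import math
import random
from collections import Counter


def answer_entropy(answers):
    """Entropy (in nats) of the empirical distribution over distinct answers."""
    n = len(answers)
    counts = Counter(answers)
    return sum(-(c / n) * math.log(c / n) for c in counts.values())


def perturb(hidden, alpha, rng):
    """Add independent U(0, alpha) noise to each hidden activation."""
    return [h + rng.uniform(0.0, alpha) for h in hidden]


def sample_answer(hidden, weights, temperature, rng):
    """Sample an answer index from softmax(weights @ hidden / temperature)."""
    logits = [sum(w * h for w, h in zip(row, hidden)) / temperature
              for row in weights]
    top = max(logits)  # subtract the max for numerical stability
    unnormalized = [math.exp(l - top) for l in logits]
    return rng.choices(range(len(weights)), weights=unnormalized, k=1)[0]


def sample_with_noise(hidden, weights, alpha, temperature, n_samples, seed=0):
    """Draw n_samples answers, re-sampling the activation noise for each draw."""
    rng = random.Random(seed)
    return [
        sample_answer(perturb(hidden, alpha, rng), weights, temperature, rng)
        for _ in range(n_samples)
    ]


# Hypothetical toy "model": 3 hidden units, 3 candidate answers.
hidden = [1.0, -0.5, 0.25]
weights = [[2.0, 0.0, 1.0], [0.5, 1.0, -1.0], [-1.0, 0.5, 0.5]]

# alpha = 0 reduces to ordinary temperature sampling (no perturbation).
baseline = sample_with_noise(hidden, weights, alpha=0.0, temperature=0.5, n_samples=10)
noisy = sample_with_noise(hidden, weights, alpha=1.0, temperature=0.5, n_samples=10)
print(answer_entropy(baseline), answer_entropy(noisy))
```

A higher entropy over the sampled answers would be read as a sign of greater uncertainty. In a real detector the answers would be generated strings grouped by equivalence, and the threshold on the score would need to be calibrated on held-out data.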
Peer Reviews
Decision·ICLR 2026 Poster
1. The paper focuses on a targeted research question, and does a good job of providing an in-depth discussion.
2. The improvement in hallucination detection is obvious. But for me, beyond these improvements, the paper provides a more accurate way to measure the overall uncertainty of an LLM, and thus, the technique here can be highly valuable.
3. The paper is well written, and I enjoyed reading it. The depth of the experiments is good, and the paper provides answers to many different ablation qu
There are many other questions that could have been asked, but in my opinion, the paper did a good job focusing on one problem statement and providing appropriate depth. My only complaint, and I hate to give such cliched feedback, but I really wanted to see experiments on a bigger variety of datasets. Different datasets seem to have different trends in the 3 datasets used in the paper, and an analysis over a larger set of datasets would have been very interesting. For instance, how well does th
- I like that the method is simple and justified from a theoretical perspective.
- The discussion on the complementary effect of aleatoric and epistemic uncertainty was insightful.
- The experiments section includes a variety of ablation studies which vary several relevant factors such as noise, temperature, and the layers in which noise is injected. Since noise injection is easy to combine with other sample-based approaches, I appreciate that the authors already included an ablation study that d
Some choices and details were unclear to me and it would be nice to see more justification/discussion pertinent to these:
- Why did you pick the uniform distribution for the noise?
- How did you measure accuracy in Figure 4b? I am asking because there is no obvious way to judge the correctness of an LLM response compared to a ground truth. I believe it is important to be explicit about this detail (If you did include this in the manuscript already and I missed it, I apologize). I am happy to r
1. Simple, training-free mechanism with negligible latency overhead; easy to bolt onto many LLMs and sampling-based detectors.
2. Principled motivation: separates aleatoric from epistemic uncertainty; implements the latter via internal perturbations approximating a posterior over models.
3. Consistent gains across datasets/models/metrics.
4. Thorough ablations.
5. Complementarity with input perturbations and with standard sampling - jointly using both sources yields the strongest results.
1. Noise distribution centering: The paper repeatedly injects U(0, α) (positive-only) noise and even defines q(omega) such that it is not centered at zero around the checkpoint parameter; this can introduce a mean shift rather than a purely variance-based epistemic perturbation. A symmetric zero-mean alternative is only partially explored and merits direct head-to-head comparison.
2. Metric dependence: A central result uses Answer Entropy (counting distinct final answers). This is natural for
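The mean-shift concern raised in the noise-centering point of this review can be made concrete with a short calculation. The symbols ε and ε′ are introduced here for illustration only and denote a single noise draw under the positive-only and the symmetric scheme, respectively; the symmetric interval is an assumed choice with the same width α, not necessarily the alternative the paper tested.

```latex
\begin{align*}
\varepsilon \sim U(0, \alpha):\quad
  & \mathbb{E}[\varepsilon] = \frac{\alpha}{2}, \quad
    \operatorname{Var}[\varepsilon] = \frac{\alpha^2}{12}, \\
\varepsilon' \sim U\!\left(-\tfrac{\alpha}{2}, \tfrac{\alpha}{2}\right):\quad
  & \mathbb{E}[\varepsilon'] = 0, \quad
    \operatorname{Var}[\varepsilon'] = \frac{\alpha^2}{12}, \\
\text{so}\quad & \varepsilon \overset{d}{=} \varepsilon' + \frac{\alpha}{2}.
\end{align*}
```

Under these assumptions the two schemes inject the same variance and differ only by the constant offset α/2. Whether that offset helps or hurts detection is an empirical question that this calculation does not settle.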
Taxonomy
Topics: Anomaly Detection Techniques and Applications · EEG and Brain-Computer Interfaces · Cognitive Science and Education Research
Methods: Sparse Evolutionary Training
