Spilled Energy in Large Language Models
Adrian Robert Minut, Hazem Dewidar, Iacopo Masi

TL;DR
This paper reinterprets the final softmax classifier of a large language model as an energy-based model to detect hallucinations and errors during decoding without additional training, using two metrics derived directly from output logits.
Contribution
It introduces a training-free method to identify model failures by analyzing energy discrepancies, enabling effective hallucination detection across multiple models and tasks.
Findings
Effective hallucination detection across nine benchmarks
Robust correlation between energy spills and factual errors
No additional training required for the metrics
Abstract
We reinterpret the final Large Language Model (LLM) softmax classifier as an Energy-Based Model (EBM), decomposing the sequence-to-sequence probability chain into multiple interacting EBMs at inference. This principled approach allows us to track "energy spills" during decoding, which we empirically show correlate with factual errors, biases, and failures. Similar to Orgad et al. (2025), our method localizes the exact answer token and subsequently tests for hallucinations. Crucially, however, we achieve this without requiring trained probe classifiers or activation ablations. Instead, we introduce two completely training-free metrics derived directly from output logits: spilled energy, which captures the discrepancy between energy values across consecutive generation steps that should theoretically match, and marginalized energy, which is measurable at a single step. Evaluated on nine…
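The two metrics named in the abstract can be illustrated with a minimal sketch. It assumes one plausible reading of the abstract, not the paper's confirmed definitions: the energy of a chosen token is its negative logit, the marginalized energy at a step is the negative log-sum-exp of that step's logits, and the spilled energy is the absolute gap between a token's energy at step t and the marginalized energy at step t+1. The function names `marginalized_energy` and `spilled_energy`, the use of an absolute value, and the choice of which steps to compare are all assumptions made for illustration.

```python
import numpy as np


def marginalized_energy(logits):
    """Negative log-sum-exp of the logits at each step.

    logits: array of shape (T, V), one row of vocabulary logits per step.
    Returns an array of shape (T,). Assumed definition, for illustration.
    """
    logits = np.asarray(logits, dtype=np.float64)
    # Subtract the row maximum before exponentiating for numerical stability.
    m = logits.max(axis=-1, keepdims=True)
    return -(m.squeeze(-1) + np.log(np.exp(logits - m).sum(axis=-1)))


def spilled_energy(logits, token_ids):
    """Gap between consecutive-step energies (assumed definition).

    Compares the energy of the token chosen at step t (its negative logit)
    with the marginalized energy at step t+1.
    Returns an array of shape (T - 1,).
    """
    logits = np.asarray(logits, dtype=np.float64)
    token_ids = np.asarray(token_ids)
    steps = np.arange(len(token_ids))
    token_energy = -logits[steps, token_ids]  # energy of each chosen token
    marg = marginalized_energy(logits)        # per-step marginalized energy
    return np.abs(token_energy[:-1] - marg[1:])
```

Under this reading, both quantities need only the logits already produced during decoding, which is consistent with the training-free claim. In practice the scores would presumably be read at the exact answer tokens rather than averaged over the whole sequence, as the abstract describes.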
Peer Reviews
Decision: ICLR 2026 Poster
- The paper considers a very timely and important problem of hallucination detection.
- The paper is well-written and easy to follow.
- In my view, the reliance on exact tokens is a significant limitation. I am familiar with the paper by Orgad et al., which was the first to demonstrate that identifying the right token can improve detection. However, I see this primarily as an interesting observation rather than a practical method for hallucination detection. The reason is that identifying such tokens typically requires external algorithms or the use of other LLMs, which leads to very high latency—making it impractical for real
1. The idea of reinterpreting the LLM's autoregressive sequence modeling as a chain of Energy-Based Models (EBMs) is novel and very interesting.
2. The implementation of the hallucination detector is simple and training-free, yet achieves good generalization across synthetic and real-world datasets.
3. The technical motivation and derivations are sound, and the method is mathematically well-grounded.
4. The paper is well-written and the presentation is clear.
1. Some technical details may make the detector hard to apply to real-world tasks. - The paper emphasizes that localizing the signal to the "exact answer tokens" is essential. However, correctly finding the exact answer tokens might be non-trivial. In the paper, the authors identify this span by prompting the LLM for a brief answer, which seems a bit fragile to me. This is essentially a dependence on a pre-processing step that relies on the LLM's outputs, which may introduce no
1. The paper formulates the LLM's generation procedure as an energy-based model, which enables the subsequent definition of "spilled energy" for hallucination detection.
2. The proposed method is training-free, which makes it lightweight and applicable to most LLMs for hallucination detection. Empirically, the paper shows results on both synthetic and real-world datasets, which suggests that the method may generalize across different domains. While other works (non-training f
1. The method first needs to identify the specific token span constituting the "exact answer". The paper implements this by "prompting the LLM for a brief answer". My concern is: what if the LLM's "brief answer" is itself hallucinated? What if the "exact answer" is wrongly identified?
2. The current evaluation focuses on tasks whose answers can be localized to a short range (words). My question is how the method performs on more subtle hallucinations, such as a multiple incorrect word
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Ferroelectric and Negative Capacitance Devices
