Spilled Energy in Large Language Models

Adrian Robert Minut; Hazem Dewidar; Iacopo Masi

arXiv:2602.18671·cs.AI·March 4, 2026

Open Access · 3 Reviews

TL;DR

This paper reinterprets large language models as energy-based models to detect hallucinations and errors during decoding without additional training, using novel metrics derived from output logits.

Contribution

It introduces a training-free method to identify model failures by analyzing energy discrepancies, enabling effective hallucination detection across multiple models and tasks.

Findings

01. Effective hallucination detection across nine benchmarks
02. Robust correlation between energy spills and factual errors
03. No additional training required for the metrics

Abstract

We reinterpret the final Large Language Model (LLM) softmax classifier as an Energy-Based Model (EBM), decomposing the sequence-to-sequence probability chain into multiple interacting EBMs at inference. This principled approach allows us to track "energy spills" during decoding, which we empirically show correlate with factual errors, biases, and failures. Similar to Orgad et al. (2025), our method localizes the exact answer token and subsequently tests for hallucinations. Crucially, however, we achieve this without requiring trained probe classifiers or activation ablations. Instead, we introduce two completely training-free metrics derived directly from output logits: spilled energy, which captures the discrepancy between energy values across consecutive generation steps that should theoretically match, and marginalized energy, which is measurable at a single step. Evaluated on nine…
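The two metrics named in the abstract can be illustrated with a minimal sketch. It is one plausible reading of the abstract, not the authors' implementation: it assumes the usual energy-based view of a softmax classifier, in which the energy of a chosen token is its negative logit and the marginalized energy at a step is the negative log-sum-exp of that step's logits. The function names, the sign convention, and the choice to compare step t with step t+1 are all assumptions made for illustration.

```python
import math

def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def marginalized_energy(step_logits):
    """Assumed definition: negative log-sum-exp of one step's logits."""
    return -logsumexp(step_logits)

def spilled_energy(logits, token_ids):
    """Assumed definition: per-step gap between two energies that,
    under the energy-based reading, would ideally coincide.

    logits[t] holds the vocabulary logits at step t, and token_ids[t]
    is the token chosen at that step. For each t, the energy of the
    chosen token (-logits[t][token_ids[t]]) is compared with the
    marginalized energy at step t + 1.
    """
    spills = []
    for t in range(len(token_ids) - 1):
        chosen = -logits[t][token_ids[t]]
        marginal_next = marginalized_energy(logits[t + 1])
        spills.append(chosen - marginal_next)
    return spills

# Toy example: 3 decoding steps over a 4-token vocabulary (made-up numbers).
toy_logits = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [1.0, 2.0, 3.0, 4.0],
]
toy_tokens = [0, 1, 2]
toy_spills = spilled_energy(toy_logits, toy_tokens)
```

In practice the logits would come from the model's forward pass during decoding, and the abstract indicates that the signal is read at the localized exact answer tokens; how per-step values are aggregated over that span is not specified here and is left out of the sketch.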

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01 · Rating 2 · Confidence 4

Strengths

- The paper considers a very timely and important problem of hallucination detection.
- The paper is well-written and easy to follow.

Weaknesses

- In my view, the reliance on exact tokens is a significant limitation. I am familiar with the paper by Orgad et al., which was the first to demonstrate that identifying the right token can improve detection. However, I see this primarily as an interesting observation rather than a practical method for hallucination detection. The reason is that identifying such tokens typically requires external algorithms or the use of other LLMs, which leads to very high latency—making it impractical for real

Reviewer 02 · Rating 6 · Confidence 3

Strengths

1. The idea of reinterpreting the LLM's autoregressive sequence modeling as a chain of Energy-Based Models (EBMs) is novel and very interesting.
2. The implementation of the hallucination detector is simple and training-free, yet achieves good generalization across synthetic and real-world datasets.
3. The technical motivation and derivations are sound, and the method is mathematically well grounded.
4. The paper is well-written and the presentation is clear.

Weaknesses

1. Some technical details might make the detector hard to apply in real-world tasks.
   - The paper emphasizes that localizing the signal to the "exact answer tokens" is essential. However, correctly finding the exact answer tokens might be non-trivial. In the paper, the authors identify this span by prompting the LLM for a brief answer, which seems a bit fragile to me. This is essentially a dependence on a pre-processing step that relies on the LLM's outputs, which may introduce no

Reviewer 03 · Rating 4 · Confidence 3

Strengths

1. The paper formulates the LLM's generation procedure as an energy-based model, which enables the subsequent definition of "spilled energy" for hallucination detection.
2. The proposed method is training-free, which makes it lightweight and applicable to most LLMs for hallucination detection. Empirically, the paper shows results on both synthetic and real-world datasets, which suggest that the method may generalize across different domains. While other works (non-training f

Weaknesses

1. The method first needs to identify the specific token span constituting the "exact answer". The paper implements this by "prompting the LLM for a brief answer". My concern is: what if the LLM's "brief answer" is itself hallucinated? What if the "exact answer" is wrongly identified?
2. The current evaluation focuses on tasks with answers that can be localized to a short range (words). My question is how the method performs on more subtle hallucinations, such as a multiple incorrect word

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Ferroelectric and Negative Capacitance Devices