Between the Layers Lies the Truth: Uncertainty Estimation in LLMs Using Intra-Layer Local Information Scores
Zvi N. Badash, Yonatan Belinkov, Moti Freiman

TL;DR
This paper introduces a lightweight uncertainty estimation method for LLMs that scores agreement patterns across the layers' internal representations. It matches probing in-distribution and outperforms it in cross-dataset transfer and quantized settings.
Contribution
It proposes a compact, per-instance agreement scoring method for uncertainty estimation that requires a single forward pass and remains transferable and robust across models and quantization levels.
Findings
Matches probing in-distribution with minimal performance difference.
Outperforms probing in cross-dataset transfer scenarios.
Remains effective under 4-bit weight quantization.
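The findings above are reported in AUPRC and Brier score. As a rough illustration of how these two metrics are computed for any uncertainty estimator, here is a minimal NumPy sketch. The function names, the toy labels, and the choice of which class counts as positive are assumptions made for illustration; the paper's own evaluation code and label convention are not given on this page.

```python
import numpy as np


def brier_score(y_true, p_pred):
    """Mean squared error between predicted probabilities and 0/1 labels."""
    y = np.asarray(y_true, dtype=float)
    p = np.asarray(p_pred, dtype=float)
    return float(np.mean((p - y) ** 2))


def average_precision(y_true, scores):
    """AUPRC estimated as average precision (step-wise, no interpolation).

    Tied scores are ranked in input order, which is a simplification.
    """
    y = np.asarray(y_true, dtype=int)
    s = np.asarray(scores, dtype=float)
    n_pos = int(y.sum())
    if n_pos == 0:
        raise ValueError("average precision needs at least one positive label")
    order = np.argsort(-s, kind="stable")      # highest score first
    y_sorted = y[order]
    true_pos = np.cumsum(y_sorted)             # positives found up to each rank
    precision = true_pos / np.arange(1, len(y) + 1)
    # Average the precision values at the ranks where a positive appears.
    return float(np.sum(precision * y_sorted) / n_pos)


# Hypothetical toy data: 1 marks the positive class (assumed convention).
labels = [1, 0, 1, 0]
probs = [0.9, 0.8, 0.7, 0.1]
print(brier_score(labels, probs))        # lower is better
print(average_precision(labels, probs))  # higher is better
```

Lower Brier scores indicate better-calibrated probabilities, while higher AUPRC indicates better ranking of the positive class, so the two metrics complement each other.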
Abstract
Large language models (LLMs) are often confidently wrong, making reliable uncertainty estimation (UE) essential. Output-based heuristics are cheap but brittle, while probing internal representations is effective yet high-dimensional and hard to transfer. We propose a compact, per-instance UE method that scores cross-layer agreement patterns in internal representations using a single forward pass. Across three models, our method matches probing in-distribution, with mean diagonal differences of at most AUPRC percentage points and Brier score points. Under cross-dataset transfer, it consistently outperforms probing, achieving off-diagonal gains up to AUPRC and Brier points. Under 4-bit weight-only quantization, it remains robust, improving over probing by AUPRC points and Brier points on average. Beyond performance, examining specific…
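To make the idea of a compact cross-layer agreement signal concrete, the sketch below reduces one instance's per-layer hidden states to a short vector of adjacent-layer cosine similarities. This is only an assumed stand-in: the paper's actual scoring function is not described on this page, and the function name, the use of cosine similarity, and the array shapes are all hypothetical.

```python
import numpy as np


def adjacent_layer_agreement(hidden):
    """Cosine similarity between each pair of consecutive layers.

    hidden: array of shape (num_layers, hidden_dim), e.g. one token's
    representation at every layer from a single forward pass (assumed input).
    Returns an array of shape (num_layers - 1,).
    """
    h = np.asarray(hidden, dtype=float)
    if h.ndim != 2 or h.shape[0] < 2:
        raise ValueError("expected shape (num_layers >= 2, hidden_dim)")
    norms = np.linalg.norm(h, axis=1, keepdims=True)
    unit = h / np.clip(norms, 1e-12, None)     # guard against zero vectors
    return np.sum(unit[:-1] * unit[1:], axis=1)


# Hypothetical example: 5 layers, 8-dimensional hidden states.
rng = np.random.default_rng(0)
states = rng.normal(size=(5, 8))
features = adjacent_layer_agreement(states)
print(features.shape)  # one agreement value per pair of adjacent layers
```

A feature vector of this kind grows with the number of layers and not with the hidden dimension, which is one way such a representation can stay far smaller than the raw activations a probe would consume.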
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Topic Modeling
