Calibrating LLM Confidence by Probing Perturbed Representation Stability
Reza Khanmohammadi, Erfan Miahi, Mehrsa Mardikoraem, Simerjot Kaur, Ivan Brugere, Charese H. Smiley, Kundan Thind, and Mohammad M. Ghassemi

TL;DR
This paper introduces CCPS, a method that calibrates LLM confidence estimates by applying adversarial perturbations to the model's final hidden states and measuring how stable its internal representations remain.
Contribution
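CCPS applies targeted adversarial perturbations to an LLM's final hidden states, extracts features describing how the model responds to them, and feeds those features to a lightweight classifier that predicts whether the answer is correct. The paper reports that this outperforms existing confidence estimation methods.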
Findings
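Across four LLMs and three MMLU variants, CCPS reduces Expected Calibration Error by approximately 55% and Brier score by 21%.
CCPS increases accuracy by 5 percentage points.
CCPS outperforms prior methods on MMLU and MMLU-Pro, in both multiple-choice and open-ended formats, for models from 8B to 32B parameters (Llama, Qwen, and Mistral architectures).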
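For readers unfamiliar with the two calibration metrics named on this page, the following is a minimal sketch of how Expected Calibration Error and Brier score are commonly computed from per-answer confidences and correctness labels. The function names, the equal-width binning, and the default of 10 bins are assumptions made for illustration; the paper's exact evaluation settings are not stated here.

```python
from typing import Sequence


def brier_score(confidences: Sequence[float], correct: Sequence[int]) -> float:
    """Mean squared gap between confidence and the 0/1 correctness label."""
    if not confidences or len(confidences) != len(correct):
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum((c - float(y)) ** 2 for c, y in zip(confidences, correct))
    return total / len(confidences)


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[int],
    n_bins: int = 10,  # assumption: equal-width bins over [0, 1]
) -> float:
    """Weighted average of |mean confidence - accuracy| over confidence bins."""
    if not confidences or len(confidences) != len(correct):
        raise ValueError("inputs must be non-empty and of equal length")
    bins = [[] for _ in range(n_bins)]
    for c, y in zip(confidences, correct):
        # A confidence of exactly 1.0 is placed in the last bin.
        idx = min(int(c * n_bins), n_bins - 1)
        bins[idx].append((c, float(y)))
    n = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(y for _, y in members) / len(members)
        ece += (len(members) / n) * abs(mean_conf - accuracy)
    return ece
```

Lower is better for both metrics. A model that always reports 0.9 confidence but is right only half the time would score poorly on both, which is the kind of gap a calibration method aims to close.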
Abstract
Miscalibration in Large Language Models (LLMs) undermines their reliability, highlighting the need for accurate confidence estimation. We introduce CCPS (Calibrating LLM Confidence by Probing Perturbed Representation Stability), a novel method analyzing internal representational stability in LLMs. CCPS applies targeted adversarial perturbations to final hidden states, extracts features reflecting the model's response to these perturbations, and uses a lightweight classifier to predict answer correctness. CCPS was evaluated on LLMs from 8B to 32B parameters (covering Llama, Qwen, and Mistral architectures) using MMLU and MMLU-Pro benchmarks in both multiple-choice and open-ended formats. Our results show that CCPS significantly outperforms current approaches. Across four LLMs and three MMLU variants, CCPS reduces Expected Calibration Error by approximately 55% and Brier score by 21%,…
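The perturb-and-probe idea in the abstract can be illustrated with a toy sketch. Everything below is an assumption made for illustration, not the authors' implementation: the model is reduced to a single linear output head `W`, the adversarial direction is taken as the normalized gradient of the predicted token's negative log-likelihood with respect to the hidden state, and the feature names (`flip_eps`, `mean_prob_drop`, and so on) are hypothetical. In a full pipeline, features like these would be passed to a lightweight classifier that predicts answer correctness.

```python
import math
from typing import Dict, List, Optional, Sequence


def softmax(z: Sequence[float]) -> List[float]:
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [v / s for v in exps]


def logits(W: Sequence[Sequence[float]], h: Sequence[float]) -> List[float]:
    """Toy output head: one row of W per vocabulary token."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def adversarial_direction(W, h, token: int) -> List[float]:
    """Unit vector along the gradient of -log p[token] w.r.t. h.

    For a linear head with softmax this gradient is W^T (p - onehot(token)).
    """
    p = softmax(logits(W, h))
    resid = [p_k - (1.0 if k == token else 0.0) for k, p_k in enumerate(p)]
    grad = [sum(W[k][j] * resid[k] for k in range(len(W))) for j in range(len(h))]
    norm = math.sqrt(sum(g * g for g in grad)) or 1.0  # guard against zero gradient
    return [g / norm for g in grad]


def stability_features(W, h, eps_grid: Sequence[float]) -> Dict[str, Optional[float]]:
    """Step the hidden state along the adversarial direction and record
    how the originally predicted token's probability responds."""
    base = softmax(logits(W, h))
    token = max(range(len(base)), key=base.__getitem__)
    direction = adversarial_direction(W, h, token)
    probs: List[float] = []
    flip_eps: Optional[float] = None
    for eps in eps_grid:
        h_pert = [x + eps * d for x, d in zip(h, direction)]
        p = softmax(logits(W, h_pert))
        probs.append(p[token])
        top = max(range(len(p)), key=p.__getitem__)
        if flip_eps is None and top != token:
            flip_eps = eps  # smallest tested step that changes the prediction
    return {
        "token": token,
        "base_prob": base[token],
        "min_prob": min(probs),
        "mean_prob_drop": base[token] - sum(probs) / len(probs),
        "flip_eps": flip_eps,  # None means no flip within eps_grid
    }


# Usage: a confident state versus one sitting near a decision boundary.
W = [[1.0, 0.0], [0.0, 1.0]]
grid = [0.1 * i for i in range(1, 21)]
stable = stability_features(W, [3.0, 0.0], grid)
fragile = stability_features(W, [0.6, 0.5], grid)
```

In this toy setup the fragile state flips its prediction at a small step size while the stable state does not flip anywhere on the grid, which is the intuition behind treating representation stability as a signal of answer reliability.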
Taxonomy
Topics: Topic Modeling · Computational and Text Analysis Methods · Natural Language Processing Techniques
