Sensitivity Uncertainty Alignment in Large Language Models
Prakul Sunil Hiremath, Harshit R. Hiremath

TL;DR
This paper introduces Sensitivity-Uncertainty Alignment (SUA), a framework for analyzing failures of large language models under adversarial and ambiguous inputs by checking whether a model's uncertainty rises when its predictions become unstable.
Contribution
It proposes a novel scalar score, SUA_theta(x), and a training method, SUA-TR, that together improve model calibration and failure detection across tasks.
Findings
SUA better identifies model failures than entropy or self-consistency.
Minimizing the positive part of SUA_theta bounds worst-case perturbed risk and relates to calibration error.
SUA-TR improves reliability and safety in language models.
Abstract
We propose Sensitivity-Uncertainty Alignment (SUA), a framework for analyzing failures of large language models under adversarial and ambiguous inputs. We argue that adversarial sensitivity and ambiguity reflect a common issue: misalignment between prediction instability and model uncertainty. A reliable model should express higher uncertainty when its predictions are unstable; failure to do so leads to miscalibration. We define a scalar score, SUA_theta(x), capturing the difference between distributional sensitivity and predictive entropy. We show that minimizing its positive part bounds worst-case perturbed risk and relates to calibration error. We also formalize ambiguity collapse, where models produce overconfident outputs despite multiple valid interpretations. We introduce SUA-TR, a training method combining consistency regularization and entropy alignment, along with an…
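The abstract describes SUA_theta(x) only in words, as the difference between distributional sensitivity and predictive entropy. The sketch below illustrates that idea under stated assumptions and is not the authors' definition: sensitivity is taken here to be the largest KL divergence between the model's predictive distribution on the original input and on a handful of perturbed inputs, and entropy is Shannon entropy in nats. The function names (`entropy`, `kl_divergence`, `sua_score`, `sua_positive_part`) and the example distributions are hypothetical.

```python
import math


def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution p."""
    return -sum(q * math.log(q) for q in p if q > 0.0)


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for discrete distributions; q is floored at eps for stability."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0.0)


def sua_score(p_clean, p_perturbed_list):
    """Assumed form of the score: sensitivity minus predictive entropy.

    Sensitivity is approximated by the worst-case KL divergence between the
    clean prediction and the predictions on perturbed inputs (an assumption).
    """
    sensitivity = max(kl_divergence(p_clean, q) for q in p_perturbed_list)
    return sensitivity - entropy(p_clean)


def sua_positive_part(p_clean, p_perturbed_list):
    """Positive part of the score: nonzero only when sensitivity exceeds entropy."""
    return max(0.0, sua_score(p_clean, p_perturbed_list))


# Confident but unstable: the prediction flips under perturbation.
confident_unstable = sua_positive_part([0.98, 0.01, 0.01], [[0.05, 0.90, 0.05]])

# Uncertain and stable: high entropy, almost no change under perturbation.
uncertain_stable = sua_positive_part([0.34, 0.33, 0.33], [[0.33, 0.34, 0.33]])

print(confident_unstable > 0.0, uncertain_stable == 0.0)
```

Under these assumptions, the first case is the misalignment the abstract describes: the prediction is unstable, yet the model reports low uncertainty, so the positive part is large. In the second case the model's uncertainty already exceeds its instability, so the positive part is zero.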
