Sensitivity Uncertainty Alignment in Large Language Models

Prakul Sunil Hiremath; Harshit R. Hiremath

arXiv:2604.20903·cs.CR·April 24, 2026

TL;DR

This paper introduces Sensitivity-Uncertainty Alignment (SUA), a framework for analyzing and improving failure detection in large language models by checking whether a model's expressed uncertainty rises when its predictions become unstable.

Contribution

It proposes a scalar score, SUA_theta, and a training method, SUA-TR, aimed at improving model calibration and failure detection across tasks.

Findings

01. SUA better identifies model failures than entropy or self-consistency.

02. Minimizing the positive part of SUA_theta bounds worst-case perturbed risk.

03. SUA-TR improves reliability and safety in language models.

Abstract

We propose Sensitivity-Uncertainty Alignment (SUA), a framework for analyzing failures of large language models under adversarial and ambiguous inputs. We argue that adversarial sensitivity and ambiguity reflect a common issue: misalignment between prediction instability and model uncertainty. A reliable model should express higher uncertainty when its predictions are unstable; failure to do so leads to miscalibration. We define a scalar score, SUA_theta(x), capturing the difference between distributional sensitivity and predictive entropy. We show that minimizing its positive part bounds worst-case perturbed risk and relates to calibration error. We also formalize ambiguity collapse, where models produce overconfident outputs despite multiple valid interpretations. We introduce SUA-TR, a training method combining consistency regularization and entropy alignment, along with an…
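
The score described in the abstract, a difference between distributional sensitivity and predictive entropy, can be illustrated with a small sketch. The abstract does not give the exact definition, so everything concrete here is an assumption made for illustration: sensitivity is taken as the largest total variation distance between the clean predictive distribution and a handful of perturbed ones, entropy is Shannon entropy in nats, and the helper names (`entropy`, `total_variation`, `sua_score`, `sua_positive_part`) are hypothetical. The paper may use a different divergence, scaling, or perturbation set.

```python
import math


def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(q * math.log(q) for q in p if q > 0.0)


def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def sua_score(p_clean, p_perturbed_list, distance=total_variation):
    """Assumed form of the score: worst observed shift minus entropy.

    p_clean: predictive distribution on the original input.
    p_perturbed_list: predictive distributions on perturbed inputs.
    """
    sensitivity = max(distance(p_clean, p) for p in p_perturbed_list)
    return sensitivity - entropy(p_clean)


def sua_positive_part(p_clean, p_perturbed_list):
    """Positive part of the score: nonzero only when instability
    exceeds the uncertainty the model expresses."""
    return max(0.0, sua_score(p_clean, p_perturbed_list))


# The same perturbed prediction is used in both cases.
perturbed = [[0.10, 0.85, 0.05]]

# Confident but unstable: low entropy, large shift under perturbation.
confident_unstable = sua_positive_part([0.98, 0.01, 0.01], perturbed)

# Uncertain and unstable: high entropy already signals the instability.
uncertain_unstable = sua_positive_part([0.40, 0.35, 0.25], perturbed)

print(confident_unstable, uncertain_unstable)
```

Under these assumptions the confident-but-unstable prediction receives a positive penalty, while the prediction that already expresses high uncertainty receives none, which matches the abstract's statement that a reliable model should be more uncertain when its predictions are unstable. Total variation is unitless and entropy is in nats, so the two terms are not on a common scale in this sketch; a real implementation would need the paper's own normalization.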
