Calibration Attention: Learning Reliability-Aware Representations for Vision Transformers
Wenhao Liang, Wei Emma Zhang, Lin Yue, Miao Xu, Mingyu Guo, Olaf Maennel, Weitong Chen

TL;DR
Calibration Attention (CalAttn) is a representation-aware calibration method for vision transformers. Rather than only adjusting output logits, it reshapes the internal representations, which improves uncertainty estimation and makes predicted confidences more trustworthy.
Contribution
The paper proposes Calibration Attention (CalAttn), a novel representation-level calibration module that couples instance-wise temperature scaling with transformer token geometry for improved uncertainty estimation.
Findings
CalAttn consistently reduces calibration error across multiple datasets with ViT, DeiT, and Swin backbones.
It maintains or improves the accuracy of vision transformers.
CalAttn adds negligible overhead (<0.1% additional parameters).
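The findings above refer to "calibration error" without naming a metric in this summary. A common choice is the expected calibration error (ECE), which bins predictions by confidence and averages the gap between confidence and accuracy in each bin. The sketch below is a minimal illustration of that metric; the equal-width binning, the bin count of 10, and the function name are assumptions, not details taken from the paper.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width-bin ECE (an assumed metric, not confirmed by the paper).

    confidences: top-class probabilities in [0, 1], one per sample.
    correct: 1 if the prediction was right, 0 otherwise, one per sample.
    """
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        # Bins cover [k/n_bins, (k+1)/n_bins); a confidence of 1.0 goes in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, hit))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(h for _, h in members) / len(members)
        # Weight each bin's confidence/accuracy gap by its share of the samples.
        ece += (len(members) / n) * abs(avg_conf - accuracy)
    return ece


# Toy data: four predictions at 90% confidence, three of them correct.
toy_ece = expected_calibration_error([0.9, 0.9, 0.9, 0.9], [1, 1, 1, 0])
```

A lower value means confidence tracks accuracy more closely; a model that is always 90% confident and right 90% of the time would score zero under this metric.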
Abstract
Most calibration methods operate at the logit level, implicitly assuming that miscalibration can be corrected without changing the underlying representation. We challenge this assumption and propose Calibration Attention (CalAttn), a representation-aware calibration module for vision transformers that couples instance-wise temperature scaling to transformer token geometry under a proper scoring objective. CalAttn predicts a sample-specific temperature from the [CLS] token and backpropagates calibration gradients into the backbone, thereby reshaping the uncertainty structure of the representation rather than post-hoc adjusting confidence. This yields token-conditioned uncertainty modulation with negligible overhead (<0.1% additional parameters). Across multiple datasets with ViT/DeiT/Swin backbones, CalAttn consistently improves calibration while…
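As a rough illustration of the mechanism the abstract describes, the forward pass might look like the sketch below: a small head maps the [CLS] embedding to a positive, per-sample temperature, the logits are divided by it, and the result is scored with negative log-likelihood (one example of a proper scoring rule). The linear-plus-softplus head, the floor value, the toy numbers, and all function names are assumptions made for illustration; the abstract does not specify the head architecture or the exact objective. The sketch is forward-only and uses plain Python, so it does not show the gradient flow into the backbone that the abstract emphasizes.

```python
import math


def softplus(x):
    # Numerically stable log(1 + exp(x)); keeps the output positive.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def predict_temperature(cls_token, weights, bias, floor=1e-3):
    """Hypothetical temperature head: linear map of [CLS], then softplus."""
    raw = sum(w * h for w, h in zip(weights, cls_token)) + bias
    return softplus(raw) + floor  # floor avoids division by ~0


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def calibrated_probs(logits, temperature):
    # Instance-wise temperature scaling: one temperature per sample.
    return softmax([z / temperature for z in logits])


def nll(probs, label):
    # Negative log-likelihood of the true class.
    return -math.log(probs[label])


# Toy values, all assumed for illustration.
cls_token = [0.5, -1.0, 2.0]   # stand-in for a [CLS] embedding
head_w = [0.2, -0.1, 0.4]      # hypothetical head weights
head_b = 0.0
logits = [4.0, 1.0, 0.5]       # classifier output for the same sample

t = predict_temperature(cls_token, head_w, head_b)
p_raw = softmax(logits)
p_cal = calibrated_probs(logits, t)
loss = nll(p_cal, label=0)
```

In this toy case the predicted temperature is above 1, so the scaled distribution is softer than the raw one while the predicted class is unchanged. In a trainable version built with an autodiff framework, the loss would be differentiated with respect to both the head and the backbone parameters that produce the [CLS] token, which is the distinction the abstract draws from post-hoc temperature scaling.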
Taxonomy
Topics: Advanced Neural Network Applications · Adversarial Robustness in Machine Learning · Advanced Memory and Neural Computing
