Architecture-agnostic Lipschitz-constant Bayesian header and its application to resolve semantically proximal classification errors with vision transformers
Frederik Schäfer, Luis Mandl, Lars Kälber, Tim Ricken

TL;DR
This paper introduces LipB-ViT, a Lipschitz-constrained Bayesian vision transformer that improves label noise detection and robustness in high-noise scenarios by enforcing spectral normalization and combining uncertainty with feature proximity.
Contribution
It proposes an architecture-agnostic Bayesian header with spectral normalization, a new uncertainty-confidence metric, and an adaptive fusion scheme for label noise detection in vision transformers.
Findings
Outperforms state-of-the-art k-NN methods by over 7% in label noise detection recall.
Demonstrates robustness against structured and unstructured noise at inference time.
Provides a plug-and-play approach compatible with pre-trained backbones across domains.
Abstract
Label noise remains a critical bottleneck for the generalization of supervised deep learning models, particularly when errors are structured rather than random. Standard robust training methods often fail in the presence of such semantically proximal classification errors. This work presents an architecture-agnostic Lipschitz-constant Bayesian header that can be integrated into feature extractors such as vision transformers, yielding the bi-Lipschitz-constrained Bayesian Vision Transformer (LipB-ViT). In contrast to conventional Bayesian layers, our approach enforces spectral normalization on both the mean and log-variance of the variational weights, which promotes calibrated predictive uncertainty and mitigates noise amplification. We further propose a novel metric to jointly capture uncertainty and confidence across misclassification rates, as well as an adaptive arithmetic-mean…
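To make the central idea concrete, the following is a minimal NumPy sketch of a mean-field Gaussian linear header whose mean and log-variance matrices are each rescaled by their largest singular value before weights are sampled. It is an illustration under stated assumptions, not the authors' implementation: the class name `BayesianHeader`, the `bound` parameter, the random initialization, and the use of predictive entropy as the uncertainty score are all hypothetical choices. The spectral norm is computed exactly here for clarity; practical implementations usually estimate it with power iteration.

```python
import numpy as np


def spectral_normalize(w, bound=1.0):
    """Rescale a 2-D matrix so its largest singular value equals `bound`."""
    sigma = np.linalg.norm(w, ord=2)  # exact spectral norm of a 2-D array
    return w * (bound / max(sigma, 1e-12))


def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


class BayesianHeader:
    """Hypothetical Gaussian linear head applied to backbone features."""

    def __init__(self, in_dim, n_classes, seed=0):
        self.rng = np.random.default_rng(seed)
        # Variational parameters: mean and log-variance of each weight.
        # The initialization is arbitrary and only serves the illustration.
        self.mu = self.rng.standard_normal((n_classes, in_dim))
        self.log_var = self.rng.standard_normal((n_classes, in_dim))

    def sample_weights(self):
        # Assumption: both parameter matrices are normalized before sampling.
        mu = spectral_normalize(self.mu)
        log_var = spectral_normalize(self.log_var)
        eps = self.rng.standard_normal(mu.shape)
        # Reparameterization: w = mu + std * eps, with std = exp(log_var / 2).
        return mu + np.exp(0.5 * log_var) * eps

    def predict(self, features, n_samples=32):
        """Return Monte Carlo mean class probabilities and predictive entropy."""
        probs = np.stack(
            [softmax(features @ self.sample_weights().T) for _ in range(n_samples)]
        )
        mean_probs = probs.mean(axis=0)
        entropy = -(mean_probs * np.log(mean_probs + 1e-12)).sum(axis=-1)
        return mean_probs, entropy


if __name__ == "__main__":
    # Stand-in for features from a pre-trained backbone (4 samples, 8 dims).
    feats = np.random.default_rng(1).standard_normal((4, 8))
    head = BayesianHeader(in_dim=8, n_classes=3)
    mean_probs, entropy = head.predict(feats)
    print(mean_probs.shape, entropy.shape)
```

Because the header only consumes a feature matrix, the same sketch would apply to features from any backbone, which is the sense in which such a header can be architecture-agnostic.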
