Mitigating Transformer Overconfidence via Lipschitz Regularization

Wenqian Ye; Yunsheng Ma; Xu Cao; Kun Tang

arXiv:2306.06849 · cs.LG · July 19, 2023 · 1 citation

TL;DR

This paper introduces LRFormer, a Lipschitz-regularized Transformer that reduces overconfident predictions by enforcing Lipschitz continuity in self-attention, improving calibration and uncertainty estimation on vision tasks.

Contribution

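The paper proposes a Lipschitz regularization method for Transformers built on a new similarity function that uses distance in a Banach space and is regularized by a contractive Lipschitz bound. The method comes with a theoretical guarantee and outperforms state-of-the-art single-forward-pass approaches on standard vision benchmarks.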

Findings

01 Outperforms state-of-the-art single-forward-pass approaches in prediction

02 Improves calibration and uncertainty estimation

03 Provides a theoretical guarantee for the Lipschitz regularization

Abstract

Though Transformers have achieved promising results in many computer vision tasks, they tend to be over-confident in predictions, as the standard Dot Product Self-Attention (DPSA) can barely preserve distance for the unbounded input domain. In this work, we fill this gap by proposing a novel Lipschitz Regularized Transformer (LRFormer). Specifically, we present a new similarity function with the distance within Banach Space to ensure the Lipschitzness and also regularize the term by a contractive Lipschitz Bound. The proposed method is analyzed with a theoretical guarantee, providing a rigorous basis for its effectiveness and reliability. Extensive experiments conducted on standard vision benchmarks demonstrate that our method outperforms the state-of-the-art single forward pass approaches in prediction, calibration, and uncertainty estimation.
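
The contrast the abstract draws between dot-product similarity and a distance-based similarity can be illustrated with a small sketch. This is not the LRFormer formulation: it assumes a plain negative squared Euclidean distance with a hypothetical temperature `tau`, and it omits the learned projections and the contractive Lipschitz bound the paper describes. All function names are illustrative.

```python
import math


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def dot_product_weights(queries, keys):
    """Scaled dot-product attention weights (no learned projections)."""
    scale = math.sqrt(len(queries[0]))
    return [
        softmax([sum(a * b for a, b in zip(q, k)) / scale for k in keys])
        for q in queries
    ]


def distance_weights(queries, keys, tau=1.0):
    """Attention weights from negative squared Euclidean distance.

    `tau` is a hypothetical temperature for this sketch, not a parameter
    taken from the paper.
    """
    return [
        softmax([-sum((a - b) ** 2 for a, b in zip(q, k)) / tau for k in keys])
        for q in queries
    ]


# One query and two keys: the first key equals the query, the second points
# in the same direction but lies far away from it.
queries = [[1.0, 0.0]]
keys = [[1.0, 0.0], [10.0, 0.0]]

dot_w = dot_product_weights(queries, keys)
dist_w = distance_weights(queries, keys)

print("dot-product weights:", dot_w[0])
print("distance weights:   ", dist_w[0])
```

On this toy input the dot-product weights favor the distant, large-norm key, while the distance-based weights favor the key that coincides with the query. That is one informal way to read the claim that dot-product attention does not preserve distance on an unbounded input domain.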

Code & Models

Repositories

szchai/lrformer (PyTorch, official)

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications · Advanced Neural Network Applications

Methods: Multi-Head Attention · Attention Is All You Need · Dropout · Label Smoothing · Dense Connections · Linear Layer · Position-Wise Feed-Forward Layer · Layer Normalization · Absolute Position Encodings · Residual Connection