MaxSup: Overcoming Representation Collapse in Label Smoothing
Yuxuan Zhou, Heng Li, Zhi-Qi Cheng, Xudong Yan, Yifei Dong, Mario Fritz, Margret Keuper

TL;DR
This paper shows that label smoothing can induce overconfidence on misclassified samples and reduce intra-class diversity, and proposes MaxSup to mitigate both issues.
Contribution
The paper introduces MaxSup, a novel regularization method that penalizes the top-1 logit to improve feature diversity and robustness over traditional label smoothing.
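As a rough illustration of what "penalizing the top-1 logit" could mean, the sketch below compares two penalties on a toy logit vector. It assumes the common smoothing convention (target weight 1 - alpha plus alpha/K spread uniformly), under which the label-smoothing loss equals cross-entropy plus alpha * (z_gt - mean(z)). The MaxSup form used here, alpha * (z_max - mean(z)), is an assumption made for illustration, since this summary does not state the formula; the function names are likewise hypothetical.

```python
import math

def log_softmax(logits):
    # Numerically stable log-softmax over a plain list of floats.
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]

def cross_entropy(logits, target):
    return -log_softmax(logits)[target]

def label_smoothing_loss(logits, target, alpha):
    # Cross-entropy against the smoothed target (1 - alpha) * one_hot + alpha / K.
    k = len(logits)
    logp = log_softmax(logits)
    soft = [alpha / k + (1.0 - alpha) * (i == target) for i in range(k)]
    return -sum(s * lp for s, lp in zip(soft, logp))

def ls_penalty(logits, target, alpha):
    # Extra term that label smoothing adds on top of cross-entropy.
    return alpha * (logits[target] - sum(logits) / len(logits))

def maxsup_penalty(logits, alpha):
    # ASSUMED MaxSup form: same penalty, applied to the largest logit
    # instead of the ground-truth logit.
    return alpha * (max(logits) - sum(logits) / len(logits))

logits, alpha = [2.0, 1.0, 0.0], 0.1
# Correct prediction (target 0 is the top-1 class): both penalties coincide.
print(ls_penalty(logits, 0, alpha), maxsup_penalty(logits, alpha))
# Misclassification (target 2): the LS penalty turns negative, while the
# assumed MaxSup penalty still penalizes the (wrong) top-1 logit.
print(ls_penalty(logits, 2, alpha), maxsup_penalty(logits, alpha))
```

Under these assumptions the two penalties differ only when the ground-truth class is not the top-1 prediction, which is the case the paper identifies as problematic for label smoothing.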
Findings
MaxSup restores intra-class variation
MaxSup sharpens inter-class boundaries
MaxSup outperforms label smoothing in robustness experiments
Abstract
Label Smoothing (LS) is widely adopted to reduce overconfidence in neural network predictions and improve generalization. Despite these benefits, recent studies reveal two critical issues with LS. First, LS induces overconfidence in misclassified samples. Second, it compacts feature representations into overly tight clusters, diluting intra-class diversity, although the precise cause of this phenomenon remained elusive. In this paper, we analytically decompose the LS-induced loss, exposing two key terms: (i) a regularization term that dampens overconfidence only when the prediction is correct, and (ii) an error-amplification term that arises under misclassifications. This latter term compels the network to reinforce incorrect predictions with undue certainty, exacerbating representation collapse. To address these shortcomings, we propose Max Suppression (MaxSup), which applies uniform…
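One way to see where two such terms can come from is the short derivation below. It is a sketch under assumptions, not the paper's own notation: logits $z \in \mathbb{R}^K$, ground-truth index $gt$, smoothing weight $\alpha$, and the common smoothed target $s_k = (1-\alpha)\,\mathbb{1}[k = gt] + \alpha/K$.

```latex
\begin{align*}
\mathcal{L}_{\mathrm{LS}}
  &= -\sum_{k=1}^{K} s_k \log \mathrm{softmax}(z)_k
   = \underbrace{\log\sum_{j=1}^{K} e^{z_j} - z_{gt}}_{\mathcal{L}_{\mathrm{CE}}}
     + \alpha\Bigl(z_{gt} - \frac{1}{K}\sum_{k=1}^{K} z_k\Bigr), \\[4pt]
z_{gt} - \frac{1}{K}\sum_{k=1}^{K} z_k
  &= \frac{1}{K}\sum_{k:\, z_k < z_{gt}} \bigl(z_{gt} - z_k\bigr)
   \;-\; \frac{1}{K}\sum_{k:\, z_k > z_{gt}} \bigl(z_k - z_{gt}\bigr).
\end{align*}
```

The second sum is empty whenever $z_{gt}$ is the largest logit, so it contributes only under misclassification. Minimizing its negative enlarges the gap between the wrongly dominant logits and $z_{gt}$, which plausibly corresponds to the error-amplification term described in the abstract; the first sum shrinks the gap between $z_{gt}$ and the smaller logits and plausibly corresponds to the regularization term.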
Taxonomy
Topics: Video Analysis and Summarization
