Label Smoothing is a Pragmatic Information Bottleneck
Sota Kudo

TL;DR
This paper interprets label smoothing as a practical implementation of the information bottleneck. It gives a theoretical basis for this view and shows that label smoothing is insensitive to factors that carry no information about the target, offering a new perspective on its effectiveness.
Contribution
It offers a theoretical and experimental analysis of label smoothing as an information bottleneck, highlighting its properties and practical implications.
Findings
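Given sufficient model flexibility and no conflicting labels for the same input, the model output obtained with label smoothing explores the optimal solution of the information bottleneck.
Label smoothing is insensitive to factors that contain no information about the target, or that add no information about it once another variable is conditioned on.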
Experimental results support the theoretical interpretation.
Abstract
This study revisits label smoothing via a form of information bottleneck. Under the assumption of sufficient model flexibility and no conflicting labels for the same input, we theoretically and experimentally demonstrate that the model output obtained through label smoothing explores the optimal solution of the information bottleneck. Based on this, label smoothing can be interpreted as a practical approach to the information bottleneck, enabling simple implementation. We also show experimentally that, as an information bottleneck method, label smoothing is insensitive to factors that do not contain information about the target, or to factors that provide no additional information about it when conditioned on another variable.
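For readers unfamiliar with the technique, the following is a minimal sketch of label smoothing in its commonly used form, where each one-hot target is mixed with the uniform distribution over classes. It is an illustration under assumptions, not the paper's experimental setup: the function names, the NumPy implementation, and the default `epsilon=0.1` are all hypothetical choices made here.

```python
import numpy as np


def smooth_labels(labels, num_classes, epsilon=0.1):
    """Mix one-hot targets with the uniform distribution over classes.

    Assumed (standard) form: (1 - epsilon) * one_hot + epsilon / num_classes.
    """
    one_hot = np.eye(num_classes)[labels]
    return (1.0 - epsilon) * one_hot + epsilon / num_classes


def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def label_smoothing_loss(logits, labels, epsilon=0.1):
    """Mean cross-entropy between smoothed targets and model predictions."""
    targets = smooth_labels(labels, logits.shape[-1], epsilon)
    per_example = -(targets * log_softmax(logits)).sum(axis=-1)
    return float(per_example.mean())


if __name__ == "__main__":
    logits = np.array([[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]])
    labels = np.array([0, 2])
    print(smooth_labels(labels, num_classes=3))
    print(label_smoothing_loss(logits, labels))
```

With `epsilon=0` the loss reduces to ordinary cross-entropy on hard labels. Larger values pull the targets toward the uniform distribution, which penalizes overconfident outputs.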
Taxonomy
Topics: Natural Language Processing Techniques
