Label Smoothing is a Pragmatic Information Bottleneck

Sota Kudo

arXiv:2508.14077·cs.LG·August 21, 2025

TL;DR

This paper interprets label smoothing as a practical implementation of the information bottleneck. It supports this view theoretically and experimentally, and shows that label smoothing is insensitive to factors that carry no information about the target.

Contribution

The paper gives a theoretical and experimental analysis of label smoothing as an information bottleneck method, under the assumptions of sufficient model flexibility and no conflicting labels for the same input.

Findings

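01. The model output obtained through label smoothing explores the optimal solution of the information bottleneck.

02. Label smoothing is insensitive to factors that contain no information about the target, or that add none when conditioned on another variable.

03. Experimental results support the theoretical interpretation.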

Abstract

This study revisits label smoothing via a form of information bottleneck. Under the assumptions of sufficient model flexibility and no conflicting labels for the same input, we demonstrate theoretically and experimentally that the model output obtained through label smoothing explores the optimal solution of the information bottleneck. Label smoothing can therefore be interpreted as a practical, simple-to-implement approach to the information bottleneck. We also show experimentally that label smoothing, as an information bottleneck method, is insensitive to factors that contain no information about the target, or that provide no additional information about it when conditioned on another variable.
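For readers unfamiliar with the technique the abstract refers to, the following is a minimal sketch of label smoothing in its commonly used form, where one-hot targets are mixed with a uniform distribution over classes. It is not taken from the paper: the function names, the smoothing strength `epsilon = 0.1`, and the toy logits are all illustrative assumptions, and the paper's exact formulation may differ.

```python
import numpy as np

def smooth_labels(labels, num_classes, epsilon=0.1):
    """Mix one-hot targets with a uniform distribution over classes.

    epsilon is an assumed smoothing strength, not a value from the paper.
    """
    one_hot = np.eye(num_classes)[labels]
    return (1.0 - epsilon) * one_hot + epsilon / num_classes

def cross_entropy(logits, targets):
    """Mean cross-entropy between soft targets and softmax(logits)."""
    # Subtract the row-wise max before exponentiating for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-(targets * log_probs).sum(axis=1).mean())

# Toy batch: two examples, three classes (hypothetical values).
labels = np.array([0, 2])
logits = np.array([[2.0, 0.5, -1.0],
                   [0.1, 0.2, 3.0]])

targets = smooth_labels(labels, num_classes=3, epsilon=0.1)
loss = cross_entropy(logits, targets)
print(targets)
print(loss)
```

Each smoothed target row still sums to one, but no class receives probability exactly zero or one, so the loss is not minimized by pushing logits to infinity. Setting `epsilon=0.0` recovers ordinary one-hot cross-entropy.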

Taxonomy

Topics: Natural Language Processing Techniques