The Implicit Bias of Logit Regularization
Alon Beck, Yohai Bar Sinai, Noam Levi

TL;DR
This paper investigates how logit regularization methods, such as label smoothing, implicitly bias linear classifiers toward logit clustering, and how this bias can improve generalization and reduce sample complexity, notably for Gaussian data.
Contribution
The paper provides a theoretical analysis of a general class of logit regularizers for linear classification. It shows that these regularizers induce logit clustering and align the weight vector with Fisher's Linear Discriminant, which extends the existing understanding of label smoothing.
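A minimal worked example of a finite logit target for label smoothing. The setting is our own assumption: binary labels, a single logit z with sigmoid output σ(z), and smoothing parameter ε that lowers the target probability of the true class from 1 to 1 − ε/2 (ε spread uniformly over the two classes). It illustrates the idea for a sample with label +1 and is not the paper's general derivation:

```latex
\begin{aligned}
\ell_\varepsilon(z) &= -\Bigl(1-\tfrac{\varepsilon}{2}\Bigr)\log\sigma(z) - \tfrac{\varepsilon}{2}\log\bigl(1-\sigma(z)\bigr),\\
\ell_\varepsilon'(z) &= \sigma(z) - \Bigl(1-\tfrac{\varepsilon}{2}\Bigr) = 0
\quad\Longrightarrow\quad
z^\star = \log\frac{2-\varepsilon}{\varepsilon}.
\end{aligned}
```

For any ε in (0, 1] the optimum is finite (ε = 0.1 gives z* = log 19 ≈ 2.94), whereas with ε = 0 the loss only approaches its infimum as z → ∞. Every positive sample is pulled toward the same finite logit and every negative sample toward −z*, which is a simple instance of logit clustering.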
Findings
Logit regularization induces logit clustering around finite targets.
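For Gaussian data, or whenever logits are sufficiently clustered, the weight vector aligns with Fisher's Linear Discriminant.
In a signal-plus-noise model, logit regularization halves the critical sample complexity and induces grokking in the small-noise limit.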
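One classical way to see why clustered logits and Fisher's Linear Discriminant (FLD) are linked, offered as an illustration rather than as the paper's proof: if the logits of a linear model with a bias term are fitted by least squares to just two values, +t and −t, the resulting weight vector is exactly parallel to the FLD direction S_W⁻¹(μ₊ − μ₋) for any t > 0. The NumPy sketch below checks this on toy Gaussian classes that share a covariance. The data generator and the helpers `fisher_direction` and `clustered_fit_direction` are hypothetical names introduced here.

```python
import numpy as np

def fisher_direction(X, y):
    """Unit-norm FLD direction S_W^{-1} (mu_pos - mu_neg) for labels in {-1, +1}."""
    Xp, Xn = X[y == 1], X[y == -1]
    mu_p, mu_n = Xp.mean(axis=0), Xn.mean(axis=0)
    # Pooled within-class scatter matrix S_W
    S_w = (Xp - mu_p).T @ (Xp - mu_p) + (Xn - mu_n).T @ (Xn - mu_n)
    w = np.linalg.solve(S_w, mu_p - mu_n)
    return w / np.linalg.norm(w)

def clustered_fit_direction(X, y, target=2.0):
    """Least-squares fit (with intercept) of the logits to two clustered values +/- target."""
    A = np.hstack([X, np.ones((X.shape[0], 1))])  # last column acts as the bias
    coef, *_ = np.linalg.lstsq(A, target * y, rcond=None)
    w = coef[:-1]  # drop the bias coefficient
    return w / np.linalg.norm(w)

# Toy data (assumed): two Gaussian classes with means +mu / -mu and a shared covariance L @ L.T
rng = np.random.default_rng(0)
n, d = 400, 5
y = rng.choice([-1.0, 1.0], size=n)
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))   # random rotation
L = Q * np.array([3.0, 1.0, 0.5, 2.0, 1.5])    # equals Q @ diag(scales)
mu = np.array([1.0, 0.5, 0.0, 0.0, 0.0])
X = y[:, None] * mu + rng.normal(size=(n, d)) @ L.T

w_fld = fisher_direction(X, y)
w_fit = clustered_fit_direction(X, y)
print(float(w_fld @ w_fit))  # cosine between the two unit vectors
```

The identity behind this is that the total scatter equals S_W plus a rank-one between-class term along μ₊ − μ₋, so by the Sherman-Morrison formula the least-squares direction and the FLD direction differ only by a positive scalar, whatever the value of t.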
Abstract
Logit regularization, the addition of a convex penalty directly in logit space, is widely used in modern classifiers, with label smoothing as a prominent example. While such methods often improve calibration and generalization, their mechanism remains under-explored. In this work, we analyze a general class of such logit regularizers in the context of linear classification, and demonstrate that they induce an implicit bias of logit clustering around finite per-sample targets. For Gaussian data, or whenever logits are sufficiently clustered, we prove that logit clustering drives the weight vector to align exactly with Fisher's Linear Discriminant. To demonstrate the consequences, we study a simple signal-plus-noise model in which this transition has dramatic effects: Logit regularization halves the critical sample complexity and induces grokking in the small-noise limit, while making…
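To see the implicit bias of logit clustering in the simplest setting, the sketch below trains a linear classifier by full-batch gradient descent on logistic loss plus a squared-logit penalty (λ/2)z². This is one convex logit-space penalty chosen by us for illustration, not necessarily the paper's regularizer or experiment, and the helpers `train` and `target_margin` are hypothetical. Because there are more features than samples (d > n), the model can realize any logit vector, so the regularized optimum places every margin y_i z_i at the same finite value m*, the root of λm = σ(−m). With λ = 0 (plain logistic regression on separable data) the margins instead keep growing with training time.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def train(X, y, lam, lr=1.0, steps=20000):
    """Full-batch GD on mean_i [log(1 + exp(-y_i z_i)) + (lam / 2) * z_i**2], with z = X @ w."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(steps):
        z = X @ w
        # Derivative of each per-sample objective with respect to its logit z_i
        g = -y * sigmoid(-y * z) + lam * z
        w -= lr * (X.T @ g) / n
    return w

def target_margin(lam, lo=0.0, hi=50.0, iters=100):
    """Bisection for m* solving lam * m = sigmoid(-m), the optimal per-sample margin."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if lam * mid > sigmoid(-mid):
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

# Toy signal-plus-noise data (assumed): class means +/- e_1, isotropic noise, d > n
rng = np.random.default_rng(0)
n, d, lam = 20, 200, 0.1
y = rng.choice([-1.0, 1.0], size=n)
mu = np.zeros(d)
mu[0] = 1.0
X = y[:, None] * mu + rng.normal(size=(n, d)) / np.sqrt(d)

m_star = target_margin(lam)
margins_reg = y * (X @ train(X, y, lam))
margins_raw = y * (X @ train(X, y, 0.0))
print(f"m* = {m_star:.4f}")
print("regularized margins:  ", np.round(margins_reg[:5], 4))
print("unregularized margins:", np.round(margins_raw[:5], 4))
```

Increasing `lam` moves m* toward zero. The exact clustering in this toy relies on d > n, which lets the model hit every per-sample target; with fewer features the logits can only cluster approximately.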
