The Implicit Bias of Logit Regularization

Alon Beck; Yohai Bar Sinai; Noam Levi

arXiv:2602.12039 · stat.ML · February 16, 2026

TL;DR

This paper investigates how logit regularization methods such as label smoothing implicitly bias linear classifiers toward logit clustering. For Gaussian data, this clustering aligns the weight vector with Fisher's Linear Discriminant, and in a simple signal-plus-noise model it halves the critical sample complexity.

Contribution

The paper gives a theoretical analysis of a general class of logit regularizers in linear classification, showing that they induce logit clustering and align the weights with Fisher's Linear Discriminant, which extends the current understanding of label smoothing.

Findings

01

Logit regularization induces logit clustering around finite per-sample targets.

02

For Gaussian data, or whenever logits are sufficiently clustered, logit clustering drives the weight vector to align exactly with Fisher's Linear Discriminant.

03

In a simple signal-plus-noise model, logit regularization halves the critical sample complexity and induces grokking in the small-noise limit.
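
For readers who want a concrete reference point for the second finding, the textbook two-class Fisher's Linear Discriminant direction is proportional to S_W^{-1}(mu_1 - mu_0), where S_W is the within-class scatter. The following is a minimal NumPy sketch of that standard formula, not the paper's code. The names `fisher_direction` and `make_gaussian_data`, the covariance, and the class means are hypothetical choices made only for illustration.

```python
import numpy as np

def fisher_direction(X, y):
    """Unit-norm Fisher's Linear Discriminant direction for labels y in {0, 1}."""
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # Within-class scatter: sum of the two per-class scatter matrices.
    Sw = (X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)
    # Solve Sw w = (mu1 - mu0) rather than forming an explicit inverse.
    w = np.linalg.solve(Sw, mu1 - mu0)
    return w / np.linalg.norm(w)

def make_gaussian_data(n=20000, seed=0):
    """Illustrative two-class Gaussian data with a shared covariance (assumed values)."""
    rng = np.random.default_rng(seed)
    cov = np.array([[2.0, 0.8], [0.8, 1.0]])
    mu = np.array([1.0, 0.5])
    y = rng.integers(0, 2, size=n)
    noise = rng.multivariate_normal(np.zeros(2), cov, size=n)
    # Class 1 is centered at +mu, class 0 at -mu.
    X = noise + np.where(y[:, None] == 1, mu, -mu)
    return X, y, cov, mu

if __name__ == "__main__":
    X, y, cov, mu = make_gaussian_data()
    w_hat = fisher_direction(X, y)
    w_pop = np.linalg.solve(cov, 2 * mu)
    w_pop /= np.linalg.norm(w_pop)
    print("cosine(sample FLD, population FLD):", float(w_hat @ w_pop))
```

A trained linear classifier's weight vector can be compared against `fisher_direction(X, y)` by cosine similarity to check the alignment described above on one's own data.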

Abstract

Logit regularization, the addition of a convex penalty directly in logit space, is widely used in modern classifiers, with label smoothing as a prominent example. While such methods often improve calibration and generalization, their mechanism remains under-explored. In this work, we analyze a general class of such logit regularizers in the context of linear classification, and demonstrate that they induce an implicit bias of logit clustering around finite per-sample targets. For Gaussian data, or whenever logits are sufficiently clustered, we prove that logit clustering drives the weight vector to align exactly with Fisher's Linear Discriminant. To demonstrate the consequences, we study a simple signal-plus-noise model in which this transition has dramatic effects: Logit regularization halves the critical sample complexity and induces grokking in the small-noise limit, while making…
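
The simplest case of a finite per-sample target can be sketched with binary label smoothing. This is an illustrative setup, not the paper's general formulation. It assumes a scalar logit z with predicted probability sigmoid(z), and a smoothed target q = 1 - eps/2 for the true class (the usual two-class form of uniform label smoothing). The cross-entropy against q has gradient sigmoid(z) - q, so it is minimized at the finite logit z* = log(q / (1 - q)), whereas with a hard label (eps = 0) the loss keeps decreasing as z grows. The function names are hypothetical.

```python
import numpy as np

def smoothed_bce(z, eps):
    """Cross-entropy of sigmoid(z) against the smoothed target q = 1 - eps/2."""
    q = 1.0 - eps / 2.0
    # -log sigmoid(z) = logaddexp(0, -z) and -log sigmoid(-z) = logaddexp(0, z).
    return q * np.logaddexp(0.0, -z) + (1.0 - q) * np.logaddexp(0.0, z)

def target_logit(eps):
    """Closed-form minimizer: the logit at which sigmoid(z) equals q."""
    q = 1.0 - eps / 2.0
    return np.log(q / (1.0 - q))

def minimize_logit(eps, lr=0.5, steps=5000):
    """Plain gradient descent on the scalar logit, starting from z = 0."""
    q = 1.0 - eps / 2.0
    z = 0.0
    for _ in range(steps):
        grad = 1.0 / (1.0 + np.exp(-z)) - q  # d(loss)/dz = sigmoid(z) - q
        z -= lr * grad
    return z

if __name__ == "__main__":
    eps = 0.1
    print("closed-form target logit:", target_logit(eps))
    print("gradient-descent logit:  ", minimize_logit(eps))
```

Under these assumptions, every correctly labeled sample is pulled toward the same finite logit value, which gives one way to picture clustering in logit space.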

Taxonomy

Topics: Face and Expression Recognition · Distributed Sensor Networks and Detection Algorithms · Machine Learning and Data Classification