The Implicit Bias of Gradient Descent on Separable Multiclass Data
Hrithik Ravi, Clayton Scott, Daniel Soudry, Yutong Wang

TL;DR
This paper extends the theory of the implicit bias of gradient descent from binary to multiclass classification. It introduces a class of losses with a multiclass exponential tail property and shows that the implicit bias result carries over to this broader setting.
Contribution
The paper introduces a multiclass extension of the exponential tail property, built on PERM losses, and generalizes existing implicit bias results to the multiclass setting.
Findings
Extended implicit bias results to multiclass classification.
Introduced a new class of losses encompassing cross-entropy.
Bridged the binary-multiclass analysis gap using the PERM framework.
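As a rough illustration of why cross-entropy fits a relative margin-based view, the sketch below uses the standard identity that cross-entropy can be written purely in terms of the differences between the true-class score and the other scores. The function names are hypothetical and chosen for this example; the formal PERM definitions and the paper's multiclass exponential tail property are not reproduced here. The last lines only show numerically that, for large margins, the loss is close to a sum of decaying exponentials.

```python
import math

def cross_entropy(scores, y):
    """Standard cross-entropy: -log softmax(scores)[y], computed stably."""
    m = max(scores)
    log_sum = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_sum - scores[y]

def relative_margins(scores, y):
    """Differences between the true-class score and every other score."""
    return [scores[y] - s for j, s in enumerate(scores) if j != y]

def cross_entropy_from_margins(margins):
    """The same loss written only in terms of relative margins."""
    return math.log1p(sum(math.exp(-u) for u in margins))

scores, y = [2.0, -1.0, 0.5], 0
loss_a = cross_entropy(scores, y)
loss_b = cross_entropy_from_margins(relative_margins(scores, y))

# Adding a constant to every score leaves the relative margins, and so the loss, unchanged.
loss_shifted = cross_entropy([s + 10.0 for s in scores], y)

# Relabeling the classes (permuting scores and label together) leaves the loss unchanged.
loss_permuted = cross_entropy([-1.0, 0.5, 2.0], 2)

# For large margins the loss is close to the sum of exp(-margin) terms.
big_margins = [8.0, 9.0]
tail_ratio = cross_entropy_from_margins(big_margins) / sum(math.exp(-u) for u in big_margins)

print(loss_a, loss_b, loss_shifted, loss_permuted, tail_ratio)
```

The two forms agree up to floating-point error, which is the sense in which the loss depends on the scores only through relative margins.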
Abstract
Implicit bias describes the phenomenon where optimization-based training algorithms, without explicit regularization, show a preference for simple estimators even when more complex estimators have equal objective values. Multiple works have developed the theory of implicit bias for binary classification under the assumption that the loss satisfies an exponential tail property. However, there is a noticeable gap in analysis for multiclass classification, with only a handful of results which themselves are restricted to the cross-entropy loss. In this work, we employ the framework of Permutation Equivariant and Relative Margin-based (PERM) losses [Wang and Scott, 2024] to introduce a multiclass extension of the exponential tail property. This class of losses includes not only cross-entropy but also other losses. Using this framework, we extend the implicit bias result of Soudry et al.…
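The phenomenon described in the abstract can be observed on a toy problem. The following is a minimal sketch, not taken from the paper: it runs plain gradient descent on the mean cross-entropy loss of a bias-free linear classifier over six hand-made, linearly separable points in three classes. The data, step size, and step counts are assumptions made for illustration. With no regularization, the weight norm keeps growing while the normalized weight matrix changes very little between the two snapshots. The sketch does not check which direction is approached.

```python
import math

# Toy, hand-made separable data (an assumption for illustration, not from the paper).
X = [(2.0, 0.2), (1.5, -0.3), (-1.0, 1.5), (-0.6, 2.0), (-1.0, -1.6), (-0.4, -1.2)]
Y = [0, 0, 1, 1, 2, 2]
K, D = 3, 2  # number of classes, input dimension

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def scores_for(W, x):
    """Class scores of a bias-free linear model: one inner product per class."""
    return [sum(w * v for w, v in zip(W[k], x)) for k in range(K)]

def gradient(W):
    """Gradient of the mean cross-entropy loss with respect to W."""
    G = [[0.0] * D for _ in range(K)]
    for x, y in zip(X, Y):
        p = softmax(scores_for(W, x))
        for k in range(K):
            coeff = (p[k] - (1.0 if k == y else 0.0)) / len(X)
            for d in range(D):
                G[k][d] += coeff * x[d]
    return G

def frobenius_norm(W):
    return math.sqrt(sum(w * w for row in W for w in row))

def cosine(A, B):
    dot = sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))
    return dot / (frobenius_norm(A) * frobenius_norm(B))

def run_gd(steps, lr=0.2):
    """Unregularized gradient descent from zero; keeps the mid and final iterates."""
    W = [[0.0] * D for _ in range(K)]
    snapshots = {}
    for t in range(1, steps + 1):
        G = gradient(W)
        W = [[W[k][d] - lr * G[k][d] for d in range(D)] for k in range(K)]
        if t in (steps // 2, steps):
            snapshots[t] = [row[:] for row in W]
    return snapshots

snaps = run_gd(5000)
W_mid, W_end = snaps[2500], snaps[5000]
norm_mid, norm_end = frobenius_norm(W_mid), frobenius_norm(W_end)
direction_cos = cosine(W_mid, W_end)
train_correct = all(
    max(range(K), key=lambda k: scores_for(W_end, x)[k]) == y
    for x, y in zip(X, Y)
)
print(norm_mid, norm_end, direction_cos, train_correct)
```

Comparing snapshots at more widely spaced step counts would show the same pattern: the norm continues to increase, while the cosine similarity between normalized iterates stays close to one.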
Taxonomy
Topics: Face and Expression Recognition · Statistical Methods and Inference · Sparse and Compressive Sensing Techniques
