The Implicit Bias of Gradient Descent on Separable Multiclass Data

Hrithik Ravi; Clayton Scott; Daniel Soudry; Yutong Wang

arXiv:2411.01350 · cs.LG · November 8, 2024

TL;DR

This paper extends the theory of implicit bias in gradient descent from binary to multiclass classification by introducing a new class of losses and showing that the implicit bias result carries over to this broader setting.

Contribution

The paper introduces a multiclass extension of the exponential tail property using PERM losses and generalizes existing implicit bias results to the multiclass setting.

Findings

01. Extended implicit bias results to multiclass classification.

02. Introduced a new class of losses that includes cross-entropy.

03. Bridged the gap between binary and multiclass analysis using the PERM framework.

Abstract

Implicit bias describes the phenomenon where optimization-based training algorithms, without explicit regularization, show a preference for simple estimators even when more complex estimators have equal objective values. Multiple works have developed the theory of implicit bias for binary classification under the assumption that the loss satisfies an exponential tail property. However, there is a noticeable gap in analysis for multiclass classification, with only a handful of results which themselves are restricted to the cross-entropy loss. In this work, we employ the framework of Permutation Equivariant and Relative Margin-based (PERM) losses [Wang and Scott, 2024] to introduce a multiclass extension of the exponential tail property. This class of losses includes not only cross-entropy but also other losses. Using this framework, we extend the implicit bias result of Soudry et al.…
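The phenomenon described in the abstract can be observed numerically. The following is a minimal NumPy sketch, not the authors' code: it runs plain gradient descent with cross-entropy loss on a linear multiclass classifier over synthetic, linearly separable data, and logs the weight norm together with the minimum margin of the normalized weights. The data, step size, iteration count, and the helper names `run_gd` and `min_margin` are all assumptions made for illustration.

```python
import numpy as np

def softmax(z):
    # Numerically stable row-wise softmax.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def min_margin(W, X, y):
    # Smallest gap between the true-class score and the best other score.
    scores = X @ W.T                      # shape (n, K)
    idx = np.arange(len(y))
    true = scores[idx, y]
    others = scores.copy()
    others[idx, y] = -np.inf
    return (true - others.max(axis=1)).min()

def run_gd(X, y, num_classes, lr=0.1, steps=5000, log_every=1000):
    # Unregularized gradient descent on the mean cross-entropy loss,
    # linear model without bias (an assumption of this sketch).
    n, d = X.shape
    W = np.zeros((num_classes, d))
    Y = np.eye(num_classes)[y]            # one-hot labels
    history = []
    for t in range(1, steps + 1):
        P = softmax(X @ W.T)
        grad = (P - Y).T @ X / n
        W -= lr * grad
        if t % log_every == 0:
            norm = np.linalg.norm(W)
            history.append((t, norm, min_margin(W / norm, X, y)))
    return W, history

# Hypothetical toy data: three well-separated clusters in the plane.
rng = np.random.default_rng(0)
centers = np.array([[3.0, 0.0], [-1.5, 2.6], [-1.5, -2.6]])
y = np.repeat(np.arange(3), 20)
X = centers[y] + 0.3 * rng.standard_normal((60, 2))

W, history = run_gd(X, y, num_classes=3)
for t, norm, margin in history:
    print(f"step {t:5d}  ||W|| = {norm:.3f}  normalized min margin = {margin:.4f}")
```

On separable data like this, one would expect the weight norm to keep growing with the number of steps while the normalized margin stays positive, which is the setting in which the direction of the iterates, rather than their scale, becomes the object of interest.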

Videos

The Implicit Bias of Gradient Descent on Separable Multiclass Data · SlidesLive

Taxonomy

Topics: Face and Expression Recognition · Statistical Methods and Inference · Sparse and Compressive Sensing Techniques