Multiclass Loss Geometry Matters for Generalization of Gradient Descent in Separable Classification

Matan Schliserman; Tomer Koren

arXiv:2505.22359·cs.LG·May 29, 2025

TL;DR

This paper analyzes how the geometry of the loss template, rather than of the loss function itself, affects the generalization of unregularized gradient descent in separable multiclass linear classification, and shows that this geometry shapes both convergence rates and population risk bounds.

Contribution

It introduces novel population risk bounds for multiclass gradient descent that depend on loss template geometry, extending understanding beyond binary classification.

Findings

01

Risk bounds depend on loss template geometry, not just decay rate.

02

For exponentially decaying losses, the risk scales logarithmically with the number of classes when p=∞ and linearly when p=2.

03

Lower bounds show that a polynomial dependence on the number of classes is unavoidable.

Abstract

We study the generalization performance of unregularized gradient methods for separable linear classification. While previous work mostly deals with the binary case, we focus on the multiclass setting with $k$ classes and establish novel population risk bounds for Gradient Descent for loss functions that decay to zero. In this setting, we show risk bounds that reveal that convergence rates are crucially influenced by the geometry of the loss template, as formalized by Wang and Scott (2024), rather than of the loss function itself. In particular, we establish risk upper bounds that hold for any decay rate of the loss whose template is smooth with respect to the $p$-norm. In the case of exponentially decaying losses, our results indicate a contrast between the $p = \infty$ case, where the risk exhibits a logarithmic dependence on $k$, and the $p = 2$ case, where the risk scales linearly with $k$. To…
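To make the setting in the abstract concrete, here is a minimal sketch of unregularized gradient descent on a separable multiclass linear problem. It is an illustration only and does not reproduce the paper's bounds or experiments. Everything specific in it is an assumption: the toy data generator `make_separable_data`, the step size, the number of steps, and the choice of cross-entropy as the exponentially decaying loss. The function `template` writes cross-entropy as a function of the relative margins `z_j = score_y - score_j`, in the spirit of the loss-template view the abstract attributes to Wang and Scott (2024); the exact form used here is assumed for illustration.

```python
import math
import random


def scores(W, x):
    """Linear class scores: one inner product per class."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def template(z):
    """Cross-entropy as a function of relative margins z_j = s_y - s_j (j != y).

    Computes log(1 + sum_j exp(-z_j)) in a numerically stable way.
    """
    m = max(0.0, max(-zj for zj in z))
    return m + math.log(math.exp(-m) + sum(math.exp(-zj - m) for zj in z))


def loss_and_grad(W, data):
    """Average cross-entropy loss and its gradient with respect to W."""
    k, d = len(W), len(W[0])
    grad = [[0.0] * d for _ in range(k)]
    total = 0.0
    for x, y in data:
        s = scores(W, x)
        total += template([s[y] - s[j] for j in range(k) if j != y])
        top = max(s)
        e = [math.exp(sj - top) for sj in s]
        norm = sum(e)
        for j in range(k):
            # Gradient of cross-entropy w.r.t. score j: softmax_j - 1[j == y].
            coef = e[j] / norm - (1.0 if j == y else 0.0)
            for i in range(d):
                grad[j][i] += coef * x[i]
    n = len(data)
    return total / n, [[g / n for g in row] for row in grad]


def make_separable_data(n, k, seed=0):
    """Hypothetical toy data: class y sits on the unit circle near angle 2*pi*y/k."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        y = rng.randrange(k)
        angle = 2 * math.pi * y / k + rng.uniform(-0.3, 0.3)
        data.append(([math.cos(angle), math.sin(angle)], y))
    return data


def gradient_descent(data, k, steps=500, lr=0.5):
    """Plain gradient descent from W = 0 with no regularization or projection."""
    d = len(data[0][0])
    W = [[0.0] * d for _ in range(k)]
    history = []
    for _ in range(steps):
        loss, grad = loss_and_grad(W, data)
        history.append(loss)
        W = [[w - lr * g for w, g in zip(wr, gr)] for wr, gr in zip(W, grad)]
    return W, history


def error_rate(W, data):
    """Fraction of examples whose highest-scoring class differs from the label."""
    wrong = 0
    for x, y in data:
        s = scores(W, x)
        if max(range(len(s)), key=lambda j: s[j]) != y:
            wrong += 1
    return wrong / len(data)


if __name__ == "__main__":
    train = make_separable_data(60, 3, seed=0)
    held_out = make_separable_data(200, 3, seed=1)
    W, history = gradient_descent(train, 3)
    print("first and last training loss:", history[0], history[-1])
    print("held-out error rate:", error_rate(W, held_out))
```

Because the data are separable and nothing constrains the weights, the training loss keeps shrinking while the weights grow; the held-out error rate is a small-sample stand-in for the population risk that the paper bounds.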

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Statistical Methods and Inference · Machine Learning and Algorithms