Multiclass Loss Geometry Matters for Generalization of Gradient Descent in Separable Classification
Matan Schliserman, Tomer Koren

TL;DR
This paper analyzes how the geometry of loss functions affects the generalization of gradient descent in multiclass linear classification, revealing that loss template geometry influences convergence rates and risk bounds.
Contribution
It introduces novel population risk bounds for multiclass gradient descent that depend on loss template geometry, extending understanding beyond binary classification.
Findings
Risk bounds depend on loss template geometry, not just decay rate.
For exponentially decaying losses, risk scales logarithmically with the number of classes when p=∞.
Lower bounds show that polynomial dependence on the number of classes is unavoidable.
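To make the notion of a loss template concrete, here is a minimal sketch that is not taken from the paper. It assumes the relative-margin view of multiclass losses associated with Wang and Scott (2024): the cross-entropy loss is written as a template psi(u) = log(1 + sum_j exp(-u_j)) applied to the margins u_j = z_y - z_j, and unregularized gradient descent is run on a toy separable dataset. The data generator, step size, step count, and all function names are illustrative assumptions.

```python
import numpy as np


def make_separable_data(n=200, k=4, noise=0.5, seed=0):
    """Toy data (an assumption for illustration): class c sits near 3 * e_c.

    Noise is bounded, so the classes are linearly separable with a margin.
    """
    rng = np.random.default_rng(seed)
    y = np.arange(n) % k  # balanced labels 0..k-1
    X = 3.0 * np.eye(k)[y] + rng.uniform(-noise, noise, size=(n, k))
    return X, y


def template_loss(W, X, y):
    """Cross-entropy written as a template over relative margins.

    margins[i, j] = z_{y_i} - z_j, so the column j = y_i is exactly 0 and
    log(sum_j exp(-margins[i, j])) = log(1 + sum_{j != y_i} exp(-u_j)).
    """
    Z = X @ W.T
    margins = Z[np.arange(len(y)), y][:, None] - Z
    per_example = np.logaddexp.reduce(-margins, axis=1)  # stable log-sum-exp
    return per_example, margins


def gradient(W, X, y):
    """Gradient of the mean loss with respect to W, i.e. (softmax - onehot)^T X / n."""
    per_example, margins = template_loss(W, X, y)
    P = np.exp(-margins - per_example[:, None])  # equals softmax(Z) row-wise
    P[np.arange(len(y)), y] -= 1.0
    return P.T @ X / len(y)


def run_gd(X, y, k, lr=0.05, steps=500):
    """Plain unregularized gradient descent from W = 0; returns W and the loss trace."""
    W = np.zeros((k, X.shape[1]))
    losses = []
    for _ in range(steps):
        losses.append(template_loss(W, X, y)[0].mean())
        W -= lr * gradient(W, X, y)
    losses.append(template_loss(W, X, y)[0].mean())
    return W, losses


if __name__ == "__main__":
    X, y = make_separable_data()
    W, losses = run_gd(X, y, k=4)
    accuracy = (np.argmax(X @ W.T, axis=1) == y).mean()
    print(f"initial loss {losses[0]:.4f}, final loss {losses[-1]:.4f}, train accuracy {accuracy:.2f}")
```

The sketch only illustrates the setting (a decaying loss, separable data, no regularization); it does not reproduce the paper's risk bounds or their dependence on the number of classes.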
Abstract
We study the generalization performance of unregularized gradient methods for separable linear classification. While previous work mostly deals with the binary case, we focus on the multiclass setting with k classes and establish novel population risk bounds for Gradient Descent for loss functions that decay to zero. In this setting, we show risk bounds that reveal that convergence rates are crucially influenced by the geometry of the loss template, as formalized by Wang and Scott (2024), rather than of the loss function itself. Particularly, we establish risk upper bounds that hold for any decay rate of the loss whose template is smooth with respect to the p-norm. In the case of exponentially decaying losses, our results indicate a contrast between the p=∞ case, where the risk exhibits a logarithmic dependence on k, and p=2, where the risk scales linearly with k. To…
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Statistical Methods and Inference · Machine Learning and Algorithms
