Is Softmax Loss All You Need? A Principled Analysis of Softmax-family Loss
Yuanhao Pu, Defu Lian, Enhong Chen

TL;DR
This paper provides a theoretical and empirical analysis of Softmax-family losses, exploring their properties, convergence, and efficiency trade-offs, to guide loss selection in large-class classification tasks.
Contribution
It offers a unified theoretical framework for Softmax-family losses, analyzing their consistency, convergence, and efficiency, with practical insights for large-scale applications.
Findings
Softmax loss aligns with classification and ranking metrics.
Gradient dynamics reveal distinct convergence behaviors across the surrogates.
A bias-variance decomposition of approximate methods provides convergence guarantees.
Abstract
The Softmax loss is one of the most widely employed surrogate objectives for classification and ranking tasks. To elucidate its theoretical properties, the Fenchel-Young framework situates it as a canonical instance within a broad family of surrogates. Concurrently, another line of research has addressed scalability when the number of classes is exceedingly large, in which numerous approximations have been proposed to retain the benefits of the exact objective while improving efficiency. Building on these two perspectives, we present a principled investigation of the Softmax-family losses. We examine whether different surrogates achieve consistency with classification and ranking metrics, and analyze their gradient dynamics to reveal distinct convergence behaviors. We also introduce a systematic bias-variance decomposition for approximate methods that provides convergence guarantees,…
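To make the two perspectives in the abstract concrete, here is a minimal NumPy sketch of the exact Softmax loss next to one possible sampled approximation. It is an illustration only, not the paper's method: the function names, the uniform negative sampler, and the importance-weight correction are all assumptions chosen for simplicity.

```python
import numpy as np

def softmax_loss(logits, target):
    """Exact Softmax (cross-entropy) loss for one example:
    logsumexp(logits) - logits[target], computed stably."""
    m = logits.max()
    return m + np.log(np.exp(logits - m).sum()) - logits[target]

def sampled_softmax_loss(logits, target, num_samples, rng):
    """Hypothetical sampled approximation using uniformly drawn negatives.
    Assumption: this is one common style of approximation, not necessarily
    any of the specific methods analyzed in the paper."""
    n = logits.shape[0]
    # Draw negatives uniformly, without replacement, from non-target classes.
    candidates = np.delete(np.arange(n), target)
    negatives = rng.choice(candidates, size=num_samples, replace=False)
    # Each sampled negative stands in for (n - 1) / num_samples classes,
    # so its logit is shifted by the log of that importance weight.
    log_weight = np.log((n - 1) / num_samples)
    sub = np.concatenate(([logits[target]], logits[negatives] + log_weight))
    m = sub.max()
    return m + np.log(np.exp(sub - m).sum()) - logits[target]

rng = np.random.default_rng(0)
logits = rng.normal(size=50)
exact = softmax_loss(logits, target=3)
approx = sampled_softmax_loss(logits, target=3, num_samples=10, rng=rng)
full = sampled_softmax_loss(logits, target=3, num_samples=49, rng=rng)
```

In this sketch the exact loss touches all 50 logits, while the sampled version touches only 11. The sampled estimate of the normalizer is unbiased, but taking its logarithm makes the loss itself a biased, noisy estimate, which is the kind of trade-off a bias-variance analysis of approximate methods addresses. When every negative is sampled (`num_samples=49`), the approximation coincides with the exact loss.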
Taxonomy
Topics: Machine Learning and Data Classification · Imbalanced Data Classification Techniques · Explainable Artificial Intelligence (XAI)
