An Exploration of Softmax Alternatives Belonging to the Spherical Loss Family
Alexandre de Brébisson, Pascal Vincent

TL;DR
This paper investigates loss functions from the spherical family as substitutes for the log-softmax loss in multi-class classification. They outperform log-softmax on MNIST and CIFAR-10 but fall short of it on language modeling tasks.
Contribution
It evaluates several loss functions from the spherical family of Vincent et al. (2015) and introduces a new one, the log-Taylor Softmax, expanding the options beyond the traditional log-softmax for neural network training.
Findings
Spherical loss functions outperform log-softmax on MNIST and CIFAR-10.
Log-softmax remains superior on language modeling tasks.
The new log-Taylor Softmax shows potential as an alternative to log-softmax.
Abstract
In a multi-class classification problem, it is standard to model the output of a neural network as a categorical distribution conditioned on the inputs. The output must therefore be positive and sum to one, which is traditionally enforced by a softmax. This probabilistic mapping allows the use of the maximum likelihood principle, which leads to the well-known log-softmax loss. However, the choice of the softmax function seems somewhat arbitrary, as there are many other possible normalizing functions. It is thus unclear why the log-softmax loss would perform better than other loss alternatives. In particular, Vincent et al. (2015) recently introduced a class of loss functions, called the spherical family, for which there exists an efficient algorithm to compute the updates of the output weights irrespective of the output size. In this paper, we explore several loss functions from this family as…
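To make the idea of "other possible normalizing functions" concrete, here is a minimal sketch comparing the softmax with two alternatives. The text above does not give formulas for the spherical variants, so the definitions used here are assumptions based on how these normalizers are usually stated: a spherical softmax that normalizes squared outputs (with a small stabilizing constant `eps`, whose value is arbitrary), and a Taylor softmax that replaces the exponential with its second-order expansion. The function names are illustrative only and are not taken from the paper.

```python
import math

def softmax(o):
    # Standard softmax; subtracting the max avoids overflow in exp.
    m = max(o)
    exps = [math.exp(x - m) for x in o]
    total = sum(exps)
    return [e / total for e in exps]

def spherical_softmax(o, eps=1e-6):
    # Assumed definition: normalize squared outputs.
    # eps keeps every term strictly positive; its value is an assumption.
    vals = [x * x + eps for x in o]
    total = sum(vals)
    return [v / total for v in vals]

def taylor_softmax(o):
    # Assumed definition: second-order Taylor expansion of exp around 0.
    # 1 + x + x^2/2 is strictly positive for every real x.
    vals = [1.0 + x + 0.5 * x * x for x in o]
    total = sum(vals)
    return [v / total for v in vals]

def negative_log_likelihood(probs, target):
    # Maximum likelihood loss for a single example with class index `target`.
    return -math.log(probs[target])

outputs = [2.0, -1.0, 0.5]  # hypothetical pre-normalization network outputs
for normalizer in (softmax, spherical_softmax, taylor_softmax):
    probs = normalizer(outputs)
    loss = negative_log_likelihood(probs, target=0)
    print(normalizer.__name__, [round(p, 4) for p in probs], round(loss, 4))
```

All three functions return positive values that sum to one, so each can be paired with the negative log-likelihood in the same way. Note that the spherical variant, as sketched, ignores the sign of the outputs, which is one way its behavior differs from the softmax.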
Taxonomy
Topics: Stock Market Forecasting Methods · Generative Adversarial Networks and Image Synthesis · Stochastic Gradient Optimization Techniques
Methods: Softmax
