Loss Functions and Operators Generated by f-Divergences
Vincent Roulet, Tianlin Liu, Nino Vieillard, Michael E. Sander, Mathieu Blondel

TL;DR
This paper introduces a flexible framework for creating new convex loss functions based on f-divergences, generalizing the logistic loss and associated operators, with empirical evaluation in language modeling tasks.
Contribution
It develops a novel class of loss functions and operators from f-divergences, including a new parallelizable algorithm for computing the associated f-softargmax, and evaluates their effectiveness for language modeling.
Findings
The loss generated by the alpha-divergence with alpha = 1.5 performs well across multiple tasks.
The proposed f-softargmax operator can be computed efficiently with a new parallelizable bisection algorithm.
The new loss functions generalize the logistic loss by replacing the KL divergence with f-divergences and by allowing non-uniform reference measures.
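The findings above mention computing the f-softargmax by bisection. The following is a minimal sketch, not the authors' implementation. It assumes the f-softargmax is the maximizer of <p, theta> - D_f(p, q) over the probability simplex, with D_f(p, q) = sum_i q_i f(p_i / q_i) and q a reference measure summing to 1. Under that assumption the solution has the form p_i = q_i (f')^{-1}(theta_i - tau), clipped at zero where needed, so only the scalar threshold tau must be found, which bisection does. The generator normalizations and the names `f_softargmax`, `kl_inv_fprime` and `alpha_inv_fprime` are hypothetical choices for illustration, and only alpha > 1 is covered.

```python
import math
from typing import Callable, List, Sequence

InvFPrime = Callable[[float], float]


def kl_inv_fprime(u: float) -> float:
    # Assumed KL generator f(t) = t*log(t) - t + 1, so f'(t) = log(t)
    # and (f')^{-1}(u) = exp(u); the result is always positive.
    return math.exp(u)


def alpha_inv_fprime(alpha: float) -> InvFPrime:
    # Assumed alpha-divergence generator (alpha > 1):
    #   f(t)  = (t**alpha - alpha*t + alpha - 1) / (alpha*(alpha - 1))
    #   f'(t) = (t**(alpha - 1) - 1) / (alpha - 1)
    # Inverting f' and clipping at 0 (where p_i hits the simplex boundary):
    #   (f')^{-1}(u) = max((alpha - 1)*u + 1, 0) ** (1 / (alpha - 1))
    if alpha <= 1.0:
        raise ValueError("this sketch only handles alpha > 1")

    def inv(u: float) -> float:
        return max((alpha - 1.0) * u + 1.0, 0.0) ** (1.0 / (alpha - 1.0))

    return inv


def f_softargmax(theta: Sequence[float], q: Sequence[float],
                 inv_fprime: InvFPrime, n_iter: int = 60) -> List[float]:
    """Hypothetical bisection for p_i = q_i * (f')^{-1}(theta_i - tau), sum(p) = 1."""

    def mass(tau: float) -> float:
        # Total mass; non-increasing in tau because (f')^{-1} is non-decreasing.
        return sum(qi * inv_fprime(ti - tau) for ti, qi in zip(theta, q))

    # Both generators satisfy f'(1) = 0, and q is assumed to sum to 1.
    # At tau = min(theta) every ratio p_i / q_i is >= 1 (mass >= 1); at
    # tau = max(theta) every ratio is <= 1 (mass <= 1). So the root lies
    # in [min(theta), max(theta)].
    lo, hi = min(theta), max(theta)
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if mass(mid) > 1.0:
            lo = mid  # too much mass: the root is above mid
        else:
            hi = mid  # mass <= 1: the root is at or below mid
    tau = 0.5 * (lo + hi)
    p = [qi * inv_fprime(ti - tau) for ti, qi in zip(theta, q)]
    total = sum(p)
    # Renormalize to absorb the tiny residual error left by bisection.
    return [pi / total for pi in p]


if __name__ == "__main__":
    uniform = [1.0 / 3.0] * 3
    # KL with a uniform reference measure reduces to the usual softargmax.
    print(f_softargmax([1.0, 2.0, 3.0], uniform, kl_inv_fprime))
    # alpha = 1.5 can assign exactly zero probability (here to the last entry).
    print(f_softargmax([2.0, 1.0, -3.0], uniform, alpha_inv_fprime(1.5)))
```

Each bisection step uses only elementwise operations and one sum, so in an array library the same loop could run over a whole batch of logit vectors at once. This is one plausible reading of "parallelizable", not a description of the paper's algorithm.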
Abstract
The logistic loss (a.k.a. cross-entropy loss) is one of the most popular loss functions used for multiclass classification. It is also the loss function of choice for next-token prediction in language modeling. It is associated with the Kullback-Leibler (KL) divergence and the softargmax operator. In this work, we propose to construct new convex loss functions based on f-divergences. Our loss functions generalize the logistic loss in two directions: i) by replacing the KL divergence with f-divergences and ii) by allowing non-uniform reference measures. We instantiate our framework for numerous f-divergences, recovering existing losses and creating new ones. By analogy with the logistic loss, the loss function generated by an f-divergence is associated with an operator, which we dub f-softargmax. We derive a novel parallelizable bisection algorithm for computing the…
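As a worked check of the claim that these losses generalize the logistic loss, here is a derivation under one assumption the abstract does not state: that the loss generated by f takes a Fenchel-Young form built from the regularizer D_f(., q) on the probability simplex. The KL generator f(t) = t log t - t + 1 (with f(0) = 1 by continuity), the reference measure q with positive entries, and the notation ell_f are illustrative choices, not the paper's definitions.

```latex
D_f(p, q) = \sum_{i=1}^{k} q_i \, f\!\left(\frac{p_i}{q_i}\right),
\qquad
\ell_f(\theta, y) = \max_{p \in \Delta^k} \Big( \langle p, \theta \rangle - D_f(p, q) \Big) + D_f(e_y, q) - \theta_y .

% KL generator f(t) = t \log t - t + 1, reference measure q with q_i > 0:
\max_{p \in \Delta^k} \Big( \langle p, \theta \rangle - D_f(p, q) \Big) = \log \sum_{i=1}^{k} q_i e^{\theta_i},
\quad \text{attained at } p_i = \frac{q_i e^{\theta_i}}{\sum_{j} q_j e^{\theta_j}},
\qquad
D_f(e_y, q) = -\log q_y .

\ell_f(\theta, y) = \log \sum_{i=1}^{k} e^{\theta_i + \log q_i} - (\theta_y + \log q_y).

% Uniform reference measure q_i = 1/k: the \log q terms cancel, giving the logistic loss
\ell_f(\theta, y) = \log \sum_{i=1}^{k} e^{\theta_i} - \theta_y .
```

Under this assumption, a non-uniform reference measure in the KL case amounts to shifting the logits by log q; for other generators the maximizer is generally not available in closed form.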
Taxonomy
Topics: Approximation Theory and Sequence Spaces
