Toward a Characterization of Loss Functions for Distribution Learning
Nika Haghtalab, Cameron Musco, Bo Waggoner

TL;DR
This paper investigates the properties of loss functions for distribution learning. It proposes criteria that a good loss should satisfy and shows that, once the set of candidate distributions is suitably restricted, multiple losses, including log loss, meet them, which encourages domain-specific choices.
Contribution
It introduces axiomatic criteria for evaluating loss functions in distribution learning and demonstrates that, under distribution restrictions, various losses meet these criteria.
Findings
No loss function satisfies all criteria without restrictions.
Restricting to calibrated distributions allows multiple losses to be suitable.
The results encourage domain-specific loss function selection in distribution learning.
Abstract
In this work we study loss functions for learning and evaluating probability distributions over large discrete domains. Unlike classification or regression, where a wide variety of loss functions are used, in the distribution learning and density estimation literature very few losses outside the dominant log loss are applied. We aim to understand this fact, taking an axiomatic approach to the design of loss functions for learning distributions. We start by proposing a set of desirable criteria that any good loss function should satisfy. Intuitively, these criteria require that the loss function faithfully evaluates a candidate distribution, both in expectation and when estimated on a few samples. Interestingly, we observe that no loss function possesses all of these criteria. However, one can circumvent this issue by introducing a natural restriction on the set of candidate…
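As a rough illustration of what evaluating a candidate "in expectation and when estimated on a few samples" can look like for log loss, the sketch below compares two candidates against a target distribution. The toy distributions and every function name (log_loss, expected_loss, empirical_loss) are assumptions made for this example; it is not the paper's formal setup or its criteria.

```python
import math
import random

def log_loss(q, x):
    """Log loss of candidate distribution q (dict: outcome -> probability) on outcome x."""
    return -math.log(q[x])

def expected_loss(p, q):
    """Expected log loss of candidate q when outcomes are drawn from target p."""
    return sum(p[x] * log_loss(q, x) for x in p if p[x] > 0)

def empirical_loss(q, samples):
    """Average log loss of candidate q over a finite list of observed outcomes."""
    return sum(log_loss(q, x) for x in samples) / len(samples)

# Hypothetical target distribution and two candidates (illustrative values only).
p = {"a": 0.5, "b": 0.3, "c": 0.2}
q_good = dict(p)                          # candidate equal to the target
q_bad = {"a": 0.2, "b": 0.3, "c": 0.5}    # candidate that swaps two probabilities

# Draw samples from the target with a fixed seed so the run is repeatable.
rng = random.Random(0)
samples = rng.choices(list(p), weights=list(p.values()), k=2000)

print("expected :", expected_loss(p, q_good), expected_loss(p, q_bad))
print("empirical:", empirical_loss(q_good, samples), empirical_loss(q_bad, samples))
```

In expectation, log loss ranks the candidate equal to the target ahead of the other one, and the sample average approaches the expectation as the number of samples grows. Other losses could be plugged in by swapping out log_loss.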
Taxonomy
Topics: Machine Learning and Algorithms · Face and Expression Recognition · Domain Adaptation and Few-Shot Learning
