
TL;DR
This note studies the log loss function in probabilistic prediction, showing that it is the most selective proper loss and relating it to the other two standard loss functions, the Brier loss and the spherical loss.
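To make the three standard losses concrete, here is a minimal Python sketch for binary outcomes. It assumes a forecast p is the predicted probability that the outcome is 1. The note treats probabilistic prediction more generally, so this binary setting, the spherical-loss convention (1 minus the standard spherical score) and the function names are illustrative assumptions. The sketch also checks propriety numerically: for a true probability pi, the expected loss should be minimized by forecasting pi itself.

```python
import math

# Illustrative binary setting (assumption): outcomes y in {0, 1},
# forecast p in (0, 1) is the predicted probability that y = 1.

def log_loss(y: int, p: float) -> float:
    """Log loss: minus the natural log of the probability given to the outcome."""
    return -math.log(p if y == 1 else 1.0 - p)

def brier_loss(y: int, p: float) -> float:
    """Brier (square) loss for a binary outcome."""
    return (y - p) ** 2

def spherical_loss(y: int, p: float) -> float:
    """Spherical loss (one common convention): 1 minus the probability of the
    outcome divided by the Euclidean norm of the forecast vector (p, 1 - p)."""
    prob = p if y == 1 else 1.0 - p
    return 1.0 - prob / math.hypot(p, 1.0 - p)

def expected_loss(loss, pi: float, q: float) -> float:
    """Expected loss of forecast q when y = 1 has true probability pi."""
    return pi * loss(1, q) + (1.0 - pi) * loss(0, q)

def best_forecast(loss, pi: float, steps: int = 999) -> float:
    """Grid search over q = 0.001, ..., 0.999 for the expected-loss minimizer."""
    grid = [(i + 1) / (steps + 1) for i in range(steps)]
    return min(grid, key=lambda q: expected_loss(loss, pi, q))

for loss in (log_loss, brier_loss, spherical_loss):
    print(loss.__name__, best_forecast(loss, 0.3))
```

For all three losses the minimizer should match pi up to the grid resolution, which is what propriety (here strict propriety) means. One visible difference: the log loss grows without bound as the probability given to the realized outcome approaches 0, whereas the Brier and spherical losses, as defined here, stay within [0, 1].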
Contribution
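It shows that the log loss function is the most selective proper loss: a prediction algorithm that is optimal for a data sequence under the log loss is also optimal under every computable proper mixable loss, whereas optimality under the Brier or spherical loss does not imply optimality under the log loss.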
Findings
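The log loss is the most selective of the computable proper mixable loss functions.
For a given data sequence, optimality under the log loss (in the sense of the algorithmic theory of randomness) implies optimality under any computable proper mixable loss.
There exist a data sequence and a prediction algorithm that is optimal for it under the Brier or spherical loss but not under the log loss.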
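The findings refer to mixable losses. As background only (not the note's proof), the sketch below checks the standard fact that the log loss is mixable with learning rate eta = 1: averaging the experts' forecasts with normalized weights w_i makes the mixture's loss equal to -ln sum_i w_i exp(-loss_i). Applied over a sequence with weights proportional to exp(-cumulative loss), this is the Bayesian mixture, whose cumulative log loss is at most the best expert's plus ln N. Binary outcomes, natural logarithms and all function names are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple

def log_loss(y: int, p: float) -> float:
    """Log loss of forecast p = P(y = 1) on binary outcome y."""
    return -math.log(p if y == 1 else 1.0 - p)

def mixture_forecast(weights: Sequence[float], forecasts: Sequence[float]) -> float:
    """Weighted average of expert forecasts (weights need not be normalized)."""
    total = sum(weights)
    return sum(w * p for w, p in zip(weights, forecasts)) / total

def mixability_gap(weights: Sequence[float], forecasts: Sequence[float],
                   eta: float = 1.0) -> List[float]:
    """For y = 0 and y = 1: the mixture's loss minus the 'mixed loss'
    -(1/eta) * ln(sum_i w_i * exp(-eta * loss_i)) with normalized w_i.
    Values <= 0 mean the eta-mixability inequality holds."""
    total = sum(weights)
    gamma = mixture_forecast(weights, forecasts)
    gaps = []
    for y in (0, 1):
        mixed = -math.log(sum(w / total * math.exp(-eta * log_loss(y, p))
                              for w, p in zip(weights, forecasts))) / eta
        gaps.append(log_loss(y, gamma) - mixed)
    return gaps

def run_mixture(experts: List[List[float]],
                outcomes: List[int]) -> Tuple[float, List[float]]:
    """Uniform-prior Bayesian mixture over experts; returns the mixture's
    cumulative log loss and each expert's cumulative log loss."""
    cum = [0.0] * len(experts)
    mix_total = 0.0
    for t, y in enumerate(outcomes):
        m = min(cum)  # shift exponents for numerical stability
        weights = [math.exp(-(c - m)) for c in cum]
        gamma = mixture_forecast(weights, [e[t] for e in experts])
        mix_total += log_loss(y, gamma)
        for i, e in enumerate(experts):
            cum[i] += log_loss(y, e[t])
    return mix_total, cum

print(mixability_gap([0.2, 0.5, 0.3], [0.9, 0.4, 0.1]))  # both 0 up to rounding

experts = [[0.9] * 6, [0.2] * 6, [0.5] * 6]  # hypothetical constant experts
outcomes = [1, 1, 0, 1, 1, 1]                 # hypothetical data
mix, cum = run_mixture(experts, outcomes)
print(mix, min(cum) + math.log(len(experts)))  # first should not exceed second
```

The per-step equality telescopes: the mixture's cumulative log loss is exactly -ln((1/N) sum_i exp(-L_i)), which is the source of the ln N bound.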
Abstract
The standard loss functions used in the literature on probabilistic prediction are the log loss function, the Brier loss function, and the spherical loss function; however, any computable proper loss function can be used for comparison of prediction algorithms. This note shows that the log loss function is most selective in that any prediction algorithm that is optimal for a given data sequence (in the sense of the algorithmic theory of randomness) under the log loss function will be optimal under any computable proper mixable loss function; on the other hand, there is a data sequence and a prediction algorithm that is optimal for that sequence under either of the two other standard loss functions but not under the log loss function.
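The abstract notes that any computable proper loss can be used to compare prediction algorithms. The note's notion of optimality is defined through the algorithmic theory of randomness and is not a finite computation. The sketch below shows only the elementary ingredient: the cumulative loss of two predictors on a fixed data sequence under each standard loss. The two predictors (a Laplace rule of succession and a constant 1/2 forecaster), the binary setting and the data sequence are hypothetical examples chosen for illustration.

```python
import math
from typing import Callable, Dict, List

# A predictor maps the past outcomes to P(next outcome = 1) (illustrative assumption).
Predictor = Callable[[List[int]], float]
Loss = Callable[[int, float], float]

def laplace(past: List[int]) -> float:
    """Hypothetical predictor: Laplace's rule of succession, (k + 1) / (t + 2)."""
    return (sum(past) + 1) / (len(past) + 2)

def fair_coin(past: List[int]) -> float:
    """Hypothetical predictor: always forecasts 1/2."""
    return 0.5

LOSSES: Dict[str, Loss] = {
    "log": lambda y, p: -math.log(p if y == 1 else 1.0 - p),
    "brier": lambda y, p: (y - p) ** 2,
    "spherical": lambda y, p: 1.0 - (p if y == 1 else 1.0 - p) / math.hypot(p, 1.0 - p),
}

def cumulative_loss(predictor: Predictor, outcomes: List[int], loss: Loss) -> float:
    """Sum of losses when the predictor sees outcomes[:t] before forecasting outcomes[t]."""
    return sum(loss(y, predictor(outcomes[:t])) for t, y in enumerate(outcomes))

seq = [1, 1, 1, 0, 1, 1, 1, 1, 0, 1]  # hypothetical data sequence
for name, loss in LOSSES.items():
    for pred in (laplace, fair_coin):
        print(f"{name:9s} {pred.__name__:9s} {cumulative_loss(pred, seq, loss):.4f}")
```

A useful sanity check: for the Laplace predictor, the cumulative log loss equals -ln of the sequence's probability under a uniform prior on the coin bias, -ln(k!(n-k)!/(n+1)!), where k is the number of 1s among n outcomes.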
