On the Universality of the Logistic Loss Function

Amichai Painsky; Gregory W. Wornell

arXiv:1805.03804 · cs.IT · May 11, 2018

TL;DR

This paper shows that, for binary classification, the divergence associated with any smooth, proper, convex loss function is bounded from above by the KL divergence, up to a multiplicative normalization constant. This justifies the widespread use of log-loss across machine learning models.

Contribution

It establishes that minimizing the log-loss also minimizes an upper bound on the divergence of every smooth, proper, convex loss function in binary classification, providing a theoretical foundation for the broad use of log-loss.

Findings

01. The KL divergence bounds the divergences of other smooth, proper, convex losses from above, up to a normalization constant

02. Minimizing the log-loss minimizes an upper bound on any loss function in this set

03. Introduces new divergence inequalities similar to Pinsker's inequality
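
For reference, the classical Pinsker inequality mentioned in the findings can be written as below. This is the standard textbook statement (KL divergence in nats, total variation defined as a supremum over events), not a formula taken from the paper; the symbols P, Q, p, q are notation assumed here for illustration.

```latex
% Pinsker's inequality for distributions P and Q on the same space
\delta(P, Q) \;=\; \sup_{A} \lvert P(A) - Q(A) \rvert
\;\le\; \sqrt{\tfrac{1}{2}\, D_{\mathrm{KL}}(P \,\|\, Q)} .

% Binary case: P = Bernoulli(p), Q = Bernoulli(q), so \delta(P, Q) = |p - q|
(p - q)^2 \;\le\; \tfrac{1}{2}\, D_{\mathrm{KL}}(p \,\|\, q),
\qquad
D_{\mathrm{KL}}(p \,\|\, q) = p \log\frac{p}{q} + (1 - p) \log\frac{1 - p}{1 - q} .
```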

Abstract

A loss function measures the discrepancy between the true values (observations) and their estimated fits, for a given instance of data. A loss function is said to be proper (unbiased, Fisher consistent) if the fits are defined over a unit simplex, and the minimizer of the expected loss is the true underlying probability of the data. Typical examples are the zero-one loss, the quadratic loss and the Bernoulli log-likelihood loss (log-loss). In this work we show that for binary classification problems, the divergence associated with smooth, proper and convex loss functions is bounded from above by the Kullback-Leibler (KL) divergence, up to a multiplicative normalization constant. It implies that by minimizing the log-loss (associated with the KL divergence), we minimize an upper bound to any choice of loss functions from this set. This property justifies the broad use of log-loss in…
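
The bound described in the abstract can be illustrated numerically for one member of the set, the quadratic loss. The sketch below is not the paper's code: the function names are hypothetical, and the constant 1/2 is the one given by the classical Pinsker inequality (KL in nats), which may differ from the normalization constant derived in the paper. For a binary outcome with true probability p and fit q, the expected quadratic loss is p(1 - q)^2 + (1 - p)q^2, and subtracting its minimum (attained at q = p) leaves the divergence (p - q)^2.

```python
import math


def kl_bernoulli(p, q):
    """KL divergence D(p || q) between Bernoulli(p) and Bernoulli(q), in nats."""
    total = 0.0
    for a, b in ((p, q), (1.0 - p, 1.0 - q)):
        if a > 0.0:  # the term 0 * log(0 / b) is taken to be 0
            total += a * math.log(a / b)
    return total


def expected_quadratic_loss(p, q):
    """Expected quadratic loss of the fit q when the outcome is Bernoulli(p)."""
    return p * (1.0 - q) ** 2 + (1.0 - p) * q ** 2


def quadratic_divergence(p, q):
    """Excess expected quadratic loss of q over the optimal fit p; equals (p - q)^2."""
    return expected_quadratic_loss(p, q) - expected_quadratic_loss(p, p)


def max_ratio(grid_size=99):
    """Largest ratio quadratic_divergence / KL over an interior grid of (p, q) pairs."""
    grid = [(i + 1) / (grid_size + 1) for i in range(grid_size)]
    worst = 0.0
    for p in grid:
        for q in grid:
            if p == q:
                continue  # both divergences are zero; the ratio is undefined
            worst = max(worst, quadratic_divergence(p, q) / kl_bernoulli(p, q))
    return worst


if __name__ == "__main__":
    # Under the assumptions above, the printed ratio should not exceed 0.5.
    print(max_ratio())
```

On this grid the ratio stays below 1/2, which is consistent with the quadratic-loss divergence being bounded by a constant multiple of the KL divergence. A grid check is only a sanity test and does not replace the proof.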
