Consistency and Finite Sample Behavior of Binary Class Probability Estimation
Alexander Mey, Marco Loog

TL;DR
This paper explores how empirical risk minimization can be used to accurately estimate class probabilities, providing finite sample convergence rates and analyzing the impact of different loss functions and model assumptions.
Contribution
It extends existing theoretical results by deriving finite sample L1-convergence rates for class probability estimators within the ERM framework.
Findings
Finite sample L1-convergence rates are established for various surrogate loss functions.
Certain loss functions are identified as more suitable for probability estimation.
The study discusses model misspecification and asymmetric loss functions in the context of probability estimation.
Abstract
In this work we investigate to what extent one can recover class probabilities within the empirical risk minimization (ERM) paradigm. The main aim of our paper is to extend existing results and emphasize the tight relations between empirical risk minimization and class probability estimation. Based on existing literature on excess risk bounds and proper scoring rules, we derive a class probability estimator based on empirical risk minimization. We then derive fairly general conditions under which this estimator will converge, in the L1-norm and in probability, to the true class probabilities. Our main contribution is to present a way to derive finite sample L1-convergence rates of this estimator for different surrogate loss functions. We also study in detail which commonly used loss functions are suitable for this estimation problem and finally discuss the setting of…
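The general idea of turning an empirical risk minimizer into a class probability estimate can be sketched as follows. This is a minimal illustration, not the paper's estimator or notation: it minimizes the empirical logistic loss over linear scores by plain gradient descent, then maps the scores to probabilities through the inverse link of the loss. The names `inverse_link`, `fit_logistic_erm`, and `demo`, the synthetic data, and the optimizer settings are all assumptions made for the example. The link formulas are the standard pointwise minimizers of each surrogate loss with labels in {-1, +1}.

```python
import numpy as np


def inverse_link(scores, loss):
    """Map real-valued scores f(x) to estimates of P(y = +1 | x).

    Each branch inverts the pointwise population minimizer f* of the loss:
      logistic    log(1 + exp(-y f)):  f* = log(p / (1 - p))
      exponential exp(-y f):           f* = 0.5 * log(p / (1 - p))
      squared     (1 - y f)^2:         f* = 2p - 1
    """
    scores = np.asarray(scores, dtype=float)
    if loss == "logistic":
        return 1.0 / (1.0 + np.exp(-scores))
    if loss == "exponential":
        return 1.0 / (1.0 + np.exp(-2.0 * scores))
    if loss == "squared":
        # Scores may leave [-1, 1], so clip to keep a valid probability.
        return np.clip((scores + 1.0) / 2.0, 0.0, 1.0)
    raise ValueError(f"no inverse link implemented for loss {loss!r}")


def fit_logistic_erm(X, y, lr=0.5, steps=2000):
    """Minimize the mean logistic loss over linear scores f(x) = w.x + b.

    y must take values in {-1, +1}. Returns the weights with the bias last.
    """
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        margins = y * (Xb @ w)
        # d/dw log(1 + exp(-m)) = -y * x / (1 + exp(m))
        grad = -(Xb * (y / (1.0 + np.exp(margins)))[:, None]).mean(axis=0)
        w -= lr * grad
    return w


def demo(n=2000, seed=0):
    """Return the empirical L1 distance between estimated and true
    class probabilities on synthetic, well-specified data (hypothetical setup)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, 1))
    p_true = 1.0 / (1.0 + np.exp(-(2.0 * X[:, 0] - 0.5)))
    y = np.where(rng.random(n) < p_true, 1.0, -1.0)

    w = fit_logistic_erm(X, y)
    scores = np.hstack([X, np.ones((n, 1))]) @ w
    p_hat = inverse_link(scores, "logistic")
    return float(np.mean(np.abs(p_hat - p_true)))


if __name__ == "__main__":
    print("empirical L1 error:", demo())
```

The hinge loss is deliberately absent from `inverse_link`: its pointwise minimizer is sign(2p - 1), which only reveals which side of 1/2 the probability lies on, so no inverse link recovers p from the score. Rerunning `demo` with larger `n` gives an informal feel for how the L1 error shrinks with sample size in this well-specified toy setting.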
