Unbiased Estimations based on Binary Classifiers: A Maximum Likelihood Approach
Marco J.H. Puts, Piet J.H. Daas

TL;DR
This paper proposes a maximum likelihood estimator for the true proportion of positive items in a data set. It addresses the bias that arises when a binary classifier is applied to data whose proportion of positives differs from that of its training data.
Contribution
It introduces a novel maximum likelihood approach for unbiased estimation of positive item proportions without prior knowledge of the target distribution.
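To make the source of the bias and the shape of a maximum likelihood fix concrete, here is a hedged sketch. It uses the textbook misclassification model and is not necessarily the authors' exact formulation. The assumptions are that the classifier's true positive rate TPR and false positive rate FPR are known (for example, estimated on a labelled test set), that items are classified independently, and that TPR > FPR. Let α be the true proportion of positives and k the number of items out of n that the classifier predicts as positive:

$$
q(\alpha) = \alpha\,\mathrm{TPR} + (1-\alpha)\,\mathrm{FPR}, \qquad k \sim \mathrm{Binomial}\bigl(n,\, q(\alpha)\bigr)
$$

Simply counting predicted positives estimates q(α) rather than α. Its bias is

$$
\mathbb{E}\!\left[\frac{k}{n}\right] - \alpha = (1-\alpha)\,\mathrm{FPR} - \alpha\,(1-\mathrm{TPR}),
$$

which is zero only when false positives and false negatives happen to cancel. Because that balance depends on α, a classifier that is balanced at its training proportion becomes biased at other proportions. Maximizing the log-likelihood

$$
\ell(\alpha) = k \log q(\alpha) + (n-k)\log\bigl(1 - q(\alpha)\bigr)
$$

over α in [0, 1] gives

$$
\hat{\alpha} = \min\!\left(1,\; \max\!\left(0,\; \frac{k/n - \mathrm{FPR}}{\mathrm{TPR} - \mathrm{FPR}}\right)\right).
$$

This follows because q is linear in α and ℓ is concave in q, so ℓ is concave in α and the constrained maximum is the unconstrained one clipped to [0, 1].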
Findings
Estimator performs well on synthetic data
Effective on real-world datasets
Reduces bias in positive proportion estimation
Abstract
Binary classifiers trained on a certain proportion of positive items introduce a bias when applied to data sets with a different proportion of positive items. Most solutions to this issue assume that some information on the latter distribution is known. However, this is not always the case, particularly when this proportion is itself the target variable. In this paper, a maximum likelihood estimator for the true proportion of positives in a data set is proposed and tested on synthetic and real-world data.
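The following minimal Python sketch shows the kind of estimator described in the abstract. It relies on a standard misclassification model: the classifier's true positive rate `tpr` and false positive rate `fpr` are assumed known (for example, from a labelled test set), and items are assumed to be classified independently. It is not the authors' implementation. The function names `ml_proportion` and `log_likelihood` and all numeric settings are hypothetical. The script simulates a data set with 10% positives and shows that naively counting predicted positives is biased. It then recovers the proportion by maximum likelihood and cross-checks the closed-form estimate against a brute-force grid search of the binomial log-likelihood.

```python
import numpy as np


def ml_proportion(k: int, n: int, tpr: float, fpr: float) -> float:
    """Hypothetical ML estimate of the true positive proportion alpha.

    Assumes each item is predicted positive independently with probability
    q(alpha) = fpr + alpha * (tpr - fpr), so k ~ Binomial(n, q(alpha)).
    """
    if tpr <= fpr:
        raise ValueError("the classifier must satisfy tpr > fpr")
    q_hat = k / n  # unconstrained MLE of q
    alpha_hat = (q_hat - fpr) / (tpr - fpr)  # invariance of the MLE under q -> alpha
    # The log-likelihood is concave in alpha, so the maximum over [0, 1]
    # is the unconstrained maximum clipped to the interval.
    return float(np.clip(alpha_hat, 0.0, 1.0))


def log_likelihood(alpha, k, n, tpr, fpr):
    """Binomial log-likelihood of k predicted positives out of n (constant dropped)."""
    q = fpr + np.asarray(alpha) * (tpr - fpr)
    return k * np.log(q) + (n - k) * np.log1p(-q)


# Synthetic check; tpr and fpr are assumed to be known exactly here.
rng = np.random.default_rng(seed=0)
true_alpha, tpr, fpr, n = 0.10, 0.80, 0.05, 100_000
labels = rng.random(n) < true_alpha
# True positives are flagged with probability tpr, true negatives with probability fpr.
preds = np.where(labels, rng.random(n) < tpr, rng.random(n) < fpr)
k = int(preds.sum())

naive = k / n  # "classify and count": its expectation is 0.1*0.8 + 0.9*0.05 = 0.125
est = ml_proportion(k, n, tpr, fpr)

# Cross-check the closed form against a grid maximization of the log-likelihood.
grid = np.linspace(0.0, 1.0, 100_001)
grid_alpha = float(grid[np.argmax(log_likelihood(grid, k, n, tpr, fpr))])

print(f"naive={naive:.4f}  ml={est:.4f}  grid={grid_alpha:.4f}")
```

Clipping to [0, 1] is what makes this a maximum likelihood estimate rather than a plain moment correction. When the corrected value falls outside the interval, the likelihood is maximized at the nearest boundary. In practice `tpr` and `fpr` are themselves estimated, so their sampling uncertainty also affects the result, and this sketch ignores that uncertainty.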
Taxonomy
Topics: Advanced Statistical Methods and Models · Imbalanced Data Classification Techniques · Data Mining Algorithms and Applications
