Unbiased Estimations based on Binary Classifiers: A Maximum Likelihood Approach

Marco J.H. Puts; Piet J.H. Daas

arXiv:2102.08659·stat.ML·February 18, 2021·1 cites

Open Access

TL;DR

This paper proposes a maximum likelihood estimator for the true proportion of positive items in a dataset, correcting the bias that arises when a binary classifier is applied to data whose positive proportion differs from that of its training set.

Contribution

It introduces a novel maximum likelihood approach for unbiased estimation of positive item proportions without prior knowledge of the target distribution.

Findings

01. Estimator performs well on synthetic data

02. Effective on real-world datasets

03. Reduces bias in positive proportion estimation

Abstract

Binary classifiers trained on a certain proportion of positive items introduce a bias when applied to data sets with different proportions of positive items. Most solutions for dealing with this issue assume that some information on the latter distribution is known. However, this is not always the case, particularly when this proportion is the target variable. In this paper, a maximum likelihood estimator for the true proportion of positives in data sets is suggested and tested on synthetic and real-world data.
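
The bias described in the abstract can be illustrated with a minimal sketch. The code below is not the authors' estimator. It assumes, hypothetically, that the classifier's true positive rate (`tpr`) and false positive rate (`fpr`) are known, for example from held-out labeled data, and it models each prediction as an independent Bernoulli draw. Under those assumptions the maximum likelihood estimate of the true proportion is the observed positive rate mapped back through the misclassification rates. All function names and simulated values are illustrative.

```python
import random


def naive_proportion(predictions):
    """Fraction of items the classifier labels positive (classify and count)."""
    return sum(predictions) / len(predictions)


def ml_proportion(predictions, tpr, fpr):
    """ML estimate of the true positive proportion p (illustrative, not the paper's).

    Assumed model: each prediction is positive with probability
        q = p * tpr + (1 - p) * fpr,
    with tpr and fpr known and tpr > fpr. The Bernoulli MLE of q is the
    observed positive rate; the MLE of p is its inverse image, clipped
    to the valid range [0, 1].
    """
    if tpr <= fpr:
        raise ValueError("tpr must exceed fpr")
    q_hat = naive_proportion(predictions)
    p_hat = (q_hat - fpr) / (tpr - fpr)
    return min(1.0, max(0.0, p_hat))


def simulate(p_true, tpr, fpr, n, seed=0):
    """Draw n classifier outputs for a population with true proportion p_true."""
    rng = random.Random(seed)
    preds = []
    for _ in range(n):
        is_positive = rng.random() < p_true
        rate = tpr if is_positive else fpr  # chance the classifier says "positive"
        preds.append(1 if rng.random() < rate else 0)
    return preds


if __name__ == "__main__":
    # Hypothetical setting: 10% true positives, classifier with tpr=0.9, fpr=0.1.
    preds = simulate(p_true=0.1, tpr=0.9, fpr=0.1, n=20000, seed=1)
    print("naive estimate:    ", naive_proportion(preds))
    print("corrected estimate:", ml_proportion(preds, tpr=0.9, fpr=0.1))
```

In this simulated setting the expected naive positive rate is 0.1 × 0.9 + 0.9 × 0.1 = 0.18, so counting predicted positives overstates a true proportion of 0.1, while the corrected estimate should land close to 0.1. The sketch depends on known error rates, which is a stronger assumption than the setting the abstract describes.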

Taxonomy

Topics: Advanced Statistical Methods and Models · Imbalanced Data Classification Techniques · Data Mining Algorithms and Applications