Bayesian analysis of the prevalence bias: learning and predicting from imbalanced data

Loic Le Folgoc; Vasileios Baltatzis; Amir Alansary; Sujal Desai; Anand Devaraj; Sam Ellis; Octavio E. Martinez Manzanera; Fahdi Kanavati; Arjun Nair; Julia Schnabel; Ben Glocker

arXiv:2108.00250 · cs.LG · August 3, 2021 · 1 citation

Open Access

TL;DR

This paper introduces a Bayesian framework to address prevalence bias in imbalanced datasets, providing a theoretically grounded loss function and predictive rules that improve model robustness in real-world applications.

Contribution

It develops a novel bias-corrected loss function and predictive methods based on Bayesian risk minimization, specifically targeting prevalence bias in training data.

Findings

01. Bias-corrected loss improves model calibration.

02. Framework integrates seamlessly with deep learning.

03. Provides principled alternative to heuristic methods.

Abstract

Datasets are rarely a realistic approximation of the target population: for example, prevalence is misrepresented or image quality is above clinical standards. This mismatch is known as sampling bias. Sampling biases are a major hindrance for machine learning models, causing significant gaps between model performance in the lab and in the real world. Our work addresses prevalence bias, the discrepancy between the prevalence of a pathology and its sampling rate in the training dataset, introduced when collecting data or when the practitioner rebalances the training batches. This paper lays the theoretical and computational framework for training models, and for prediction, in the presence of prevalence bias. Concretely, a bias-corrected loss function and bias-corrected predictive rules are derived under the principles of Bayesian risk minimization. The…
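
To make the idea of a bias-corrected predictive rule concrete, here is a minimal sketch of the textbook Bayes-rule adjustment for a shift in class priors. It rescales a classifier's posterior by the ratio of target prevalence to training sampling rate and renormalizes. This is a standard prior-shift correction offered for illustration only, and it is not necessarily the rule derived in the paper. The function name `correct_for_prevalence` and the example prevalences are assumptions.

```python
import numpy as np


def correct_for_prevalence(probs, train_prev, target_prev):
    """Re-weight class posteriors for a change in class prevalence.

    Hypothetical helper (not from the paper). Applies
        p_target(y | x)  proportional to  p_train(y | x) * target_prev[y] / train_prev[y]
    and renormalizes each row so the class probabilities sum to 1.
    """
    probs = np.asarray(probs, dtype=float)
    weights = np.asarray(target_prev, dtype=float) / np.asarray(train_prev, dtype=float)
    unnormalized = probs * weights  # broadcast the per-class weights over rows
    return unnormalized / unnormalized.sum(axis=-1, keepdims=True)


# Assumed scenario: the model was trained on rebalanced (50/50) batches,
# while the pathology affects 5% of the target population.
raw = np.array([[0.3, 0.7]])  # columns: [healthy, pathology]
adjusted = correct_for_prevalence(raw, train_prev=[0.5, 0.5], target_prev=[0.95, 0.05])
print(adjusted)
```

In this assumed scenario, a raw score of 0.7 for the pathology class drops substantially once the low target prevalence is taken into account. A model trained on rebalanced data can overstate risk in this way when deployed without correction.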

Taxonomy

Topics: Machine Learning in Healthcare · AI in cancer detection · Machine Learning and Data Classification