Preventing dataset shift from breaking machine-learning biomarkers

Jérôme Dockès; Gaël Varoquaux (PARIETAL); Jean-Baptiste Poline

arXiv:2107.09947 · cs.LG · July 22, 2021

TL;DR

This paper discusses how dataset shifts in biomedical data can undermine machine-learning biomarkers, and reviews strategies for detecting and correcting these shifts to ensure reliable biomarker application.

Contribution

It provides an overview of the impact of dataset shifts on biomarkers and reviews methods for detection and correction to improve biomarker robustness.

Findings

01. Dataset shifts frequently occur in biomedical research.
02. Standard machine learning techniques often fail under dataset shifts.
03. Detection and correction strategies can mitigate the effects of dataset shifts.

Abstract

Machine learning brings the hope of finding new biomarkers extracted from cohorts with rich biomedical measurements. A good biomarker is one that gives reliable detection of the corresponding condition. However, biomarkers are often extracted from a cohort that differs from the target population. Such a mismatch, known as a dataset shift, can undermine the application of the biomarker to new individuals. Dataset shifts are frequent in biomedical research, e.g. because of recruitment biases. When a dataset shift occurs, standard machine-learning techniques do not suffice to extract and validate biomarkers. This article provides an overview of when and how dataset shifts break machine-learning-extracted biomarkers, as well as detection and correction strategies.
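To make the cohort/target mismatch described in the abstract concrete, here is a minimal sketch of one generic way to flag a shift in a single covariate: comparing its empirical distribution in the study cohort and in the target population with a two-sample Kolmogorov-Smirnov statistic. This is an illustration only and is not taken from the paper; the function name `ks_statistic`, the choice of age as the covariate, and all numbers are assumptions, and the data are synthetic.

```python
import random


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic (hypothetical helper).

    Returns the largest gap between the empirical CDFs of the two samples:
    0.0 means identical distributions, 1.0 means no overlap at all.
    """
    a, b = sorted(sample_a), sorted(sample_b)
    i = j = 0
    max_gap = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance past every observation equal to x in both samples.
        while i < len(a) and a[i] == x:
            i += 1
        while j < len(b) and b[j] == x:
            j += 1
        # i / len(a) and j / len(b) are the two empirical CDFs evaluated at x.
        max_gap = max(max_gap, abs(i / len(a) - j / len(b)))
    return max_gap


# Synthetic, made-up ages (assumption): the recruited cohort skews younger
# than the population the biomarker would be applied to.
rng = random.Random(0)
cohort_ages = [rng.gauss(45, 8) for _ in range(500)]
target_ages = [rng.gauss(60, 8) for _ in range(500)]
same_ages = [rng.gauss(45, 8) for _ in range(500)]

shifted = ks_statistic(cohort_ages, target_ages)
unshifted = ks_statistic(cohort_ages, same_ages)
print(f"cohort vs. shifted target:   {shifted:.2f}")
print(f"cohort vs. unshifted sample: {unshifted:.2f}")
```

A large statistic for a covariate that influences the prediction suggests that performance measured on the cohort may not carry over to the target population. A check like this only looks at one variable at a time, so it is a starting point for detection rather than a substitute for validating the biomarker on data from the target population.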

Code & Models

Repositories

neurodatascience/dataset_shift_biomarkers (official)
