TL;DR
This paper discusses how dataset shifts in biomedical data can undermine machine-learning biomarkers, and reviews strategies for detecting and correcting these shifts to ensure reliable biomarker application.
Contribution
It provides an overview of the impact of dataset shifts on biomarkers and reviews methods for detection and correction to improve biomarker robustness.
Findings
Dataset shifts frequently occur in biomedical research.
Standard machine-learning techniques often fail under dataset shifts.
Detection and correction strategies can mitigate the effects of dataset shifts.
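One simple way to detect a shift is a two-sample test. It checks whether the features measured in the training cohort and in the target population could come from the same distribution. The sketch below is a minimal NumPy illustration, not the paper's method. The function name `mean_shift_permutation_test` is hypothetical. Its statistic, the squared distance between feature means, only reacts to shifts in the mean. Other kinds of shift, such as a change in variance or in how features relate to the condition, would go unnoticed.

```python
import numpy as np


def mean_shift_permutation_test(X_source, X_target, n_permutations=500, seed=0):
    """Permutation two-sample test for a shift in feature means.

    Statistic: squared Euclidean distance between the mean feature vectors of
    the two samples. Returns (observed_statistic, p_value).
    """
    rng = np.random.default_rng(seed)
    X_source = np.asarray(X_source, dtype=float)
    X_target = np.asarray(X_target, dtype=float)
    n_source = len(X_source)
    pooled = np.vstack([X_source, X_target])

    def statistic(a, b):
        return float(np.sum((a.mean(axis=0) - b.mean(axis=0)) ** 2))

    observed = statistic(X_source, X_target)
    exceed = 0
    for _ in range(n_permutations):
        # Randomly reassign the pooled samples to "source" and "target"
        perm = rng.permutation(len(pooled))
        a, b = pooled[perm[:n_source]], pooled[perm[n_source:]]
        if statistic(a, b) >= observed:
            exceed += 1
    # +1 in numerator and denominator: the observed split counts as one permutation
    p_value = (exceed + 1) / (n_permutations + 1)
    return observed, p_value


# Hypothetical usage: a recruited cohort vs. a target population whose means differ
rng = np.random.default_rng(42)
cohort = rng.normal(size=(200, 5))
population = rng.normal(loc=1.0, size=(200, 5))
stat, p = mean_shift_permutation_test(cohort, population)
```

A small p-value suggests that the cohort differs from the target population. A large p-value does not prove the absence of a shift. It only means that this particular statistic did not detect one.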
Abstract
Machine learning brings the hope of finding new biomarkers extracted from cohorts with rich biomedical measurements. A good biomarker is one that gives reliable detection of the corresponding condition. However, biomarkers are often extracted from a cohort that differs from the target population. Such a mismatch, known as a dataset shift, can undermine the application of the biomarker to new individuals. Dataset shifts are frequent in biomedical research, e.g. because of recruitment biases. When a dataset shift occurs, standard machine-learning techniques do not suffice to extract and validate biomarkers. This article provides an overview of when and how dataset shifts break machine-learning-extracted biomarkers, as well as of detection and correction strategies.
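For correction, one standard option is importance weighting. It applies under covariate shift, where the distribution of features changes but the relation between features and condition does not. Each training sample is reweighted by p_target(x) / p_source(x), so that the cohort resembles the target population. The sketch below estimates that ratio with a domain classifier that separates cohort samples from target samples. It is a minimal NumPy illustration under the covariate-shift assumption, not the paper's implementation. The helpers `fit_logistic`, `importance_weights` and `weighted_least_squares` are hypothetical. The hand-rolled logistic regression stands in for a library classifier.

```python
import numpy as np


def fit_logistic(X, y, n_iter=500, lr=0.1):
    """Plain gradient-descent logistic regression; returns [intercept, coefs...]."""
    Xb = np.hstack([np.ones((len(X), 1)), X])
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y) / len(y)  # gradient of the mean log-loss
    return w


def importance_weights(X_source, X_target):
    """Estimate p_target(x) / p_source(x) for each source sample with a
    domain classifier (label 1 = target population, 0 = training cohort)."""
    X = np.vstack([X_source, X_target])
    y = np.r_[np.zeros(len(X_source)), np.ones(len(X_target))]
    coef = fit_logistic(X, y)
    logits = coef[0] + X_source @ coef[1:]
    # p(target|x) / p(source|x) = exp(logit); rescale by class sizes (Bayes' rule)
    return np.exp(logits) * len(X_source) / len(X_target)


def weighted_least_squares(X, y, weights):
    """Linear regression where each sample's squared error is scaled by its weight."""
    Xb = np.hstack([np.ones((len(X), 1)), X])
    sw = np.sqrt(weights)
    coef, *_ = np.linalg.lstsq(Xb * sw[:, None], y * sw, rcond=None)
    return coef


# Hypothetical usage: refit a linear "biomarker" model on a reweighted cohort
rng = np.random.default_rng(0)
X_cohort = rng.normal(size=(300, 2))               # recruited cohort
X_population = rng.normal(loc=0.5, size=(300, 2))  # shifted target population
y_cohort = X_cohort @ np.array([1.0, -0.5]) + rng.normal(scale=0.1, size=300)
w = importance_weights(X_cohort, X_population)
coef = weighted_least_squares(X_cohort, y_cohort, w)
```

The rescaling follows from Bayes' rule. The ratio p(target | x) / p(source | x) equals p_target(x) / p_source(x) multiplied by n_target / n_source. The resulting weights can be passed to any learner that accepts per-sample weights; many scikit-learn estimators take a `sample_weight` argument in `fit`. Very large weights indicate regions of the target population that the cohort barely covers, where the correction becomes unreliable.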
