Validity problems in clinical machine learning by indirect data labeling using consensus definitions
Michael Hagmann, Shigehiko Schamoni, Stefan Riezler

TL;DR
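This paper highlights a fundamental validity issue in clinical machine learning: when target labels are derived indirectly from measurements that are also part of the model input, trained models merely reconstruct the label definition and fail in real-world scenarios. It proposes a detection procedure to identify such problems.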
Contribution
It introduces a general method to detect datasets and models affected by indirect labeling issues in clinical machine learning applications.
Findings
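Models trained on indirect labels learn only to reconstruct the known target definition when the defining measurements are part of the input.
Such models perform perfectly on similarly constructed test data but fail on real-world examples where the defining measurements are missing or incomplete.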
A detection procedure can identify problematic datasets and models.
Abstract
We demonstrate a validity problem of machine learning in the vital application area of disease diagnosis in medicine. It arises when target labels in training data are determined by an indirect measurement, and the fundamental measurements needed to determine this indirect measurement are included in the input data representation. Machine learning models trained on this data will learn nothing else but to exactly reconstruct the known target definition. Such models show perfect performance on similarly constructed test data but will fail catastrophically on real-world examples where the defining fundamental measurements are not or only incompletely available. We present a general procedure allowing identification of problematic datasets and black-box machine learning models trained on them, and exemplify our detection procedure on the task of early prediction of sepsis.
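The failure mode described in the abstract can be illustrated with a small synthetic sketch. It is not the authors' detection procedure or their sepsis data. All names here (`make_data`, `fit_stump`, `mask_feature`, the integer `score` feature, and the threshold of 2) are hypothetical choices made for illustration. The label is defined by a fixed rule on a measurement that is also given to the model, so a simple learner recovers the rule exactly, and then breaks once that measurement is unavailable.

```python
import random

def make_data(n, rng):
    """Synthetic patients: features (score, noise) and a rule-defined label."""
    rows = []
    for _ in range(n):
        score = rng.randint(0, 4)   # hypothetical fundamental measurement
        noise = rng.randint(0, 4)   # unrelated measurement
        label = int(score >= 2)     # indirect label: fixed rule on `score` (assumed)
        rows.append(((score, noise), label))
    return rows

def fit_stump(rows):
    """Pick the (feature, threshold) pair with the highest training accuracy."""
    best = None
    n_features = len(rows[0][0])
    for f in range(n_features):
        for thr in sorted({x[f] for x, _ in rows}):
            acc = sum(int(x[f] >= thr) == y for x, y in rows) / len(rows)
            if best is None or acc > best[0]:
                best = (acc, f, thr)
    return best[1], best[2]

def accuracy(stump, rows):
    """Fraction of rows where the stump's prediction matches the label."""
    f, thr = stump
    return sum(int(x[f] >= thr) == y for x, y in rows) / len(rows)

def mask_feature(rows, f, fill):
    """Simulate deployment: feature f is unavailable and replaced by a constant."""
    return [(tuple(fill if i == f else v for i, v in enumerate(x)), y)
            for x, y in rows]

rng = random.Random(0)
train, test = make_data(2000, rng), make_data(2000, rng)

stump = fit_stump(train)                       # learns the label rule itself
acc_full = accuracy(stump, test)               # defining feature present
acc_masked = accuracy(stump, mask_feature(test, 0, fill=0))  # defining feature absent

print("learned (feature, threshold):", stump)
print("accuracy with defining feature:", acc_full)
print("accuracy without defining feature:", acc_masked)
```

In this toy setup the learner selects the defining feature and its exact threshold, so accuracy on the similarly constructed test set is perfect. Once the feature is masked, the model predicts a single class and accuracy falls to roughly the share of negative cases. Comparing performance with and without the suspected defining inputs is one simple ablation-style check. The paper's own procedure may differ.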
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Machine Learning in Healthcare · Sepsis Diagnosis and Treatment
