Rethinking clinical prediction: Why machine learning must consider year of care and feature aggregation
Bret Nestor, Matthew B. A. McDermott, Geeticka Chauhan, Tristan Naumann, Michael C. Hughes, Anna Goldenberg, Marzyeh Ghassemi

TL;DR
This paper highlights the importance of incorporating the year of care and feature aggregation in machine learning models for healthcare to improve their temporal robustness and generalizability.
Contribution
It introduces a simple method of augmenting data with the year of care and demonstrates its effectiveness in maintaining model performance over time.
Findings
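A model trained on standard feature representations loses 0.3 AUC on mortality prediction, and 0.15 AUC on length-of-stay, when evaluated on data from 10 years later.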
Aggregated features mitigate performance deterioration.
Yearly retraining with aggregated features maintains stable prediction quality.
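The kind of evaluation behind these findings, scoring a model separately on each year of care rather than on one pooled test set, can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the helper names `auc` and `auc_by_year` are hypothetical, and the toy values are made up purely to exercise the functions (they are not MIMIC data or results from the paper).

```python
from collections import defaultdict


def auc(labels, scores):
    """Area under the ROC curve by pairwise comparison (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_by_year(years, labels, scores):
    """Group predictions by year of care and compute AUC within each year."""
    groups = defaultdict(lambda: ([], []))
    for yr, y, s in zip(years, labels, scores):
        groups[yr][0].append(y)
        groups[yr][1].append(s)
    return {yr: auc(ys, ss) for yr, (ys, ss) in sorted(groups.items())}


# Toy, made-up values (hypothetical): scores from one fixed model,
# evaluated on patients seen in two different years.
years = [2002, 2002, 2002, 2002, 2012, 2012, 2012, 2012]
labels = [1, 0, 1, 0, 1, 0, 1, 0]
scores = [0.9, 0.1, 0.8, 0.3, 0.4, 0.6, 0.7, 0.5]
per_year = auc_by_year(years, labels, scores)
```

Here `per_year` maps each year of care to its own AUC, so a drop between the earliest and latest year becomes visible instead of being averaged away in a date-agnostic split.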
Abstract
Machine learning for healthcare often trains models on de-identified datasets with randomly-shifted calendar dates, ignoring the fact that data were generated under hospital operation practices that change over time. These changing practices induce definitive changes in observed data which confound evaluations which do not account for dates and limit the generalisability of date-agnostic models. In this work, we establish the magnitude of this problem on MIMIC, a public hospital dataset, and showcase a simple solution. We augment MIMIC with the year in which care was provided and show that a model trained using standard feature representations will significantly degrade in quality over time. We find a deterioration of 0.3 AUC when evaluating mortality prediction on data from 10 years later. We find a similar deterioration of 0.15 AUC for length-of-stay. In contrast, we demonstrate that…
Taxonomy
Topics: Machine Learning in Healthcare · Sepsis Diagnosis and Treatment · Medical Coding and Health Information
