Evaluating Model Performance in Medical Datasets Over Time
Helen Zhou, Yuwen Chen, Zachary C. Lipton

TL;DR
This paper introduces the EMDOT framework for evaluating medical ML models over time, highlighting how performance varies with data recency and providing insights into model degradation in healthcare applications.
Contribution
The paper proposes the EMDOT framework for time-aware evaluation of medical models and demonstrates it on six medical data sources (tabular and imaging), using both linear and more complex models.
Findings
Depending on the dataset, training on all historical data is ideal in many cases.
In other cases, training on a window of the most recent data can be advantageous.
The paper examines sudden drops in performance and their potential causes.
Abstract
Machine learning (ML) models deployed in healthcare systems must face data drawn from continually evolving environments. However, researchers proposing such models typically evaluate them in a time-agnostic manner, splitting datasets according to patients sampled randomly throughout the entire study time period. This work proposes the Evaluation on Medical Datasets Over Time (EMDOT) framework, which evaluates the performance of a model class across time. Inspired by the concept of backtesting, EMDOT simulates possible training procedures that practitioners might have been able to execute at each point in time and evaluates the resulting models on all future time points. Evaluating both linear and more complex models on six distinct medical data sources (tabular and imaging), we show how depending on the dataset, using all historical data may be ideal in many cases, whereas using a…
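The backtesting idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names (`backtest_splits`, `evaluate_over_time`), the `window` parameter, the majority-class model, and the toy data are all hypothetical, chosen only to show how training on all history compares with training on a recent window when each model is scored on every later time period.

```python
from typing import Callable, Dict, Iterator, List, Optional, Sequence, Tuple

Row = Tuple[float, int]  # (feature, label); a hypothetical toy record


def backtest_splits(
    periods: Sequence[int], window: Optional[int] = None
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_periods, test_periods) for each simulated training time.

    window=None uses all history up to that time; window=k keeps only the
    k most recent periods (an assumed parameterization).
    """
    ordered = sorted(set(periods))
    for i in range(len(ordered) - 1):
        history = ordered[: i + 1]
        if window is not None:
            history = history[-window:]
        yield history, ordered[i + 1:]


def evaluate_over_time(
    data: Dict[int, List[Row]],
    fit: Callable[[List[Row]], int],
    score: Callable[[int, List[Row]], float],
    window: Optional[int] = None,
) -> Dict[Tuple[int, int], float]:
    """Map (last training period, test period) to a score."""
    results = {}
    for train_periods, test_periods in backtest_splits(list(data), window):
        train = [row for p in train_periods for row in data[p]]
        model = fit(train)
        for p in test_periods:
            results[(train_periods[-1], p)] = score(model, data[p])
    return results


def fit_majority(rows: List[Row]) -> int:
    """Toy 'model': predict the majority label (ties go to 1)."""
    labels = [y for _, y in rows]
    return int(2 * sum(labels) >= len(labels))


def accuracy(model: int, rows: List[Row]) -> float:
    """Fraction of rows whose label matches the constant prediction."""
    return sum(1 for _, y in rows if y == model) / len(rows)


# Synthetic data in which the label distribution shifts over time.
toy = {
    2010: [(0.0, 0), (0.1, 0), (0.2, 0), (0.3, 1)],
    2011: [(0.0, 0), (0.1, 0), (0.2, 1), (0.3, 1)],
    2012: [(0.0, 1), (0.1, 1), (0.2, 1), (0.3, 0)],
    2013: [(0.0, 1), (0.1, 1), (0.2, 1), (0.3, 1)],
}

all_history = evaluate_over_time(toy, fit_majority, accuracy)
recent_only = evaluate_over_time(toy, fit_majority, accuracy, window=1)
```

On this synthetic data, the model trained on the most recent period alone adapts to the shift sooner than the one trained on all history. That outcome is a property of the toy data, not a claim about any dataset in the paper.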
Taxonomy
Topics: Machine Learning in Healthcare · Artificial Intelligence in Healthcare and Education · Explainable Artificial Intelligence (XAI)
