Benchmarking AutoML Frameworks for Disease Prediction Using Medical Claims
Roland Albert A. Romero, Mariefel Nicole Y. Deypalan, Suchit Mehrotra, John Titus Jungao, Natalie E. Sheils, Elisabetta Manduchi, Jason H. Moore

TL;DR
This study benchmarks AutoML frameworks on large, imbalanced healthcare datasets for disease prediction, revealing modest improvements over baseline models and highlighting challenges like data imbalance and feature limitations.
Contribution
It provides a comparative analysis of AutoML tools on medical claims data, emphasizing the need for tailored approaches to improve predictive performance in healthcare applications.
Findings
AutoML tools outperform baseline random forest but are similar to each other.
Models show low precision-recall AUC and struggle to predict positives.
Performance is not directly related to disease prevalence.
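To make the second finding concrete, the sketch below shows how a model can keep its true negative rate high while predicting no positives at all. The counts are hypothetical (1,000 patients, 2% prevalence) and are not taken from the paper; the helper names `confusion_counts` and `summarize` are illustrative only.

```python
# Minimal sketch with made-up numbers: metrics of a classifier that
# never predicts the positive class on an imbalanced dataset.

def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels encoded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn

def summarize(y_true, y_pred):
    """Compute common metrics; undefined ratios are reported as 0.0."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
    }

# Hypothetical cohort: 20 positives and 980 negatives (2% prevalence).
y_true = [1] * 20 + [0] * 980
always_negative = [0] * len(y_true)

metrics = summarize(y_true, always_negative)
print(metrics)
```

Accuracy and specificity look strong here even though recall is zero, which is why precision-recall metrics are usually more informative than accuracy on data this imbalanced.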
Abstract
We ascertain and compare the performances of AutoML tools on large, highly imbalanced healthcare datasets. We generated a large dataset using historical administrative claims including demographic information and flags for disease codes in four different time windows prior to 2019. We then trained three AutoML tools on this dataset to predict six different disease outcomes in 2019 and evaluated model performances on several metrics. The AutoML tools showed improvement from the baseline random forest model but did not differ significantly from each other. All models recorded low area under the precision-recall curve and failed to predict true positives while keeping the true negative rate high. Model performance was not directly related to prevalence. We provide a specific use-case to illustrate how to select a threshold that gives the best balance between true and false positive…
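The abstract mentions selecting a threshold that balances true and false positives. As a rough illustration, the sketch below sweeps candidate thresholds and picks the one that maximizes Youden's J (true positive rate minus false positive rate). This criterion is an assumption chosen for the example and is not necessarily the one used in the paper; the labels, scores, and function names are hypothetical.

```python
# Minimal sketch: choose a decision threshold by maximizing Youden's J
# (TPR - FPR). The criterion and the data are assumptions for illustration.

def rates_at_threshold(y_true, scores, threshold):
    """Return (tpr, fpr) when predicting positive for score >= threshold."""
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if label == 1 and predicted == 1:
            tp += 1
        elif label == 0 and predicted == 1:
            fp += 1
        elif label == 0 and predicted == 0:
            tn += 1
        else:
            fn += 1
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr

def best_threshold(y_true, scores):
    """Try each distinct score as a threshold; return (thr, J, tpr, fpr)."""
    best = None
    for thr in sorted(set(scores)):
        tpr, fpr = rates_at_threshold(y_true, scores, thr)
        j = tpr - fpr
        if best is None or j > best[1]:
            best = (thr, j, tpr, fpr)
    return best

# Hypothetical predicted probabilities: 8 negatives followed by 2 positives.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.05, 0.10, 0.15, 0.20, 0.30, 0.35, 0.40, 0.60, 0.45, 0.80]

thr, j, tpr, fpr = best_threshold(y_true, scores)
print(f"threshold={thr}, J={j}, TPR={tpr}, FPR={fpr}")
```

In practice the right criterion depends on the relative cost of false positives and false negatives for the use case, so Youden's J is only one of several reasonable choices.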
Taxonomy
Topics: Machine Learning in Healthcare · Artificial Intelligence in Healthcare · Healthcare Systems and Reforms
Methods: Feature Selection
