Benchmarking AutoML Frameworks for Disease Prediction Using Medical Claims

Roland Albert A. Romero; Mariefel Nicole Y. Deypalan; Suchit Mehrotra; John Titus Jungao; Natalie E. Sheils; Elisabetta Manduchi; Jason H. Moore

arXiv:2107.10495·cs.LG·July 23, 2021

TL;DR

This study benchmarks AutoML frameworks on large, imbalanced healthcare datasets for disease prediction, revealing modest improvements over baseline models and highlighting challenges like data imbalance and feature limitations.

Contribution

It provides a comparative analysis of AutoML tools on medical claims data, emphasizing the need for tailored approaches to improve predictive performance in healthcare applications.

Findings

01 AutoML tools outperform the baseline random forest model but perform similarly to each other.

02 Models show low precision-recall AUC and struggle to predict positives.

03 Performance is not directly related to disease prevalence.
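To make the precision-recall finding concrete, here is a minimal sketch of how the area under the precision-recall curve can be estimated as average precision and compared against prevalence, which is the score a random ranking would be expected to achieve. The labels and scores are made-up toy values, and `average_precision` is a hypothetical helper written for this illustration; it is not the evaluation code used in the paper.

```python
def average_precision(y_true, scores):
    """Estimate PR-AUC as the mean of precision values at each true positive.

    Ties in `scores` are broken by input order, which is a simplification.
    """
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("average precision is undefined without positives")
    # Rank examples from highest to lowest predicted score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    true_positives = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            true_positives += 1
            total += true_positives / rank  # precision at this recall step
    return total / n_pos


# Toy data (assumed for illustration only): 2 positives out of 5 examples.
y_true = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.2, 0.1]

ap = average_precision(y_true, scores)
prevalence = sum(y_true) / len(y_true)  # expected score of a random ranking
print(f"average precision = {ap:.3f}, prevalence baseline = {prevalence:.3f}")
```

On a highly imbalanced outcome the prevalence baseline is close to zero, so a precision-recall AUC should be read relative to that baseline rather than relative to 0.5.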

Abstract

We ascertain and compare the performances of AutoML tools on large, highly imbalanced healthcare datasets. We generated a large dataset using historical administrative claims including demographic information and flags for disease codes in four different time windows prior to 2019. We then trained three AutoML tools on this dataset to predict six different disease outcomes in 2019 and evaluated model performances on several metrics. The AutoML tools showed improvement from the baseline random forest model but did not differ significantly from each other. All models recorded low area under the precision-recall curve and failed to predict true positives while keeping the true negative rate high. Model performance was not directly related to prevalence. We provide a specific use-case to illustrate how to select a threshold that gives the best balance between true and false positive…
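The abstract is cut off before it describes how the threshold is chosen, so the following is only a sketch of one common approach: scanning candidate thresholds and keeping the one that maximizes Youden's J statistic (true positive rate minus false positive rate). The criterion, the helper names `rates_at` and `best_threshold`, and the toy data are all assumptions made for this illustration and may differ from the use case in the paper.

```python
def rates_at(y_true, scores, threshold):
    """Return (TPR, FPR) when scores >= threshold are predicted positive."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, scores):
        predicted_positive = s >= threshold
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


def best_threshold(y_true, scores):
    """Pick the observed score that maximizes Youden's J = TPR - FPR."""
    best = None
    for t in sorted(set(scores)):
        tpr, fpr = rates_at(y_true, scores, t)
        j = tpr - fpr
        # Keep the first (lowest) threshold on ties.
        if best is None or j > best[0]:
            best = (j, t, tpr, fpr)
    _, threshold, tpr, fpr = best
    return threshold, tpr, fpr


# Toy data (assumed for illustration only): 4 positives, 5 negatives.
y_true = [0, 0, 0, 0, 1, 0, 1, 1, 1]
scores = [0.05, 0.1, 0.2, 0.3, 0.4, 0.45, 0.6, 0.7, 0.9]

threshold, tpr, fpr = best_threshold(y_true, scores)
print(f"threshold = {threshold}, TPR = {tpr:.2f}, FPR = {fpr:.2f}")
```

Youden's J weights true and false positive rates equally. In a screening setting where a missed case costs more than a false alarm, a cost-weighted criterion would be a natural substitute.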

Taxonomy

Topics: Machine Learning in Healthcare · Artificial Intelligence in Healthcare · Healthcare Systems and Reforms

Methods: Feature Selection