Contribution of Data Categories to Readmission Prediction Accuracy
Wendong Ge, Hee Yeun Kim, Sonali Desai, Leonid Perlovsky, Alexander Turchin

TL;DR
This study evaluates the impact of various data categories on the accuracy of readmission prediction models, highlighting diagnosis groups and discharge disposition as key contributors.
Contribution
It provides a detailed analysis of the relative importance of 90,101 variables across multiple data categories in predicting hospital readmissions.
Findings
Diagnosis related groups significantly improve model accuracy (c-statistic increment of 0.0933)
Discharge disposition is a strong predictor of readmission
Top contributing variables vary across data categories
Abstract
Identification of patients at high risk for readmission could help reduce morbidity and mortality as well as healthcare costs. Most existing studies on readmission prediction did not compare the contributions of different data categories. In this study we analyzed the relative contribution of 90,101 variables across 398,884 admission records corresponding to 163,468 patients, including patient demographics, historical hospitalization information, discharge disposition, diagnoses, procedures, medications and laboratory test results. We established an interpretable readmission prediction model based on Logistic Regression in scikit-learn, and added the available variables to the model one by one in order to analyze the influence of individual data categories on readmission prediction accuracy. Diagnosis related groups (c-statistic increment of 0.0933) and discharge disposition (c-statistic…
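The abstract reports each category's contribution as a c-statistic increment. As a rough illustration of that metric only, the sketch below computes the c-statistic (the fraction of positive/negative pairs that a model ranks correctly) and the change in it as score sets from successive models are compared. This is not the authors' code: the function names, the toy labels and scores, and the chance-level baseline of 0.5 are all assumptions made for the example.

```python
from itertools import product


def c_statistic(y_true, y_score):
    """Fraction of (readmitted, not readmitted) pairs ranked correctly.

    Ties in score count as half a concordant pair.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    concordant = 0.0
    for p, n in product(pos, neg):
        if p > n:
            concordant += 1.0
        elif p == n:
            concordant += 0.5
    return concordant / (len(pos) * len(neg))


def c_statistic_increments(y_true, scores_by_step, baseline=0.5):
    """Change in c-statistic after each step.

    scores_by_step: list of (category_name, predicted_scores), in the
    order the categories were added to the model.
    baseline: starting c-statistic; 0.5 (chance level) is an assumption.
    """
    increments = {}
    previous = baseline
    for name, scores in scores_by_step:
        current = c_statistic(y_true, scores)
        increments[name] = current - previous
        previous = current
    return increments


# Hypothetical toy data: 1 = readmitted, 0 = not readmitted.
y = [1, 0, 1, 0, 0, 1]
steps = [
    ("demographics", [0.6, 0.5, 0.4, 0.3, 0.7, 0.8]),
    ("discharge_disposition", [0.7, 0.4, 0.6, 0.3, 0.5, 0.9]),
]
increments = c_statistic_increments(y, steps)
for name, delta in increments.items():
    print(f"{name}: {delta:+.4f}")
```

In a setup like the one the abstract describes, the scores at each step would come from a logistic regression refitted with the additional variables and evaluated on held-out admissions; here they are hard-coded to keep the example self-contained.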
Taxonomy
Topics: Heart Failure Treatment and Management · Context-Aware Activity Recognition Systems · Machine Learning in Healthcare
