Machine Learning and Statistical Insights into Hospital Stay Durations: The Italian EHR Case
Marina Andric, Mauro Dragoni

TL;DR
This study applies machine learning to Italian hospital data to identify key factors influencing length of stay (LoS) and to predict stay durations, reaching a best R2 of 0.49.
Contribution
It provides a comprehensive analysis of the factors affecting hospital length of stay (LoS) in Italy and applies ML models to predict LoS, which is novel in this healthcare context.
Findings
LoS correlates significantly with patient age group, comorbidity score, admission type, and month of admission.
CatBoost achieved the highest R2, 0.49, meaning it explains roughly half of the variance in LoS; the authors describe this as good predictive performance.
Machine learning models can predict hospital stay durations from patient and hospital features.
Abstract
Length of hospital stay (LoS) is a critical metric for assessing healthcare quality and optimizing hospital resource management. This study aims to identify factors influencing LoS within the Italian healthcare context, using a dataset of hospitalization records from over 60 healthcare facilities in the Piedmont region, spanning from 2020 to 2023. We explored a variety of features, including patient characteristics, comorbidities, admission details, and hospital-specific factors. Significant correlations were found between LoS and features such as age group, comorbidity score, admission type, and the month of admission. Machine learning models, specifically CatBoost and Random Forest, were used to predict LoS. The highest R2 score, 0.49, was achieved with CatBoost, demonstrating good predictive performance.
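To put the reported R2 in context, the sketch below computes the coefficient of determination from its standard definition, R2 = 1 - SS_res / SS_tot. It is not the authors' code: `r2_score` is a helper written here for illustration, and the length-of-stay values are made up. An R2 of 0.49 would mean the squared prediction error is about half of what a model that always predicts the mean LoS would incur.

```python
from statistics import mean


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    y_bar = mean(y_true)
    # Residual sum of squares: error of the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when y_true is constant")
    return 1 - ss_res / ss_tot


# Hypothetical lengths of stay in days (illustrative only, not study data).
los_days = [2, 3, 5, 8, 12]
predicted = [3, 4, 5, 7, 10]

# Baseline that ignores all features and always predicts the mean stay.
baseline = [mean(los_days)] * len(los_days)

r2_model = r2_score(los_days, predicted)
r2_baseline = r2_score(los_days, baseline)
print(f"model R2: {r2_model:.3f}, mean-baseline R2: {r2_baseline:.3f}")
```

The mean-only baseline scores exactly 0 by construction, so any positive R2 measures improvement over ignoring the patient and hospital features altogether.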
Taxonomy
Topics: Chronic Disease Management Strategies · Emergency and Acute Care Studies · Machine Learning in Healthcare
