Data-Driven Machine Learning Approaches for Predicting In-Hospital Sepsis Mortality
Arseniy Shumilov, Yueting Zhu, Negin Ashrafi, Armin Abdollahi, Greg Placencia, Kamiar Alaei, Maryam Pishgar

TL;DR
This study develops an interpretable machine learning model using ICU data to accurately predict in-hospital sepsis mortality, addressing previous limitations in feature selection and interpretability.
Contribution
The paper introduces a novel, highly accurate Random Forest model with optimized feature selection for predicting sepsis mortality in hospital settings.
Findings
Random Forest achieved 0.90 accuracy and 0.97 AUROC.
Restricting the model to the top 35 features improves interpretability.
Random Forest outperforms the other machine learning algorithms evaluated.
Abstract
Sepsis is a severe condition responsible for many deaths in the United States and worldwide, making accurate prediction of outcomes crucial for timely and effective treatment. Previous studies employing machine learning faced limitations in feature selection and model interpretability, reducing their clinical applicability. This research aimed to develop an interpretable and accurate machine learning model to predict in-hospital sepsis mortality, addressing these gaps. Using ICU patient records from the MIMIC-III database, we extracted relevant data through a combination of literature review, clinical input refinement, and Random Forest-based feature selection, identifying the top 35 features. Data preprocessing included cleaning, imputation, standardization, and applying the Synthetic Minority Over-sampling Technique (SMOTE) to address class imbalance, resulting in a dataset of 4,683…
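The oversampling step mentioned in the abstract can be illustrated with a minimal sketch of SMOTE-style interpolation. This is not the authors' code: the function name `smote_oversample`, the toy data, and the parameters `k` and `seed` are assumptions made for illustration, and a real pipeline would more likely use an established library implementation. The idea is that each synthetic minority sample lies on the line segment between a real minority sample and one of its k nearest minority neighbours.

```python
import math
import random


def smote_oversample(minority, n_synthetic, k=5, seed=0):
    """Return n_synthetic points interpolated between minority-class samples.

    minority: list of equal-length feature vectors (lists of floats).
    Hypothetical helper for illustration only.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other points
    synthetic = []
    for _ in range(n_synthetic):
        # Pick a random minority sample as the base point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        # Choose one of the k nearest neighbours at random.
        neighbour = minority[rng.choice(others[:k])]
        # Place the new point a random fraction of the way towards it.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy minority class with two features per sample (made-up values).
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_points = smote_oversample(minority, n_synthetic=6, k=2)
print(len(new_points))
```

Because every synthetic point is a convex combination of two real samples, it stays inside the range already spanned by the minority class, which is why the technique adds examples without introducing out-of-range feature values.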
Taxonomy
Topics: Machine Learning in Healthcare
Methods: Support Vector Machine · Feature Selection · Logistic Regression · Synthetic Minority Over-sampling Technique
