# Predicting 30-day and 1-year mortality in heart failure with preserved ejection fraction (HFpEF)

**Authors:** Ikgyu Shin, Nilay Bhatt, Alaa Alashi, Keervani Kandala, Karthik Murugiah, Shukri AlSaif

PMC · DOI: 10.1371/journal.pone.0336809 · 2025-11-14

## TL;DR

This study builds and compares models to predict 30-day and 1-year mortality in patients with heart failure with preserved ejection fraction (HFpEF) using electronic health record (EHR) data.

## Contribution

The study introduces and evaluates multiple machine learning models for predicting mortality in HFpEF using real-world EHR data.

## Key findings

- Logistic regression achieved the best performance for 30-day mortality prediction with an AUC of 0.83.
- Random Forest (AUC 0.79) and Histogram-based Gradient Boosting Classifier (HGBC; AUC 0.78) performed best for 1-year mortality prediction.
- Age and NT-proBNP were identified as the strongest predictors for both 30-day and 1-year mortality.
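
To make the SHAP-based ranking in the last finding more concrete, here is a minimal sketch of how SHAP attributions work for a linear model on the log-odds scale. It assumes independent features, in which case the attribution for feature i is w_i * (x_i - mean(x_i)). The weights, intercept, background rows, and patient values below are invented for illustration and are not taken from the paper, which does not say which SHAP explainer was used.

```python
# Hypothetical logistic-regression coefficients on the log-odds scale.
# These numbers are made up for illustration; they are NOT the study's model.
FEATURES = ["age", "log_nt_probnp"]
WEIGHTS = [0.04, 0.5]
INTERCEPT = -7.0

def log_odds(x):
    """Linear predictor of a logistic regression model."""
    return INTERCEPT + sum(w * xi for w, xi in zip(WEIGHTS, x))

def linear_shap(x, background):
    """SHAP values for a linear model, assuming independent features:
    phi_i = w_i * (x_i - mean of feature i over the background data)."""
    means = [sum(col) / len(col) for col in zip(*background)]
    return [w * (xi - m) for w, xi, m in zip(WEIGHTS, x, means)]

# Invented background sample (rows: age in years, log NT-proBNP).
background = [[60, 6.0], [70, 7.0], [80, 8.0], [90, 9.0]]
patient = [85, 9.0]

phi = linear_shap(patient, background)
base_value = sum(log_odds(row) for row in background) / len(background)

for name, value in zip(FEATURES, phi):
    print(f"{name}: {value:+.2f}")
# Additivity: the base value plus all attributions recovers the prediction.
print(f"base {base_value:+.2f} + sum(phi) {sum(phi):+.2f} = {log_odds(patient):+.2f}")
```

Averaging the absolute attributions over many patients gives the kind of global ranking in which a feature such as age or NT-proBNP can come out on top. Tree-based models need a different SHAP algorithm, which this sketch does not cover.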

## Abstract

This study aimed to develop and compare prediction models for 30-day and 1-year mortality in heart failure with preserved ejection fraction (HFpEF) using electronic health record (EHR) data, applying both traditional and machine learning (ML) techniques.

HFpEF accounts for 1 in 2 patients with heart failure. Predictive models in HFpEF, specifically those derived from EHR data, are less well established.

Using MIMIC-IV EHR data from 2008–2019, patients aged ≥ 18 years admitted with a primary diagnosis of HFpEF were identified using ICD-9 and ICD-10 codes. Demographics, vital signs, prior diagnoses, and lab data were extracted. The data were partitioned into training (80%) and test (20%) sets. Prediction models from seven model classes (Support Vector Classifier (SVC), Logistic Regression, Lasso Regression, Elastic Net, Random Forest, Histogram-based Gradient Boosting Classifier (HGBC), and eXtreme Gradient Boosting (XGBoost)) were developed using various imputation and oversampling techniques with 5-fold cross-validation. Model performance was compared using several metrics, and individual feature importance was assessed using SHapley Additive exPlanations (SHAP) analysis.
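
The data-handling steps in this paragraph (80/20 partition, 5-fold cross-validation, oversampling) can be sketched with the Python standard library alone. This is an illustrative sketch under stated assumptions, not the authors' pipeline: the labels are synthetic (1,000 rows with a 6.3% event rate), random oversampling stands in for the unspecified "various" oversampling techniques, stratification is an assumption, and imputation and model fitting are omitted. All function names are hypothetical; in practice scikit-learn's `train_test_split` and `StratifiedKFold` would likely be used instead.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_frac=0.2, seed=0):
    """Return (train_idx, test_idx) with a similar class ratio in both parts."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train, test = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = round(len(idx) * test_frac)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)

def stratified_folds(indices, labels, k=5, seed=0):
    """Deal each class's indices round-robin into k folds."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i in indices:
        by_class[labels[i]].append(i)
    folds = [[] for _ in range(k)]
    for idx in by_class.values():
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds

def oversample_minority(indices, labels, seed=0):
    """Random oversampling: draw minority rows with replacement until balanced."""
    rng = random.Random(seed)
    pos = [i for i in indices if labels[i] == 1]
    neg = [i for i in indices if labels[i] == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return indices + extra

# Synthetic outcome labels: 63 deaths among 1,000 rows (6.3% event rate).
rng = random.Random(42)
labels = [1] * 63 + [0] * 937
rng.shuffle(labels)

train_idx, test_idx = stratified_split(labels, test_frac=0.2, seed=42)
folds = stratified_folds(train_idx, labels, k=5, seed=42)

for f, val_idx in enumerate(folds):
    # Fit on the other four folds; oversample only the fitting portion so
    # duplicated minority rows never appear in the validation fold.
    fit_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
    fit_balanced = oversample_minority(fit_idx, labels, seed=f)
    n_pos = sum(labels[i] for i in fit_balanced)
    print(f"fold {f}: fit={len(fit_balanced)} (pos={n_pos}), val={len(val_idx)}")
```

Oversampling inside each fold, rather than before the split, keeps the validation and test rows at the original event rate, so the performance estimates are not inflated by duplicated cases.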

Among 3,235 hospitalizations for HFpEF, 30-day mortality was 6.3% and 1-year mortality was 29.2%. Logistic regression performed well for 30-day mortality (area under the receiver operating characteristic curve (AUC) 0.83), whereas Random Forest (AUC 0.79) and HGBC (AUC 0.78) performed well for 1-year mortality. Age and NT-proBNP were the strongest predictors in SHAP analyses for both outcomes.
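
With a 30-day event rate of 6.3%, a model that predicts survival for everyone would already be 93.7% accurate, which is one reason a ranking metric such as AUC is reported. The sketch below computes AUC from its pairwise definition: the probability that a randomly chosen death receives a higher predicted risk than a randomly chosen survivor, with ties counted as one half. The labels and scores are invented for illustration; a library routine such as scikit-learn's `roc_auc_score` would normally be used.

```python
def roc_auc(y_true, scores):
    """AUC as P(score of a random positive > score of a random negative).

    Ties between a positive and a negative score count as 0.5.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Invented example: 1 = died, 0 = survived; scores are predicted risks.
y_true = [0, 0, 1, 0, 1, 0, 0, 1]
scores = [0.05, 0.10, 0.80, 0.30, 0.25, 0.02, 0.40, 0.60]

auc = roc_auc(y_true, scores)
print(f"AUC = {auc:.3f}")  # 13 of 15 positive-negative pairs are ordered correctly
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so the reported values of 0.78 to 0.83 indicate that most death-survivor pairs are ordered correctly.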

Models derived from EHR data can predict mortality after HFpEF hospitalization with comparable performance to models derived from registry or trial data, highlighting the potential for clinical implementation.

## Linked entities

- **Diseases:** heart failure (MONDO:0005252)

## Full-text entities

- **Diseases:** Heart failure (MESH:D006333)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12617840/full.md

---
Source: https://tomesphere.com/paper/PMC12617840