# Deep learning algorithm for detection of acute heart failure using standard ECG waveforms

**Authors:** Sang Mee Lee, Taeyoung Kim, Mirae Shin, Jin-Oh Choi, Myung Jin Chung, Darae Kim

PMC · DOI: 10.1093/ehjdh/ztaf132 · 2025-11-10

## TL;DR

This study developed a deep learning model that can accurately detect acute heart failure using standard ECG data, showing strong performance in both internal and external validations.

## Contribution

The novel contribution is an ensemble deep learning model for acute heart failure detection using ECGs, achieving high diagnostic accuracy across diverse patient subgroups.
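This summary does not say how the four networks' outputs are combined into the ensemble. A common choice, assumed here purely for illustration, is soft voting: average each model's predicted probability of acute HF for the same ECG. The function name `ensemble_predict` and the scores below are hypothetical; only the four architecture names come from the paper.

```python
from statistics import fmean


def ensemble_predict(per_model: dict[str, list[float]]) -> list[float]:
    """Average each model's P(acute HF) over the same ordered list of ECGs.

    Assumption: an unweighted mean of member probabilities (soft voting).
    The study's actual combination rule is not stated in this summary.
    """
    lengths = {len(probs) for probs in per_model.values()}
    if len(lengths) != 1:
        raise ValueError("every model must score the same ECGs")
    # zip(*...) yields one tuple per ECG holding every model's probability for it
    return [fmean(per_ecg) for per_ecg in zip(*per_model.values())]


# Hypothetical scores for two ECGs from the four architectures named in the study
scores = {
    "1D-CNN-Res": [0.9, 0.1],
    "1D-CNN-Dense": [0.8, 0.2],
    "CRT-Net without transformer": [0.7, 0.3],
    "CRT-Net": [0.6, 0.0],
}
print(ensemble_predict(scores))
```

Averaging probabilities keeps the ensemble output on the same 0 to 1 scale as its members, so the same threshold-based metrics can be applied to it; a weighted mean or a learned stacking layer would be equally plausible alternatives.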

## Key findings

- The ensemble model achieved an AUROC of 0.997 in internal validation and 0.842 in external validation.
- The model showed consistent performance across different ejection fraction levels and demographic groups.
- False-positive cases revealed underlying cardiovascular risks, suggesting the model's potential for identifying high-risk patients.
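To make the reported figures concrete, the sketch below computes AUROC with the rank-based (Mann-Whitney U) estimator, plus the threshold-dependent metrics listed in the abstract (sensitivity, specificity, PPV, NPV, F1). The function names, the 0/1 label encoding, and the 0.5 cut-off are assumptions for illustration; the paper's operating threshold is not given here.

```python
import math


def auroc(labels: list[int], scores: list[float]) -> float:
    """Rank-based (Mann-Whitney U) AUROC; tied scores share their average rank."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # extend j over a run of tied scores
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)  # labels assumed to be 0 (control) or 1 (acute HF)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs both positive and negative cases")
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


def _ratio(num: int, den: int) -> float:
    return num / den if den else math.nan


def threshold_metrics(labels, scores, threshold=0.5):
    """Sensitivity, specificity, PPV, NPV and F1 at one (assumed) probability cut-off."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        predicted_hf = s >= threshold
        if predicted_hf and y == 1:
            tp += 1
        elif predicted_hf:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
        "f1": _ratio(2 * tp, 2 * tp + fp + fn),  # harmonic mean of PPV and sensitivity
    }


labels_demo = [0, 0, 1, 1]
probs_demo = [0.1, 0.4, 0.35, 0.8]
print(auroc(labels_demo, probs_demo), threshold_metrics(labels_demo, probs_demo))
```

AUROC is threshold-free and does not depend on class prevalence, whereas PPV and F1 depend on both the chosen cut-off and the proportion of acute HF cases in the evaluated set, so the two kinds of numbers are not directly comparable.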

## Abstract

To develop and evaluate a deep learning model for immediate and accurate diagnosis of acute heart failure (HF) using standard 12-lead electrocardiogram (ECG) waveforms collected from a large cohort of patients.

We retrospectively analysed patients aged > 18 years who underwent transthoracic echocardiography, N-terminal pro-B-type natriuretic peptide (NT-proBNP) testing, and ECG within one week of clinical diagnosis at Samsung Medical Center between 1 February 2011 and 31 December 2022. The cohort included 1949 patients with acute HF and a control group of 24 603 patients with normal NT-proBNP levels and no significant cardiac dysfunction. Four deep learning models (1D-CNN-Res, 1D-CNN-Dense, CRT-Net without transformer, and CRT-Net) and their ensemble were developed using an 8:2 stratified split with no patient overlap between the two partitions. External validation was performed on the MIMIC-IV dataset, which comprised 7868 acute HF patients and 16 025 controls. Performance was evaluated using the area under the receiver operating characteristic curve (AUROC), sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and F1-score. The ensemble model achieved the best diagnostic performance, with an AUROC of 0.997 and an F1-score of 0.649; in external validation it achieved an AUROC of 0.842 and an F1-score of 0.640. Notably, the F1-score indicated consistent diagnostic performance across a wide range of ejection fraction values and demographic subgroups. Post-hoc analysis of false-positive cases revealed underlying cardiovascular risks, highlighting the model’s utility in identifying high-risk patients.
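An 8:2 stratified split "with no patient overlap" means stratification and the train/test assignment happen at the patient level, so that several ECGs from one person never end up on both sides. The sketch below shows one way to do that, assuming each record is a hypothetical `(patient_id, ecg_id, label)` tuple, that each patient carries a single label, and that stratification is by the acute HF label; the function name, record format, and seed are illustrative, not the authors' code.

```python
import random
from collections import defaultdict

Record = tuple[str, str, int]  # (patient_id, ecg_id, label): 1 = acute HF, 0 = control


def patient_level_split(records: list[Record], test_frac: float = 0.2, seed: int = 0):
    """8:2 split stratified by label, assigning whole patients to one side only."""
    patient_label: dict[str, int] = {}
    for pid, _, label in records:
        if patient_label.setdefault(pid, label) != label:
            raise ValueError(f"patient {pid} has conflicting labels")

    # group patient IDs (not ECGs) by label so each class keeps its proportion
    strata: dict[int, list[str]] = defaultdict(list)
    for pid, label in patient_label.items():
        strata[label].append(pid)

    rng = random.Random(seed)
    test_patients: set[str] = set()
    for label in sorted(strata):
        pids = sorted(strata[label])  # fixed order so the seed fully determines the split
        rng.shuffle(pids)
        test_patients.update(pids[: round(len(pids) * test_frac)])

    # every ECG follows its patient, so no patient appears in both partitions
    train = [r for r in records if r[0] not in test_patients]
    test = [r for r in records if r[0] in test_patients]
    return train, test


# Hypothetical cohort: 10 acute HF and 40 control patients, two ECGs each
cohort = [
    (f"P{i:03d}", f"P{i:03d}-ECG{k}", 1 if i < 10 else 0)
    for i in range(50)
    for k in range(2)
]
train, test = patient_level_split(cohort)
print(len(train), len(test))
```

For k-fold rather than a single hold-out split, scikit-learn's `StratifiedGroupKFold` (with patient IDs as groups) provides a similar no-overlap guarantee; with `n_splits=5`, each fold is roughly an 8:2 partition.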

The proposed deep learning models demonstrated remarkable performance in diagnosing acute HF. These findings support their potential utility in facilitating early diagnosis and improving clinical outcomes.

Graphical Abstract

## Full-text entities

- **Diseases:** cardiac dysfunction (MESH:D006331), acute heart failure (MESH:D006333)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12853116/full.md

---
Source: https://tomesphere.com/paper/PMC12853116