# Multimodal Transformer–Based Electrocardiogram Analysis for Cardiovascular Comorbidity Detection: Model Development and Validation Study

**Authors:** Zi Yang, Xiaojuan Wang, Jianlin Wang, Qi Guang, Xueqian Ding, Hao Liu, Yunpeng Xu, Jing Zhao, Ming Bai

PMC · DOI: 10.2196/80815 · JMIR Formative Research · 2026-01-02

## TL;DR

This paper introduces CaMPNet, a transformer-based multimodal model that detects cardiovascular diseases from ECGs by combining raw waveforms, machine-measured ECG features, and demographic data.

## Contribution

The novel Cardiovascular Multimodal Prediction Network (CaMPNet) integrates raw ECG waveforms, structured features, and demographics via cross-attention fusion.

## Key findings

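- CaMPNet achieved a mean AUC of 0.845 on the internal test set, outperforming all single-modality variants; the residual network ECG baseline reached a similar AUC (0.848) but an F1-score of only 0.152.
- Performance dropped to a moderate level in temporal validation (mean AUC 0.715), a decline attributed to temporal distribution shifts.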
- Attention-based visualizations revealed clinically interpretable patterns, such as ST-segment elevations in myocardial infarction.

## Abstract

Cardiovascular diseases remain the leading global cause of mortality, yet traditional electrocardiogram (ECG) interpretation shows subjective variability and limited sensitivity to complex pathologies.

This study aims to address these challenges by proposing the Cardiovascular Multimodal Prediction Network (CaMPNet), a transformer-based multimodal architecture that integrates raw 12-lead ECG waveforms, 9 structured machine-measured ECG features, and demographic data (age and sex) through cross-attention fusion.
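To make the idea of cross-attention fusion concrete, here is a minimal NumPy sketch of single-head scaled dot-product cross-attention. It is not the authors' implementation: the embedding size, the number of waveform tokens, the random weights, and the choice to let one tabular token (9 structured features plus age and sex) query the waveform tokens are all assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query_tokens, context_tokens, w_q, w_k, w_v):
    """Single-head cross-attention: queries from one modality, keys/values from another."""
    q = query_tokens @ w_q                     # (n_query, d)
    k = context_tokens @ w_k                   # (n_context, d)
    v = context_tokens @ w_v                   # (n_context, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])    # (n_query, n_context)
    weights = softmax(scores, axis=-1)         # each row sums to 1
    return weights @ v, weights

rng = np.random.default_rng(0)
d_model = 16                                   # hypothetical embedding size

# Hypothetical inputs: 50 embedded waveform segments and one tabular vector
# holding 9 structured ECG features plus age and sex (11 values).
ecg_tokens = rng.normal(size=(50, d_model))
tab_features = rng.normal(size=(1, 11))

# Project the tabular vector into the shared embedding space (random weights here).
w_tab = rng.normal(size=(11, d_model))
tab_token = tab_features @ w_tab               # (1, d_model)

w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
fused, attn = cross_attention(tab_token, ecg_tokens, w_q, w_k, w_v)
```

In this sketch, `attn` holds one weight per waveform token, which is the kind of quantity an attention-based visualization could plot over the ECG trace.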

The model was trained on 384,877 records from the Medical Information Mart for Intensive Care IV - Electrocardiogram Matched Subset database and evaluated across 12 cardiovascular disease labels. To assess temporal robustness, the most recent 10% of the data was withheld chronologically from model development and used for temporal external validation.
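A chronological hold-out of this kind can be sketched as follows. The function name `temporal_holdout_split`, the toy timestamps, and the rounding rule are assumptions for illustration; the summary does not describe how the authors implemented the split.

```python
import numpy as np

def temporal_holdout_split(timestamps, holdout_fraction=0.10):
    """Return (development_idx, holdout_idx), where the holdout is the most recent slice."""
    timestamps = np.asarray(timestamps)
    order = np.argsort(timestamps, kind="stable")   # oldest record first
    n_holdout = int(round(len(order) * holdout_fraction))
    cut = len(order) - n_holdout
    return order[:cut], order[cut:]

# Toy acquisition times (arbitrary units), deliberately unsorted.
timestamps = np.array([5, 1, 9, 3, 7, 2, 8, 4, 10, 6])
dev_idx, holdout_idx = temporal_holdout_split(timestamps)
```

Splitting on time rather than at random means no record in the development set was acquired after any record in the hold-out set, so the evaluation reflects how the model would face later data.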

On the internal test set, the model achieved a mean area under the curve (AUC) of 0.845 (SD 0.04) and area under the precision-recall curve of 0.489, outperforming all single-modality variants and the residual networks-ECG baseline, which reached a similar AUC (0.848) but an F1-score of only 0.152. Subgroup analyses demonstrated consistent performance across demographics (male AUC=0.846 vs female AUC=0.843; youngest quartile AUC=0.884 vs oldest quartile AUC=0.811). CaMPNet retained moderate discriminative ability in temporal external validation with a mean AUC of 0.715 (SD 0.03) and area under the precision-recall curve of 0.298, although performance declined due to temporal distribution shifts. Despite this decline, major disease categories, such as atrial fibrillation, heart failure, and normal rhythm, maintained high AUCs (>0.84). Attention-based visualization revealed clinically interpretable patterns (eg, ST-segment elevations in ST-segment elevation myocardial infarction), and ablation experiments verified the model’s tolerance to missing structured inputs.
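For readers unfamiliar with how a mean AUC over several labels is obtained, the sketch below computes a per-label AUC and then its mean and SD. It assumes the reported mean and SD summarize per-label AUCs and uses the population SD; the summary states neither, and the toy labels and scores are invented for illustration only.

```python
import numpy as np

def binary_auc(y_true, y_score):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    y_true = np.asarray(y_true).astype(bool)
    y_score = np.asarray(y_score, dtype=float)
    pos, neg = y_score[y_true], y_score[~y_true]
    if len(pos) == 0 or len(neg) == 0:
        return float("nan")                    # undefined without both classes
    diff = pos[:, None] - neg[None, :]         # all positive/negative score pairs
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)

def macro_auc(y_true, y_score):
    """Per-label AUCs plus their unweighted mean and population SD."""
    y_true, y_score = np.asarray(y_true), np.asarray(y_score)
    per_label = np.array([binary_auc(y_true[:, j], y_score[:, j])
                          for j in range(y_true.shape[1])])
    return per_label, float(np.nanmean(per_label)), float(np.nanstd(per_label))

# Toy multilabel data: 4 records, 2 labels (rows are records, columns are labels).
y_true = np.array([[1, 0], [0, 1], [1, 1], [0, 0]])
y_score = np.array([[0.9, 0.2], [0.1, 0.6], [0.8, 0.4], [0.3, 0.5]])
per_label, mean_auc, sd_auc = macro_auc(y_true, y_score)
```

The pairwise comparison is quadratic in the number of records, so it is only suitable for small illustrations; a rank-based formulation would be preferable at the scale of the study.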

CaMPNet demonstrates robust and interpretable multimodal ECG-based diagnosis, offering a scalable framework for comorbidity screening and continual learning under real-world temporal dynamics.

## Linked entities

- **Diseases:** atrial fibrillation (MONDO:0004981), heart failure (MONDO:0005252)

## Full-text entities

- **Diseases:** heart failure (MESH:D006333), Cardiovascular Comorbidity (MESH:D002318), myocardial infarction (MESH:D009203), atrial fibrillation (MESH:D001281)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12758841/full.md

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12758841/full.md

## References

31 references — full list in the complete paper: https://tomesphere.com/paper/PMC12758841/full.md

---
Source: https://tomesphere.com/paper/PMC12758841