# Cardiac health assessment across scenarios and devices using a multimodal foundation model pretrained on data from 1.7 million individuals

**Authors:** Xiao Gu, Wei Tang, Jinpei Han, Veer Sangha, Fenglin Liu, Shreyank N. Gowda, Antonio H. Ribeiro, Patrick Schwab, Kim Branson, Lei Clifton, Antonio Luiz P. Ribeiro, Zhangdaihong Liu, David A. Clifton

PMC · DOI: 10.1038/s42256-026-01180-5 · Nature Machine Intelligence · 2026-02-24

## TL;DR

This paper introduces a cardiac sensing foundation model (CSFM) that learns from cardiac biosignals (ECG and photoplethysmograms) and their paired text reports, and that transfers across devices and across clinical and home settings.

## Contribution

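The main contribution is a multimodal cardiac foundation model, pretrained with a generative masked strategy on cardiac signals and text reports from approximately 1.7 million individuals, whose embeddings serve as transferable features for robust and generalizable cardiac monitoring.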

## Key findings

- CSFM consistently outperforms traditional one-modal-one-task approaches and adapts to varied input configurations and sensor modalities.
- The model maintains favourable performance across 12-lead and single-lead ECGs, and with ECG only, photoplethysmogram only or a combination of both.
- CSFM supports diverse cardiac applications: diagnostic tasks, demographic recognition, vital sign measurement, clinical outcome prediction and ECG question answering.

## Abstract

Cardiovascular diseases remain a major contributor to the global burden of healthcare, highlighting the importance of accurate and scalable methods for cardiac monitoring. Cardiac biosignals, most notably electrocardiograms (ECG) and photoplethysmograms, are essential for diagnosing, preventing and managing cardiovascular conditions across clinical and home settings. However, their acquisition varies substantially across scenarios and devices, whereas existing analytical models often rely on homogeneous datasets and static bespoke models, limiting their robustness and generalizability in diverse real-world contexts. Here we present a cardiac sensing foundation model (CSFM) that leverages transformer architectures and a generative masked pretraining strategy to learn unified representations from heterogeneous health records. CSFM is pretrained on a multimodal integration of data from various large-scale datasets, comprising cardiac signals from approximately 1.7 million individuals and their corresponding clinical or machine-generated text reports. The embeddings derived from CSFM act as effective, transferable features across diverse cardiac sensing scenarios, supporting a seamless adaptation to the varied input configurations and sensor modalities. Extensive evaluations across diagnostic tasks, demographic recognition, vital sign measurement, clinical outcome prediction and ECG question answering demonstrate that CSFM consistently outperforms traditional one-modal-one-task approaches. Notably, CSFM maintains favourable performance across both 12-lead and single-lead ECGs, as well as in scenarios involving ECG only, photoplethysmogram only or a combination of both. This highlights its potential as a versatile and scalable foundation for comprehensive cardiac monitoring.

Gu et al. introduce a cardiac foundation model that learns from millions of heart signals and textual interpretations, enabling it to handle heart data collected either in hospitals or at home. It offers clear and reliable insights across different devices and settings.
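
The abstract describes a generative masked pretraining strategy that must cope with varied input configurations (12-lead or single-lead ECG, ECG only, photoplethysmogram only, or both). The sketch below illustrates the general idea of masked reconstruction over a variable set of channels; it is not the authors' implementation. The patch length, mask ratio, token layout, function names and the zero-valued stand-in predictor are all assumptions made for illustration, and the signals are synthetic.

```python
import math
import random


def patchify(samples, patch_len):
    """Split a 1-D signal into non-overlapping patches, dropping any remainder."""
    n_patches = len(samples) // patch_len
    return [samples[i * patch_len:(i + 1) * patch_len] for i in range(n_patches)]


def build_tokens(channels, patch_len):
    """Flatten {channel_name: samples} into one token list.

    Each token records its channel and patch index, so inputs with
    different channel sets share a single sequence format (an assumption,
    not a detail taken from the paper).
    """
    tokens = []
    for name, samples in channels.items():
        for index, patch in enumerate(patchify(samples, patch_len)):
            tokens.append({"channel": name, "index": index, "values": patch})
    return tokens


def mask_tokens(tokens, mask_ratio, rng):
    """Randomly hide a fraction of tokens; return (visible, masked)."""
    n_mask = int(round(len(tokens) * mask_ratio))
    masked_ids = set(rng.sample(range(len(tokens)), n_mask))
    visible = [t for i, t in enumerate(tokens) if i not in masked_ids]
    masked = [t for i, t in enumerate(tokens) if i in masked_ids]
    return visible, masked


def masked_mse(predictions, masked):
    """Mean squared error computed on the masked patches only."""
    total, count = 0.0, 0
    for pred, token in zip(predictions, masked):
        for p, v in zip(pred, token["values"]):
            total += (p - v) ** 2
            count += 1
    return total / count


# Synthetic stand-ins for real recordings (hypothetical data).
ecg = [math.sin(2 * math.pi * 1.2 * t / 100) for t in range(1000)]
ppg = [math.cos(2 * math.pi * 1.2 * t / 100) for t in range(1000)]

rng = random.Random(0)
PATCH_LEN = 50      # assumed patch length, in samples
MASK_RATIO = 0.75   # assumed mask ratio

# The same pipeline handles a single-lead input and an ECG + PPG input.
single_tokens = build_tokens({"ecg_lead_I": ecg}, PATCH_LEN)
combo_tokens = build_tokens({"ecg_lead_I": ecg, "ppg": ppg}, PATCH_LEN)

single_visible, single_masked = mask_tokens(single_tokens, MASK_RATIO, rng)
combo_visible, combo_masked = mask_tokens(combo_tokens, MASK_RATIO, rng)

# A trained model would predict masked patches from the visible ones;
# predicting zeros here only shows where the loss is evaluated.
zero_predictions = [[0.0] * PATCH_LEN for _ in combo_masked]
loss = masked_mse(zero_predictions, combo_masked)

print(len(single_visible), len(single_masked))
print(len(combo_visible), len(combo_masked))
```

In a real system, the visible tokens would be embedded and passed through a transformer encoder, with a decoder reconstructing the masked patches. Tagging each token with its channel is one plausible way to let a single model accept any subset of leads or sensors.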

## Full-text entities

- **Diseases:** ischaemia (MESH:D007511), Cardiovascular disease (MESH:D002318), AF (MESH:D001281)
- **Datasets:** MIMIC-III, MIMIC-IV, PTB-XL
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12932102/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12932102/full.md

## References

24 references — full list in the complete paper: https://tomesphere.com/paper/PMC12932102/full.md

---
Source: https://tomesphere.com/paper/PMC12932102