RhythmBERT: A Self-Supervised Language Model Based on Latent Representations of ECG Waveforms for Heart Disease Detection
Xin Wang, Burcu Ozek, Aruna Mohan, Amirhossein Ravari, Or Zilbershot, Fatemeh Afghah

TL;DR
RhythmBERT is a self-supervised ECG language model that encodes both rhythm semantics and waveform morphology, achieving high accuracy in heart disease detection with limited labeled data.
Contribution
The paper proposes a generative ECG language model that pairs autoencoder-based symbolic tokens for the P, QRS, and T segments, which capture rhythm, with continuous embeddings that retain fine-grained morphology.
Findings
Using only single-lead ECGs, achieves performance comparable to or better than 12-lead models.
Effectively detects conditions like atrial fibrillation, ST-T abnormalities, and myocardial infarction.
Pretrained on approximately 800,000 unlabeled ECG recordings with a masked prediction task.
Abstract
Electrocardiogram (ECG) analysis is crucial for diagnosing heart disease, but most self-supervised learning methods treat ECG as a generic time series, overlooking physiologic semantics and rhythm-level structure. Existing contrastive methods utilize augmentations that distort morphology, whereas generative approaches employ fixed-window segmentation, which misaligns cardiac cycles. To address these limitations, we propose RhythmBERT, a generative ECG language model that considers ECG as a language paradigm by encoding P, QRS, and T segments into symbolic tokens via autoencoder-based latent representations. These discrete tokens capture rhythm semantics, while complementary continuous embeddings retain fine-grained morphology, enabling a unified view of waveform structure and rhythm. RhythmBERT is pretrained on approximately 800,000 unlabeled ECG recordings with a masked prediction…
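The abstract is cut off before it describes the masked prediction task, so the following is only a minimal sketch of generic BERT-style masking applied to a sequence of segment tokens, not the authors' procedure. The token names (`P_3`, `QRS_17`, ...), the `[MASK]` symbol, the masking rate, and the `mask_tokens` helper are all hypothetical and chosen for illustration.

```python
import random

MASK = "[MASK]"  # hypothetical mask symbol, not taken from the paper


def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Randomly replace tokens with MASK.

    Returns (masked, targets), where targets[i] holds the original token
    at masked positions and None elsewhere. A model would be trained to
    predict the non-None targets from the masked sequence.
    """
    rng = random.Random(seed)
    masked, targets = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK)
            targets.append(tok)
        else:
            masked.append(tok)
            targets.append(None)
    return masked, targets


# Hypothetical token sequence: one symbol per P, QRS, and T segment,
# three cardiac cycles in a row.
beats = ["P_3", "QRS_17", "T_5", "P_3", "QRS_17", "T_8", "P_1", "QRS_4", "T_5"]
masked, targets = mask_tokens(beats, mask_prob=0.3, seed=42)
print(masked)
print(targets)
```

Because each token here stands for a whole P, QRS, or T segment, a masked position corresponds to a physiologic unit rather than to an arbitrary fixed-length window of samples, which is the misalignment the abstract attributes to fixed-window segmentation.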
Taxonomy
Topics: ECG Monitoring and Analysis · Cardiac electrophysiology and arrhythmias · Atrial Fibrillation Management and Outcomes
