ECG-Byte: A Tokenizer for End-to-End Generative Electrocardiogram Language Modeling

William Han; Chaojing Duan; Michael A. Rosenberg; Emerson Liu; Ding Zhao

arXiv:2412.14373 · cs.CL · July 31, 2025

TL;DR

ECG-Byte introduces a novel byte pair encoding tokenizer for ECG signals, enabling efficient end-to-end language modeling that improves interpretability and reduces training time and data requirements.

Contribution

The paper presents ECG-Byte, a new BPE-based tokenizer that allows direct end-to-end ECG language modeling, overcoming inefficiencies of previous two-stage approaches.

Findings

01 Training is 3 times faster than with two-stage approaches.

02 Uses 48% of the data required by two-stage approaches.

03 Achieves competitive NLG performance.

Abstract

Large Language Models (LLMs) have demonstrated exceptional versatility across domains, including applications to electrocardiograms (ECGs). A growing body of work focuses on generating text from multi-channeled ECG signals and corresponding textual prompts. Existing approaches often involve a two-stage process: pretraining an ECG-specific encoder with a self-supervised learning (SSL) objective, followed by finetuning an LLM for natural language generation (NLG) using encoder-derived features. However, these methods face two key limitations: inefficiency due to multi-stage training and challenges in interpreting encoder-generated features. To overcome these issues, we propose ECG-Byte, an adapted byte pair encoding (BPE) tokenizer pipeline for autoregressive language modeling of ECGs. ECG-Byte compresses and encodes ECG signals into tokens, enabling direct end-to-end LLM training by…
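
The abstract is cut off before it describes the tokenizer in detail, so the following is only a minimal sketch of generic byte pair encoding applied to a quantized one-dimensional signal, not ECG-Byte's actual pipeline. The min-max binning in `quantize`, the bin count, the number of merges, and all function names are assumptions made for illustration. The `decode` helper shows the property usually meant by interpretability in this setting: every merged token expands back to the exact quantized symbols it replaced.

```python
from collections import Counter


def quantize(signal, n_bins=8):
    """Map each sample to an integer bin in [0, n_bins - 1] (assumed preprocessing)."""
    lo, hi = min(signal), max(signal)
    span = (hi - lo) or 1.0  # avoid division by zero on a flat signal
    return [min(int((x - lo) / span * n_bins), n_bins - 1) for x in signal]


def merge_pair(seq, pair, new_id):
    """Replace non-overlapping occurrences of `pair` with `new_id`, left to right."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def train_bpe(seq, n_merges, first_new_id):
    """Learn up to `n_merges` merges; return (merge table, encoded sequence)."""
    merges = {}
    for step in range(n_merges):
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:  # nothing repeats, so further merges would not compress
            break
        new_id = first_new_id + step
        merges[new_id] = pair
        seq = merge_pair(seq, pair, new_id)
    return merges, seq


def decode(tokens, merges):
    """Expand merged tokens back into the base quantized symbols."""
    out, stack = [], list(reversed(tokens))
    while stack:
        tok = stack.pop()
        if tok in merges:
            left, right = merges[tok]
            stack.append(right)  # pushed first so `left` is popped first
            stack.append(left)
        else:
            out.append(tok)
    return out


# Toy periodic "signal" standing in for one ECG lead (hypothetical data).
symbols = quantize([0.0, 0.1, 0.9, 0.1] * 4, n_bins=4)
merges, encoded = train_bpe(symbols, n_merges=3, first_new_id=4)
restored = decode(encoded, merges)
```

In this sketch the merged token ids start just above the base symbol range (`first_new_id=4` for four bins), so base symbols and learned tokens never collide. A real pipeline would also need to handle multiple leads and sampling rates, which this excerpt does not specify.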

Taxonomy

Topics: ECG Monitoring and Analysis · Business Process Modeling and Analysis