Physiology-Aware Masked Cross-Modal Reconstruction for Biosignal Representation Learning

Hao Zhou; Simon A. Lee; Cyrus Tanade; Keum San Chun; Juhyeon Lee; Migyeong Gwak; Megha Thukral; Justin Sung; Eugene Hwang; Mehrab Bin Morshed; Li Zhu; Viswam Nathan; Md Mahbubur Rahman; Subramaniam Venkatraman; Sharanya Arcot Desai

arXiv:2605.00973 · cs.LG · May 5, 2026

TL;DR

This paper introduces xMAE, a biosignal pretraining framework that leverages masked cross-modal reconstruction to encode physiologically meaningful timing structures in multimodal biosignals, improving downstream task performance.

Contribution

xMAE is the first to incorporate temporal structure into multimodal biosignal pretraining, capturing directional dynamics between signals like ECG and PPG.

Findings

01. Pretraining with xMAE outperforms unimodal and multimodal baselines on 15 of 19 tasks.

02. xMAE representations generalize across devices, locations, and settings.

03. ECG-PPG timing structure is reflected in learned PPG representations.

Abstract

Biosignals acquired from different locations on the body often provide temporally ordered views of the same underlying physiological process. However, most existing self-supervised learning methods treat these signals as interchangeable views, overlooking the directional temporal dynamics that link them. A canonical example is the relationship between electrocardiography (ECG), which captures the electrical activation initiating each heartbeat, and photoplethysmography (PPG), which records the resulting peripheral pulse delayed by vascular dynamics. To capture this structured relationship, we introduce xMAE, a biosignal pretraining framework that leverages masked cross-modal reconstruction across temporally ordered biosignals as a training-time constraint to encourage physiologically meaningful timing structure in the learned representations. We show that pretraining with xMAE yields…
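
The directional ECG-to-PPG delay described in the abstract can be illustrated with a toy example. The sketch below is not the xMAE method and uses no code from the paper's repository; it only shows, on synthetic signals, that a PPG-like pulse trailing an ECG-like spike produces a positive cross-correlation lag in one direction and a negative lag in the other. The function names (`synth_pair`, `best_lag`), the 100 Hz sampling rate, the 80-sample beat period, and the 20-sample delay are all assumptions chosen for illustration.

```python
import math


def synth_pair(n=1000, period=80, delay=20, width=4.0):
    """Build toy (ecg, ppg) lists of length n.

    ecg: unit impulses every `period` samples (idealized R-peaks).
    ppg: Gaussian bumps centered `delay` samples after each impulse.
    All values are illustrative assumptions, not taken from the paper.
    """
    ecg = [0.0] * n
    ppg = [0.0] * n
    for start in range(0, n, period):
        ecg[start] = 1.0
        for k in range(-10, 11):
            i = start + delay + k
            if 0 <= i < n:
                ppg[i] += math.exp(-(k * k) / (2.0 * width * width))
    return ecg, ppg


def best_lag(x, y, max_lag):
    """Return the lag in [-max_lag, max_lag] maximizing sum_t x[t] * y[t + lag].

    A positive result means y trails x by that many samples.
    """
    n = len(x)
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for t in range(n):
            j = t + lag
            if 0 <= j < n:
                score += x[t] * y[j]
        if score > best_score:
            best, best_score = lag, score
    return best


if __name__ == "__main__":
    fs = 100  # assumed sampling rate in Hz
    ecg, ppg = synth_pair()
    lag = best_lag(ecg, ppg, max_lag=40)
    print(f"PPG trails ECG by {lag} samples ({lag * 1000 / fs} ms)")
    print(f"Reversed direction: {best_lag(ppg, ecg, max_lag=40)} samples")
```

The asymmetry between the two directions is the point: treating ECG and PPG as interchangeable views would discard the sign of this lag, which is the ordering information the abstract argues a pretraining objective should preserve.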

Code & Models

Repositories

hzhou3/xMAE (GitHub)

Models

itshardtogetaname/xmae (Hugging Face model, 36 downloads, 1 like)