Wavelet-Driven Masked Multiscale Reconstruction for PPG Foundation Models

Megha Thukral; Cyrus Tanade; Simon A. Lee; Juhyeon Lee; Hao Zhou; Keum San Chun; Migyeong Gwak; Viswam Nathan; Md Mahbubur Rahman; Li Zhu; Mehrab Bin Morshed; Subramaniam Venkatraman; Sharanya Arcot Desai

arXiv:2601.12215·cs.LG·January 21, 2026

Open Access · 3 Reviews

TL;DR

This paper introduces a wavelet-based self-supervised pretraining method for PPG signals that captures multi-resolution features across time and frequency, improving performance on diverse health tasks.

Contribution

It presents Masked Multiscale Reconstruction (MMR), a novel framework that leverages wavelet decomposition for hierarchical PPG representation learning, outperforming existing models.

Findings

1. MMR improves performance on 17 of 19 health tasks.
2. Wavelet-based features capture physiologically grounded information.
3. Pretraining on large-scale data enhances model generalization.

Abstract

Wearable foundation models have the potential to transform digital health by learning transferable representations from large-scale biosignals collected in everyday settings. While recent progress has been made in large-scale pretraining, most approaches overlook the spectral structure of photoplethysmography (PPG) signals, wherein physiological rhythms unfold across multiple frequency bands. Motivated by the insight that many downstream health-related tasks depend on multi-resolution features spanning fine-grained waveform morphology to global rhythmic dynamics, we introduce Masked Multiscale Reconstruction (MMR) for PPG representation learning - a self-supervised pretraining framework that explicitly learns from hierarchical time-frequency scales of PPG data. The pretraining task is designed to reconstruct randomly masked out coefficients obtained from a wavelet-based multiresolution…
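The pretext task the abstract describes (hide a random subset of wavelet coefficients and score a model on recovering them) can be sketched roughly as follows. This is a minimal illustration only, not the authors' implementation: the Haar wavelet, the three decomposition levels, the 50% mask ratio, zero-filling of masked positions, the mean-squared-error objective, and the synthetic two-tone signal are all assumptions made for the example, and no encoder is trained here.

```python
import math
import random

def haar_dwt_step(signal):
    """One level of the orthonormal Haar DWT: returns (approximation, detail)."""
    s = 1.0 / math.sqrt(2.0)
    pairs = range(0, len(signal) - 1, 2)
    approx = [(signal[i] + signal[i + 1]) * s for i in pairs]
    detail = [(signal[i] - signal[i + 1]) * s for i in pairs]
    return approx, detail

def haar_wavedec(signal, levels):
    """Multilevel decomposition, ordered coarse to fine: [cA_L, cD_L, ..., cD_1]."""
    bands = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_dwt_step(approx)
        bands.insert(0, detail)
    bands.insert(0, approx)
    return bands

def mask_coefficients(bands, mask_ratio, rng):
    """Zero a random subset of each band; return masked bands and boolean masks."""
    masked, masks = [], []
    for band in bands:
        n_mask = int(round(mask_ratio * len(band)))
        hidden = set(rng.sample(range(len(band)), n_mask))
        masks.append([i in hidden for i in range(len(band))])
        masked.append([0.0 if i in hidden else c for i, c in enumerate(band)])
    return masked, masks

def masked_mse(pred_bands, true_bands, masks):
    """Mean squared error computed over masked positions only."""
    err, count = 0.0, 0
    for pred, true, mask in zip(pred_bands, true_bands, masks):
        for p, t, m in zip(pred, true, mask):
            if m:
                err += (p - t) ** 2
                count += 1
    return err / max(count, 1)

# Hypothetical stand-in for a PPG window: a 1.2 Hz "pulse" plus a slower
# 0.25 Hz component, sampled at an assumed 64 Hz for 256 samples.
fs, n = 64, 256
ppg = [math.sin(2 * math.pi * 1.2 * t / fs)
       + 0.3 * math.sin(2 * math.pi * 0.25 * t / fs) for t in range(n)]

bands = haar_wavedec(ppg, levels=3)
masked_bands, masks = mask_coefficients(bands, mask_ratio=0.5, rng=random.Random(0))

# Loss of the trivial "predict zero" reconstruction; a trained model would
# replace masked_bands with its own predictions and aim to lower this value.
baseline_loss = masked_mse(masked_bands, bands, masks)
```

In this sketch the coarse bands carry the slow rhythm and the fine bands carry sample-to-sample shape, so masking within every band asks a model to recover both kinds of structure.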

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01 · Rating 4 · Confidence 5

Strengths

This paper tests 13 different downstream tasks spanning cardiovascular conditions (hypertension, PVC), metabolic markers (creatinine), and electrolyte imbalances, and provides strong evidence that the learned representations capture broadly useful information.

Weaknesses

1. The core technical contribution lacks clear validation. While wavelets are positioned as the main innovation, the paper never isolates whether the DWT actually drives the performance gains. Critically, the paper is missing the essential ablation: MMR with DWT versus MMR without DWT, that is, the same masked autoencoder architecture and training procedure applied to patchified raw PPG time series instead of wavelet coefficients.
2. Results are mixed and claims are overstated. The abstract and conclusions d

Reviewer 02 · Rating 6 · Confidence 4

Strengths

1. The use of wavelet-based multiscale reconstruction as a masked modeling target is a strong conceptual contribution.
2. The diversity of the dataset, with 1 h 30 min of data for each patient in unconstrained environments, significantly increases applicability over prior foundation models trained on clean, clinical datasets.
3. The data preprocessing strikes a good balance between cleaning and retaining as much data as possible.
4. The paper benchmarks across 13 diverse downstream tasks (clinical and p

Weaknesses

1. While effective, MMR's novelty lies mainly in applying masked reconstruction to wavelet coefficients. The method reuses a ViT backbone with minimal architectural innovations.
2. Key preprocessing and DWT hyperparameters (e.g., sampling-rate normalization, interpolation scheme for coefficients) could be better detailed for reproducibility.
3. The baselines could be newer models (e.g., Chronos-Bolt instead of Chronos).
4. Clarity on the fixed parameters in the ablation study could be improved (what

Reviewer 03 · Rating 4 · Confidence 3

Strengths

- The idea of learning PPG representations through the reconstruction of masked DWT coefficients is quite interesting.
- The paper includes thorough experiments, with useful case studies and ablation analyses beyond standard downstream evaluations.
- The paper is well-written and easy to follow.

Weaknesses

**Evaluation:** The diversity and number of devices used are essential for interpreting the results, and reporting these details would not compromise anonymity. However, the use of a closed-source dataset limits the interpretability and reproducibility of the findings. Specifically: (1) the number of datasets from which each downstream task is derived remains unclear, and (2) it is uncertain whether the training and test data originate from the same devices. Furthermore, several public PPG datas

Taxonomy

Topics: Non-Invasive Vital Sign Monitoring · Optical Imaging and Spectroscopy Techniques · Emotion and Mood Recognition