Wavelet-Driven Masked Multiscale Reconstruction for PPG Foundation Models
Megha Thukral, Cyrus Tanade, Simon A. Lee, Juhyeon Lee, Hao Zhou, Keum San Chun, Migyeong Gwak, Viswam Nathan, Md Mahbubur Rahman, Li Zhu, Mehrab Bin Morshed, Subramaniam Venkatraman, Sharanya Arcot Desai

TL;DR
This paper introduces a wavelet-based self-supervised pretraining method for PPG signals that captures multi-resolution features across time and frequency, improving performance on diverse health tasks.
Contribution
It presents Masked Multiscale Reconstruction (MMR), a novel framework that leverages wavelet decomposition for hierarchical PPG representation learning, outperforming existing models.
Findings
MMR improves performance on 17 of 19 health tasks.
Wavelet-based features capture physiologically grounded information.
Pretraining on large-scale data enhances model generalization.
Abstract
Wearable foundation models have the potential to transform digital health by learning transferable representations from large-scale biosignals collected in everyday settings. While recent progress has been made in large-scale pretraining, most approaches overlook the spectral structure of photoplethysmography (PPG) signals, wherein physiological rhythms unfold across multiple frequency bands. Motivated by the insight that many downstream health-related tasks depend on multi-resolution features spanning fine-grained waveform morphology to global rhythmic dynamics, we introduce Masked Multiscale Reconstruction (MMR) for PPG representation learning - a self-supervised pretraining framework that explicitly learns from hierarchical time-frequency scales of PPG data. The pretraining task is designed to reconstruct randomly masked out coefficients obtained from a wavelet-based multiresolution…
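The pretraining objective described in the abstract, reconstructing randomly masked coefficients of a wavelet multiresolution decomposition, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the Haar wavelet, the three decomposition levels, the 50% mask ratio, zero-filling of masked positions, the synthetic test signal, and all function names. The learned encoder and decoder are omitted entirely, so the loss shown is only the trivial zero-fill baseline a model would have to beat.

```python
import math
import random


def haar_dwt_step(signal):
    """One level of the orthonormal Haar DWT: returns (approximation, detail)."""
    s = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_wavedec(signal, levels):
    """Multi-level decomposition, ordered coarse to fine: [cA_L, cD_L, ..., cD_1]."""
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    details = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_dwt_step(approx)
        details.append(detail)
    return [approx] + details[::-1]


def mask_coefficients(bands, mask_ratio, rng):
    """Hide a random subset of coefficients in every band (zero-fill, an assumption).

    Returns (masked_bands, masks); a True entry in a mask marks a hidden coefficient.
    """
    masked, masks = [], []
    for band in bands:
        n_hidden = int(round(mask_ratio * len(band)))
        hidden = set(rng.sample(range(len(band)), n_hidden))
        masks.append([i in hidden for i in range(len(band))])
        masked.append([0.0 if i in hidden else c for i, c in enumerate(band)])
    return masked, masks


def masked_mse(pred_bands, true_bands, masks):
    """Mean squared error computed over the hidden positions only."""
    total, count = 0.0, 0
    for pred, true, mask in zip(pred_bands, true_bands, masks):
        for p, t, m in zip(pred, true, mask):
            if m:
                total += (p - t) ** 2
                count += 1
    return total / count if count else 0.0


# Synthetic stand-in for a PPG window (hypothetical 64 Hz rate, 1.2 Hz pulse
# plus one harmonic); real PPG data is not used here.
fs = 64
signal = [
    math.sin(2 * math.pi * 1.2 * n / fs) + 0.3 * math.sin(2 * math.pi * 2.4 * n / fs)
    for n in range(256)
]

bands = haar_wavedec(signal, levels=3)
rng = random.Random(0)
masked_bands, masks = mask_coefficients(bands, mask_ratio=0.5, rng=rng)

# Loss of the trivial "predict zero" reconstruction; a trained model would
# replace masked_bands with its own predictions at the hidden positions.
baseline_loss = masked_mse(masked_bands, bands, masks)
```

In this sketch each band is masked independently, so both the coarse approximation (slow rhythm) and the fine detail bands (waveform morphology) contribute to the loss. Whether the paper masks per band, per patch, or jointly across scales is not stated in the excerpt above.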
Peer Reviews
Decision: Submitted to ICLR 2026
This paper tests 13 different downstream tasks spanning cardiovascular conditions (hypertension, PVC), metabolic markers (creatinine), and electrolyte imbalances, and provides strong evidence that the learned representations capture broadly useful information.
1. The core technical contribution lacks clear validation. While wavelets are positioned as the main innovation, the paper never isolates whether DWT actually drives the performance gains. Critically, the paper is missing the essential ablation: MMR with DWT versus MMR without DWT—the same masked autoencoder architecture and training procedure applied to patchified raw PPG time series instead of wavelet coefficients. 2. Results are mixed and claims are overstated. The abstract and conclusions d
1. The use of wavelet-based multiscale reconstruction as a masked modeling target is a strong conceptual contribution. 2. The diversity of the dataset, with 1h30 of data for each patient in unconstrained environments significantly increases applicability over prior foundation models trained on clean, clinical datasets. 3. The data pre-processing has a good balance between cleaning and maintaining as much data as possible. 4. The paper benchmarks across 13 diverse downstream tasks (clinical and p
1. While effective, MMR’s novelty lies mainly in applying masked reconstruction to wavelet coefficients. The method reuses a ViT backbone with minimal architectural innovations. 2. Key preprocessing and DWT hyperparameters (e.g., sampling-rate normalization, interpolation scheme for coefficients) could be better detailed for reproducibility. 3. The baselines could be newer models (e.g., Chronos-Bolt instead of Chronos). 4. Clarity on the fixed parameters in the ablation study could be improved (what
- The idea of learning PPG representations through the reconstruction of masked DWT coefficients is quite interesting. - The paper includes thorough experiments, with useful case studies and ablation analyses beyond standard downstream evaluations. - The paper is well-written and easy to follow.
**Evaluation:** The diversity and number of devices used are essential for interpreting the results, and reporting these details would not compromise anonymity. However, the use of a closed-source dataset limits the interpretability and reproducibility of the findings. Specifically: (1) the number of datasets from which each downstream task is derived remains unclear, and (2) it is uncertain whether the training and test data originate from the same devices. Furthermore, several public PPG datas
Taxonomy
Topics: Non-Invasive Vital Sign Monitoring · Optical Imaging and Spectroscopy Techniques · Emotion and Mood Recognition
