MoCA: Multi-modal Cross-masked Autoencoder for Digital Health Measurements
Howon Ryu, Yuliang Chen, Yacun Wang, Andrea Z. LaCroix, Chongzhi Di, Loki Natarajan, Yu Wang, Jingjing Zou

TL;DR
MoCA is a self-supervised multi-modal masked autoencoder that exploits cross-modality correlations in wearable sensor data to cope with missing data and the lack of labels, and it is backed by a theoretical link to kernel CCA.
Contribution
The paper introduces MoCA, a transformer-based framework for multi-modal wearable data analysis built around a cross-modality masking scheme, and establishes a theoretical link between its masked autoencoder (MAE) loss and kernel canonical correlation analysis (kernel CCA).
Findings
Enhanced reconstruction and classification performance on benchmark datasets
Effective handling of missing modalities in multi-modal data
Theoretical connection between MAE loss and kernel CCA
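For orientation, the two objects named in the last finding can be written in their standard textbook forms. The notation below is assumed for illustration and is not taken from the paper, and the paper's precise statement of the connection is not reproduced here.

```latex
% Generic notation, assumed for illustration; not the paper's own symbols.
% x: multi-modal input, m: binary mask (1 = masked), f: encoder, g: decoder.
% (1 - m) \odot x stands for the visible part of the input.
\mathcal{L}_{\mathrm{MAE}}
  = \mathbb{E}_{x,\,m}\,
    \bigl\| m \odot \bigl( x - g\bigl(f\bigl((1-m)\odot x\bigr)\bigr) \bigr) \bigr\|_2^2

% Kernel CCA for two views X and Y, with reproducing kernel Hilbert spaces
% \mathcal{H}_X and \mathcal{H}_Y (usually solved with added regularization):
\rho^{*}
  = \max_{\phi \in \mathcal{H}_X,\ \psi \in \mathcal{H}_Y}
    \frac{\operatorname{Cov}\bigl(\phi(X),\, \psi(Y)\bigr)}
         {\sqrt{\operatorname{Var}\bigl(\phi(X)\bigr)\,\operatorname{Var}\bigl(\psi(Y)\bigr)}}
```

Roughly, the first expression scores how well masked content is recovered from visible content, while the second measures the strongest nonlinear correlation between two views. The exact result and its conditions are given in the paper.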
Abstract
Wearable devices enable continuous multi-modal physiological and behavioral monitoring, yet analysis of these data streams faces fundamental challenges including the lack of gold-standard labels and incomplete sensor data. While self-supervised learning approaches have shown promise for addressing these issues, existing multi-modal extensions present opportunities to better leverage the rich temporal and cross-modal correlations inherent in simultaneously recorded wearable sensor data. We propose the Multi-modal Cross-masked Autoencoder (MoCA), a self-supervised learning framework that combines transformer architecture with masked autoencoder (MAE) methodology, using a principled cross-modality masking scheme that explicitly leverages correlation structures between sensor modalities. MoCA demonstrates strong performance boosts across reconstruction and downstream classification tasks on…
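The idea of masking across modalities can be illustrated with a minimal sketch. It assumes each modality is split into the same number of time patches and draws masks independently per modality, so a patch hidden in one sensor stream is usually visible in another at the same time step. The function names `cross_modal_mask` and `masked_mse`, the independent-sampling rule, and the "keep at least one modality visible" repair step are all hypothetical choices made for this example; the paper's actual masking scheme may differ.

```python
import numpy as np


def cross_modal_mask(num_modalities, num_patches, mask_ratio, rng):
    """Return a boolean mask of shape (num_modalities, num_patches); True = masked.

    Hypothetical scheme (an assumption, not the paper's): each modality gets
    its own random set of masked time patches.
    """
    num_masked = int(round(mask_ratio * num_patches))
    mask = np.zeros((num_modalities, num_patches), dtype=bool)
    for m in range(num_modalities):
        idx = rng.choice(num_patches, size=num_masked, replace=False)
        mask[m, idx] = True

    # Repair step: if every modality is masked at some time patch, unmask one
    # at random so cross-modal information is always available there.
    # This can leave a modality with slightly fewer masked patches.
    for t in np.flatnonzero(mask.all(axis=0)):
        mask[rng.integers(num_modalities), t] = False
    return mask


def masked_mse(target, prediction, mask):
    """Mean squared error over masked patches only.

    target, prediction: arrays of shape (num_modalities, num_patches, patch_dim).
    mask: boolean array of shape (num_modalities, num_patches).
    """
    per_patch = ((target - prediction) ** 2).mean(axis=-1)
    return float(per_patch[mask].sum() / max(int(mask.sum()), 1))


rng = np.random.default_rng(0)
mask = cross_modal_mask(num_modalities=3, num_patches=20, mask_ratio=0.5, rng=rng)

# Stand-in data: 3 modalities, 20 patches, 8 samples per patch.
patches = rng.normal(size=(3, 20, 8))
# A trivial "reconstruction" of all zeros, just to exercise the loss.
loss = masked_mse(patches, np.zeros_like(patches), mask)
```

In a full model the visible patches would go through a transformer encoder and a decoder would predict the masked ones; here the all-zero prediction only stands in for that output so the loss computation can be seen in isolation.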
Taxonomy
Topics: COVID-19 diagnosis using AI · Artificial Intelligence in Healthcare · Context-Aware Activity Recognition Systems
Methods: Absolute Position Encodings · Layer Normalization · Byte Pair Encoding · Label Smoothing · Softmax · Dropout · Dense Connections · Transformer
