VCR: Learning Valid Contextual Representation for Incomplete Wearable Signals
Yuxuan Weng, Wenhan Luo, Qijia Shao

TL;DR
VCR is a self-supervised framework that learns valid representations from multimodal wearable signals. It handles missing modalities without hallucinating modality-specific details that cannot be inferred from the observed sensors, which makes health monitoring more robust.
Contribution
VCR introduces an orthogonal tokenizer and a missing-aware mixture-of-experts to disentangle shared and modality-specific information, improving robustness to missing data in wearable health signals.
Findings
VCR outperforms baselines in health monitoring tasks with missing modalities.
VCR maintains high performance even with multiple missing sensors.
VCR reduces hallucination of unobservable modality-specific details.
Abstract
Wearable devices enable continuous health monitoring from multimodal signals, but real-world deployment is hindered by limited labeled data and pervasive sensor incompleteness. While large-scale self-supervised pretraining reduces label dependence, most existing methods assume full modality availability. Current approaches for handling modality missingness often reconstruct entire absent signals, which can encourage hallucinating modality-specific details that are not inferable from the observed sensor signals and degrade robustness. We propose VCR, a self-supervised framework that learns to extract valid representations robust to modality missingness. VCR employs an orthogonal tokenizer to enforce strict orthogonal disentanglement by rectifying latent manifolds and applying a geometric projection, separating each modality into shared semantics and modality-specific residuals. This…
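The split into shared semantics and modality-specific residuals can be illustrated with a plain orthogonal projection. The sketch below is not the authors' tokenizer: the function name `split_shared_specific`, the random `shared_basis`, and the embedding sizes are all hypothetical, and it assumes the shared subspace is already given as a matrix whose columns span it. It only shows what it means for the two parts to be strictly orthogonal.

```python
import numpy as np

def split_shared_specific(z, shared_basis):
    """Split embeddings into a shared part and an orthogonal residual.

    z            : (n, d) array of modality embeddings (hypothetical).
    shared_basis : (d, k) array whose columns span an assumed shared subspace.
    """
    # Orthonormalize the basis so that q @ q.T is an orthogonal projector.
    q, _ = np.linalg.qr(shared_basis)
    shared = z @ q @ q.T      # component inside the shared subspace
    specific = z - shared     # residual, orthogonal to the shared subspace
    return shared, specific

# Toy data: 4 embeddings of dimension 8, a 3-dimensional shared subspace.
rng = np.random.default_rng(0)
z = rng.normal(size=(4, 8))
basis = rng.normal(size=(8, 3))
shared, specific = split_shared_specific(z, basis)

# The two parts add back to the input and have zero inner product per row.
print(np.allclose(shared + specific, z))
print(np.allclose(np.sum(shared * specific, axis=1), 0.0))
```

In this toy setting the residual carries exactly the information that the shared subspace cannot express, which is the kind of content a model should not be asked to reconstruct for an absent sensor.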
