Purify-then-Align: Towards Robust Human Sensing under Modality Missing with Knowledge Distillation from Noisy Multimodal Teacher
Pengcheng Weng, Yanyu Qian, Yangxin Xu, Fei Wang

TL;DR
This paper introduces PTA, a framework that combines meta-learning with diffusion-based knowledge distillation to make multimodal human sensing robust to missing modalities and noisy data.
Contribution
It presents a novel Purify-then-Align approach that dynamically down-weights noisy modalities and distills cross-modal knowledge into single-modality encoders.
Findings
PTA achieves state-of-the-art results on MM-Fi and XRF55 datasets.
PTA significantly enhances robustness of single-modality models in missing-modality scenarios.
The framework effectively mitigates the representation gap and contamination effects.
Abstract
Robust multimodal human sensing must overcome the critical challenge of missing modalities. Two principal barriers are the Representation Gap between heterogeneous data and the Contamination Effect from low-quality modalities. These barriers are causally linked, as the corruption introduced by contamination fundamentally impedes the reduction of representation disparities. In this paper, we propose PTA, a novel "Purify-then-Align" framework that solves this causal dependency through a synergistic integration of meta-learning and knowledge diffusion. To purify the knowledge source, PTA first employs a meta-learning-driven weighting mechanism that dynamically learns to down-weight the influence of noisy, low-contributing modalities. Subsequently, to align different modalities, PTA introduces a diffusion-based knowledge distillation paradigm in which an information-rich clean teacher,…
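The two stages described above can be illustrated with a toy sketch. It is not the authors' implementation: the reliability scores are set by hand rather than meta-learned, a plain mean-squared-error feature-matching loss stands in for the diffusion-based distillation, and the names `purified_teacher` and `align_loss` are hypothetical. It only shows why down-weighting a noisy modality before fusion gives a single-modality student a cleaner target.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array.
    z = np.exp(x - np.max(x))
    return z / z.sum()

def purified_teacher(features, scores):
    """Fuse per-modality features using softmax weights over reliability scores.

    `scores` are hand-set here (assumption); in PTA they would be learned.
    """
    weights = softmax(np.asarray(scores, dtype=float))
    stacked = np.stack(features)          # shape: (num_modalities, dim)
    return weights @ stacked, weights     # fused feature has shape (dim,)

def align_loss(student_feature, teacher_feature):
    # MSE feature matching, a simple stand-in for diffusion-based distillation.
    return float(np.mean((student_feature - teacher_feature) ** 2))

rng = np.random.default_rng(0)
clean = rng.normal(size=8)                      # hypothetical "true" feature
features = [
    clean + 0.01 * rng.normal(size=8),          # reliable modality
    clean + 0.01 * rng.normal(size=8),          # reliable modality
    clean + 5.0 * rng.normal(size=8),           # heavily contaminated modality
]
scores = [2.0, 2.0, -4.0]                       # low score for the noisy modality

teacher, weights = purified_teacher(features, scores)
uniform_teacher = np.mean(np.stack(features), axis=0)  # no purification

print("weights:", np.round(weights, 4))
print("purified error:", align_loss(teacher, clean))
print("uniform error: ", align_loss(uniform_teacher, clean))
```

In this toy setting the purified teacher stays close to the clean feature, while uniform averaging lets the contaminated modality dominate the target that a student encoder would be trained to match.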
