Purify-then-Align: Towards Robust Human Sensing under Modality Missing with Knowledge Distillation from Noisy Multimodal Teacher

Pengcheng Weng; Yanyu Qian; Yangxin Xu; Fei Wang

arXiv:2604.05584 · cs.CV · April 9, 2026

TL;DR

This paper introduces PTA, a framework that combines meta-learning with diffusion-based knowledge distillation to make multimodal human sensing robust to missing modalities and noisy data.

Contribution

It presents a novel Purify-then-Align approach that dynamically down-weights noisy modalities and distills cross-modal knowledge into single-modality encoders.
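
The two stages named above can be illustrated with a deliberately simplified sketch. It is not the authors' implementation: the paper learns the modality weights through meta-learning and aligns features with a diffusion-based distillation scheme, whereas this example assumes fixed per-modality quality scores passed through a softmax ("purify") and a plain mean-squared-error feature loss ("align"). The function names, the modality names `rgb` and `wifi`, and all numbers are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_teacher(features, quality_scores):
    """Purify step (stand-in): weighted average of modality features.

    features: dict mapping modality name -> feature vector (list of floats)
    quality_scores: dict mapping modality name -> scalar score; in the paper
        these weights are learned by meta-learning, here they are given.
    Returns the fused teacher feature and the weight per modality.
    """
    names = sorted(features)
    weights = softmax([quality_scores[n] for n in names])
    dim = len(features[names[0]])
    fused = [0.0] * dim
    for w, n in zip(weights, names):
        for i, v in enumerate(features[n]):
            fused[i] += w * v
    return fused, dict(zip(names, weights))


def alignment_loss(student_feature, teacher_feature):
    """Align step (stand-in): MSE between student and teacher features.

    The paper uses a diffusion-based distillation objective instead.
    """
    diffs = [(s - t) ** 2 for s, t in zip(student_feature, teacher_feature)]
    return sum(diffs) / len(diffs)


if __name__ == "__main__":
    # Hypothetical two-modality example; "wifi" is treated as the noisier one.
    feats = {"rgb": [1.0, 0.0], "wifi": [0.0, 1.0]}
    scores = {"rgb": 2.0, "wifi": 0.0}
    teacher, weights = fuse_teacher(feats, scores)
    print("weights:", weights)
    print("teacher feature:", teacher)
    print("wifi student loss:", alignment_loss(feats["wifi"], teacher))
```

In this toy setup the lower-scored modality contributes less to the fused teacher feature, and a single-modality student would be trained to reduce its distance to that feature so it can operate when the other modalities are missing.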

Findings

01. PTA achieves state-of-the-art results on the MM-Fi and XRF55 datasets.

02. PTA significantly enhances the robustness of single-modality models in missing-modality scenarios.

03. The framework effectively mitigates the Representation Gap and the Contamination Effect.

Abstract

Robust multimodal human sensing must overcome the critical challenge of missing modalities. Two principal barriers are the Representation Gap between heterogeneous data and the Contamination Effect from low-quality modalities. These barriers are causally linked, as the corruption introduced by contamination fundamentally impedes the reduction of representation disparities. In this paper, we propose PTA, a novel "Purify-then-Align" framework that solves this causal dependency through a synergistic integration of meta-learning and knowledge diffusion. To purify the knowledge source, PTA first employs a meta-learning-driven weighting mechanism that dynamically learns to down-weight the influence of noisy, low-contributing modalities. Subsequently, to align different modalities, PTA introduces a diffusion-based knowledge distillation paradigm in which an information-rich clean teacher,…
