COMPASS: Complete Multimodal Fusion via Proxy Tokens and Shared Spaces for Ubiquitous Sensing
Hao Wang, Yanyu Qian, Pengcheng Weng, Zixuan Xia, William Dan, Yangxin Xu, Fei Wang

TL;DR
COMPASS is a multimodal fusion framework that keeps the fusion head's input structure fixed: each missing modality's slot is filled with a proxy token synthesized from the observed modalities, so performance stays robust when modalities are missing across sensing scenarios.
Contribution
The paper proposes a modality-complete fusion approach with proxy tokens and shared spaces, improving robustness in multimodal sensing with missing data.
Findings
COMPASS outperforms prior methods in diverse missing-modality scenarios.
For each missing modality, the framework synthesizes a proxy token from the observed modalities, so the fusion head always receives a complete N-slot input.
Experiments demonstrate improved robustness across multiple datasets.
Abstract
Missing modalities remain a major challenge for multimodal sensing, because most existing methods adapt the fusion process to the observed subset by dropping absent branches, using subset-specific fusion, or reconstructing missing features. As a result, the fusion head often receives an input structure different from the one seen during training, leading to incomplete fusion and degraded cross-modal interaction. We propose COMPASS, a missing-modality fusion framework built on the principle of fusion completeness: the fusion head always receives a fixed N-slot multimodal input, with one token per modality slot. For each missing modality, COMPASS synthesizes a target-specific proxy token from the observed modalities using pairwise source-to-target generators in a shared latent space, and aggregates them into a single replacement token. To make these proxies both representation-compatible…
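
The slot-filling idea described in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the pairwise source-to-target generators are modeled as random linear maps, aggregation is a plain mean, and the names `make_generators` and `complete_tokens` are hypothetical. The abstract does not specify the generator architecture, the aggregation rule, or how the shared latent space is trained, so none of that is shown here.

```python
import numpy as np


def make_generators(n_modalities, dim, seed=0):
    """Hypothetical pairwise generators: one linear map per (source, target) pair.

    Assumption: a real system would learn these; here they are random matrices.
    """
    rng = np.random.default_rng(seed)
    return {
        (s, t): rng.standard_normal((dim, dim)) / np.sqrt(dim)
        for s in range(n_modalities)
        for t in range(n_modalities)
        if s != t
    }


def complete_tokens(tokens, generators):
    """Return an (N, dim) array with one token per modality slot.

    tokens: list of length N; each entry is a (dim,) array in the shared
    latent space, or None if that modality is missing.
    """
    observed = [i for i, z in enumerate(tokens) if z is not None]
    if not observed:
        raise ValueError("at least one modality must be observed")

    slots = []
    for t, z in enumerate(tokens):
        if z is not None:
            # Observed modality: keep its own token in its slot.
            slots.append(z)
        else:
            # Missing modality: map every observed source to target t,
            # then aggregate (assumed: simple mean) into one proxy token.
            candidates = [generators[(s, t)] @ tokens[s] for s in observed]
            slots.append(np.mean(candidates, axis=0))
    return np.stack(slots)


# Usage: three modalities, the second one missing.
gens = make_generators(n_modalities=3, dim=4)
tokens = [np.ones(4), None, np.full(4, 2.0)]
fused_input = complete_tokens(tokens, gens)
print(fused_input.shape)  # (3, 4)
```

Whatever subset of modalities is observed, the array handed to the fusion head has the same N-by-dim shape, which is the property the abstract calls fusion completeness.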
