COMPASS: Complete Multimodal Fusion via Proxy Tokens and Shared Spaces for Ubiquitous Sensing

Hao Wang; Yanyu Qian; Pengcheng Weng; Zixuan Xia; William Dan; Yangxin Xu; Fei Wang

arXiv:2604.02056·cs.CV·April 3, 2026

TL;DR

COMPASS introduces a novel multimodal fusion framework that maintains a fixed input structure using proxy tokens, enabling robust performance even with missing modalities across various sensing scenarios.

Contribution

The paper proposes a modality-complete fusion approach that fills each missing modality slot with a proxy token generated in a shared latent space, improving robustness when modalities are missing.

Findings

01. COMPASS outperforms prior methods in diverse missing-modality scenarios.

02. The framework effectively synthesizes proxy tokens to maintain fusion completeness.

03. Experiments demonstrate improved robustness across multiple datasets.

Abstract

Missing modalities remain a major challenge for multimodal sensing, because most existing methods adapt the fusion process to the observed subset by dropping absent branches, using subset-specific fusion, or reconstructing missing features. As a result, the fusion head often receives an input structure different from the one seen during training, leading to incomplete fusion and degraded cross-modal interaction. We propose COMPASS, a missing-modality fusion framework built on the principle of fusion completeness: the fusion head always receives a fixed N-slot multimodal input, with one token per modality slot. For each missing modality, COMPASS synthesizes a target-specific proxy token from the observed modalities using pairwise source-to-target generators in a shared latent space, and aggregates them into a single replacement token. To make these proxies both representation-compatible…
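The slot-filling idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `ProxyFiller`, the random linear maps standing in for learned pairwise generators, and the mean used to aggregate proxies are all assumptions made for illustration, since the abstract does not specify the generator architecture or the aggregation rule. Tokens are assumed to already live in a shared latent space of equal dimension.

```python
import random


def make_linear(dim, rng):
    """Random dim x dim matrix; a hypothetical stand-in for a learned generator."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(dim)]


def apply_linear(weight, vec):
    """Matrix-vector product on plain Python lists."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weight]


def mean_tokens(tokens):
    """Element-wise mean of equal-length tokens (assumed aggregation rule)."""
    n = len(tokens)
    return [sum(vals) / n for vals in zip(*tokens)]


class ProxyFiller:
    """Keeps the fusion input at a fixed N slots, one token per modality."""

    def __init__(self, modalities, dim, seed=0):
        rng = random.Random(seed)
        self.modalities = list(modalities)
        # One generator per ordered (source, target) pair of distinct modalities.
        self.generators = {
            (s, t): make_linear(dim, rng)
            for s in self.modalities
            for t in self.modalities
            if s != t
        }

    def complete(self, observed):
        """observed maps modality name -> token; returns N tokens in slot order."""
        if not observed:
            raise ValueError("at least one modality must be observed")
        slots = []
        for target in self.modalities:
            if target in observed:
                # Observed modality: pass its own token through unchanged.
                slots.append(list(observed[target]))
            else:
                # Missing modality: one proxy per observed source, then aggregate.
                proxies = [
                    apply_linear(self.generators[(source, target)], token)
                    for source, token in observed.items()
                ]
                slots.append(mean_tokens(proxies))
        return slots


# Hypothetical modality names; only "imu" is observed here.
filler = ProxyFiller(["imu", "wifi", "radar"], dim=4)
slots = filler.complete({"imu": [1.0, 0.0, 0.0, 0.0]})
print(len(slots))  # the fusion head still receives three slots
```

Whatever subset is observed, `complete` returns the same number of tokens in the same order, which is the property the abstract calls fusion completeness. A downstream fusion head would consume `slots` directly.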
