Bridging Modalities via Progressive Re-alignment for Multimodal Test-Time Adaptation
Jiacheng Li, Songhe Feng

TL;DR
This paper introduces BriMPR, a novel framework for multimodal test-time adaptation that progressively re-aligns features across modalities to handle distribution shifts, improving model robustness in complex multimodal scenarios.
Contribution
The paper proposes a divide-and-conquer approach with prompt tuning and contrastive learning to effectively address multimodal distribution shifts during test-time adaptation.
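As a rough illustration of the contrastive-learning ingredient, the sketch below computes a generic symmetric InfoNCE loss between paired features from two modalities. It is not the paper's objective, which this summary does not specify. The function names, the temperature value, and the assumption that row i of each input forms a positive pair are all hypothetical choices made for the example.

```python
import math

def l2_normalize(v):
    # Scale a vector to unit length (guarding against a zero vector).
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def cross_modal_info_nce(feats_a, feats_b, temperature=0.1):
    """Generic symmetric InfoNCE (an assumption, not the paper's loss).

    feats_a[i] and feats_b[i] are treated as a positive pair; every
    other pairing in the batch is treated as a negative.
    """
    a = [l2_normalize(v) for v in feats_a]
    b = [l2_normalize(v) for v in feats_b]
    n = len(a)
    # Cosine-similarity logits, scaled by the temperature.
    logits = [
        [sum(x * y for x, y in zip(a[i], b[j])) / temperature for j in range(n)]
        for i in range(n)
    ]

    def mean_cross_entropy(rows):
        # Cross-entropy with the diagonal entry as the correct class.
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)
            log_z = m + math.log(sum(math.exp(x - m) for x in row))
            total += log_z - row[i]
        return total / len(rows)

    transposed = [list(col) for col in zip(*logits)]
    # Average the a-to-b and b-to-a directions.
    return 0.5 * (mean_cross_entropy(logits) + mean_cross_entropy(transposed))
```

With this loss, a batch whose paired features point in the same direction scores lower than one whose pairs are swapped, which is the behavior a cross-modal alignment term needs.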
Findings
Outperforms existing TTA methods on multiple benchmarks
Effectively aligns unimodal and cross-modal features
Enhances robustness to real-world domain shifts
Abstract
Test-time adaptation (TTA) enables online model adaptation using only unlabeled test data, aiming to bridge the gap between source and target distributions. However, in multimodal scenarios, varying degrees of distribution shift across different modalities give rise to a complex coupling effect of unimodal shallow feature shift and cross-modal high-level semantic misalignment, posing a major obstacle to extending existing TTA methods to the multimodal field. To address this challenge, we propose a novel multimodal test-time adaptation (MMTTA) framework, termed Bridging Modalities via Progressive Re-alignment (BriMPR). BriMPR, consisting of two progressively enhanced modules, tackles the coupling effect with a divide-and-conquer strategy. Specifically, we first decompose MMTTA into multiple unimodal feature alignment sub-problems. By leveraging the strong function approximation…
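The abstract is cut off before it describes how each unimodal sub-problem is solved, so the following is only a minimal sketch of what "decomposing into unimodal feature alignment sub-problems" could look like. It uses simple per-dimension moment matching against stored source statistics, which is an assumption made here and not the paper's stated method. The modality names and the availability of source-side means and variances are likewise hypothetical.

```python
def feature_stats(batch):
    # Per-dimension mean and (population) variance of a batch of feature vectors.
    n = len(batch)
    d = len(batch[0])
    mean = [sum(row[k] for row in batch) / n for k in range(d)]
    var = [sum((row[k] - mean[k]) ** 2 for row in batch) / n for k in range(d)]
    return mean, var

def alignment_loss(target_batch, source_mean, source_var):
    # Squared distance between target-batch statistics and source statistics
    # for a single modality (generic moment matching, assumed for illustration).
    mean, var = feature_stats(target_batch)
    mean_gap = sum((m - s) ** 2 for m, s in zip(mean, source_mean))
    var_gap = sum((v - s) ** 2 for v, s in zip(var, source_var))
    return mean_gap + var_gap

def total_unimodal_loss(target_batches, source_stats):
    # Divide and conquer: one independent alignment term per modality, summed.
    # Both arguments are dicts keyed by a (hypothetical) modality name.
    return sum(
        alignment_loss(target_batches[name], *source_stats[name])
        for name in target_batches
    )
```

Each modality contributes its own term, so a heavily shifted modality produces a large loss without affecting the term of an unshifted one.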
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Speech Recognition and Synthesis · Topic Modeling
