Bridging Modalities via Progressive Re-alignment for Multimodal Test-Time Adaptation

Jiacheng Li; Songhe Feng

arXiv:2511.22862 · cs.LG · March 24, 2026

TL;DR

This paper introduces BriMPR, a novel framework for multimodal test-time adaptation that progressively re-aligns features across modalities to handle distribution shifts, improving model robustness in complex multimodal scenarios.

Contribution

The paper proposes a divide-and-conquer approach with prompt tuning and contrastive learning to effectively address multimodal distribution shifts during test-time adaptation.
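
The contribution names contrastive learning as one ingredient for re-aligning modalities. As a rough illustration only, the sketch below computes a generic symmetric InfoNCE loss between two batches of paired modality features. It is not the paper's objective: the function name `cross_modal_info_nce`, the temperature value, and the use of NumPy are all assumptions made for this example.

```python
import numpy as np


def _log_softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable row-wise log-softmax."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def cross_modal_info_nce(feat_a: np.ndarray,
                         feat_b: np.ndarray,
                         temperature: float = 0.1) -> float:
    """Generic symmetric InfoNCE loss (hypothetical helper, not from the paper).

    feat_a, feat_b: arrays of shape (batch, dim); row i of each array is
    assumed to come from the same test sample in two different modalities.
    """
    # L2-normalize so the dot product below is a cosine similarity.
    a = feat_a / np.linalg.norm(feat_a, axis=1, keepdims=True)
    b = feat_b / np.linalg.norm(feat_b, axis=1, keepdims=True)

    # Pairwise similarities; the diagonal holds the matching (positive) pairs.
    logits = a @ b.T / temperature
    idx = np.arange(len(a))

    # Cross-entropy in both directions: modality A to B and B to A.
    loss_ab = -_log_softmax(logits)[idx, idx].mean()
    loss_ba = -_log_softmax(logits.T)[idx, idx].mean()
    return float(0.5 * (loss_ab + loss_ba))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    audio = rng.normal(size=(8, 16))                 # stand-in audio features
    video = audio + 0.1 * rng.normal(size=(8, 16))   # roughly aligned video features
    print("paired loss:  ", cross_modal_info_nce(audio, video))
    print("shuffled loss:", cross_modal_info_nce(audio, np.roll(video, 1, axis=0)))
```

The loss is low when matching rows are the most similar pair in the batch and grows when the pairing is broken, which is the sense in which a contrastive term can pull misaligned modalities back together.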

Findings

1. Outperforms existing TTA methods on multiple benchmarks
2. Effectively aligns unimodal and cross-modal features
3. Enhances robustness to real-world domain shifts

Abstract

Test-time adaptation (TTA) enables online model adaptation using only unlabeled test data, aiming to bridge the gap between source and target distributions. However, in multimodal scenarios, varying degrees of distribution shift across modalities give rise to a complex coupling of unimodal shallow feature shift and cross-modal high-level semantic misalignment, which poses a major obstacle to extending existing TTA methods to the multimodal setting. To address this challenge, we propose a novel multimodal test-time adaptation (MMTTA) framework, termed Bridging Modalities via Progressive Re-alignment (BriMPR). BriMPR consists of two progressively enhanced modules and tackles the coupling effect with a divide-and-conquer strategy. Specifically, we first decompose MMTTA into multiple unimodal feature alignment sub-problems. By leveraging the strong function approximation…

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Speech Recognition and Synthesis · Topic Modeling