Majorization-Guided Test-Time Adaptation for Vision-Language Models under Modality-Specific Shift

Lixian Chen; Yanhui Chen; Junyi Lin

arXiv:2604.24602·cs.CV·May 5, 2026

TL;DR

This paper introduces MG-MTTA, a test-time adaptation method for vision-language models that uses a reliability-aware gate to keep an unreliable modality from dominating fusion, improving accuracy under modality-specific shift.

Contribution

It proposes a majorization-based approach that casts adaptation as a constrained de-mixing problem on the fused prediction, handling asymmetric modality shifts while keeping the backbone frozen and updating only a lightweight gate or adapter.
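For readers unfamiliar with majorization, the sketch below shows the standard definition for probability vectors and its link to entropy: if p majorizes q, then the Shannon entropy of p is no larger than that of q, because entropy is Schur-concave. The helper names `majorizes` and `entropy` and the two example distributions are illustrative assumptions, not taken from the paper.

```python
import math

def majorizes(p, q, tol=1e-12):
    """Return True if p majorizes q.

    Both inputs are assumed to be probability vectors of equal length.
    p majorizes q when every partial sum of p, sorted in descending
    order, is at least the matching partial sum of q.
    """
    ps = sorted(p, reverse=True)
    qs = sorted(q, reverse=True)
    cum_p = cum_q = 0.0
    for a, b in zip(ps, qs):
        cum_p += a
        cum_q += b
        if cum_p < cum_q - tol:
            return False
    return True

def entropy(p):
    """Shannon entropy in nats, skipping zero entries."""
    return -sum(x * math.log(x) for x in p if x > 0)

# Hypothetical posteriors over three classes.
sharp = [0.7, 0.2, 0.1]
flat = [0.4, 0.35, 0.25]

print(majorizes(sharp, flat), majorizes(flat, sharp))
print(entropy(sharp) < entropy(flat))
```

Note that majorization compares sorted vectors, so it says how concentrated a posterior is but nothing about which class receives the mass. A sharper posterior can therefore still be a wrong one.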

Findings

01  MG-MTTA improves top-1 accuracy on ImageNet-based benchmarks under semantic and joint shifts.

02  The method maintains competitive performance in visual-only settings.

03  Analysis provides conditions for entropy minimization to preserve correct modality ranking.

Abstract

Vision-language models transfer well in zero-shot settings, but at deployment the visual and textual branches often shift asymmetrically. Under this condition, entropy-based test-time adaptation can sharpen the fused posterior while increasing error, because an unreliable modality may still dominate fusion. We study this failure mode through a majorization view of multimodal posteriors and cast adaptation as a constrained de-mixing problem on the fused prediction. Based on this view, we propose MG-MTTA, which keeps the backbone frozen and updates only a lightweight gate or adapter. The objective combines fused-posterior entropy minimization with a reliability-aware gate prior built from anchor-based modality consistency and cross-modal conflict. Our analysis gives conditions under which entropy reduction preserves the correct ranking and a threshold that characterizes modality-dominance…
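The failure mode described in the abstract can be reproduced on a toy example. The sketch below is not the paper's implementation: it assumes a convex-mixture fusion with a single scalar gate `g` (the weight on the visual branch), a Bernoulli KL penalty toward a fixed prior `g_prior` standing in for the reliability-aware gate prior, and a grid search in place of gradient updates. The posteriors, `g_prior`, and `lam` are made-up values; the paper builds its prior from anchor-based modality consistency and cross-modal conflict, which is not modeled here.

```python
import math

def entropy(p):
    """Shannon entropy in nats, skipping zero entries."""
    return -sum(x * math.log(x) for x in p if x > 0)

def fuse(g, p_vis, p_txt):
    """Assumed fusion rule: convex mixture with weight g on the visual branch."""
    return [g * v + (1.0 - g) * t for v, t in zip(p_vis, p_txt)]

def bernoulli_kl(g, g0):
    """KL(Bernoulli(g) || Bernoulli(g0)) for 0 < g0 < 1, with 0 * log 0 = 0."""
    kl = 0.0
    if g > 0:
        kl += g * math.log(g / g0)
    if g < 1:
        kl += (1.0 - g) * math.log((1.0 - g) / (1.0 - g0))
    return kl

def objective(g, p_vis, p_txt, g_prior, lam):
    """Fused-posterior entropy plus a gate-prior penalty (illustrative form)."""
    return entropy(fuse(g, p_vis, p_txt)) + lam * bernoulli_kl(g, g_prior)

def argmax(p):
    return max(range(len(p)), key=p.__getitem__)

# Hypothetical case: the true class is 0. The visual branch is moderately
# right; the text branch has shifted and is confidently wrong.
p_vis = [0.60, 0.30, 0.10]
p_txt = [0.05, 0.90, 0.05]
grid = [i / 100 for i in range(101)]

# Entropy alone pushes the gate toward the sharper (wrong) text branch.
g_ent = min(grid, key=lambda g: entropy(fuse(g, p_vis, p_txt)))

# Adding a prior that trusts the visual branch keeps the gate near it.
g_reg = min(grid, key=lambda g: objective(g, p_vis, p_txt, g_prior=0.9, lam=1.0))

pred_ent = argmax(fuse(g_ent, p_vis, p_txt))
pred_reg = argmax(fuse(g_reg, p_vis, p_txt))
print("entropy only: gate", g_ent, "prediction", pred_ent)
print("with prior:   gate", g_reg, "prediction", pred_reg)
```

In this toy setting the entropy-only gate collapses onto the text branch and predicts the wrong class with lower entropy than the visual branch alone, while the penalized objective keeps most of the weight on the visual branch and predicts class 0. The outcome depends on the prior pointing at the reliable modality, which is the part the paper's reliability estimate is meant to supply.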
