Test-time Alignment-Enhanced Adapter for Vision-Language Models

Baoshun Tong; Kaiyu Song; Hanjiang Lai

arXiv:2411.15735 · cs.CV · November 26, 2024

TL;DR

This paper introduces TAEA, a test-time adaptation method for vision-language models that trains an adapter to improve text features during testing, enhancing alignment and performance under distribution shifts.

Contribution

The paper proposes a novel test-time adapter that adjusts text features in VLMs and incorporates a negative cache to improve alignment, outperforming existing methods.

Findings

01. Outperforms the state-of-the-art TTA method by an average of 0.75% on out-of-distribution benchmarks.

02. Achieves a 2.5% improvement on cross-domain benchmarks.

03. Maintains acceptable training time for test-time adaptation.

Abstract

Test-time adaptation with pre-trained vision-language models (VLMs) has attracted increasing attention as a way to tackle distribution shift during the test phase. While prior methods have addressed distribution shift effectively by adjusting classification logits, they remain suboptimal because they keep the text features unchanged. To address this issue, we introduce a new approach called Test-time Alignment-Enhanced Adapter (TAEA), which trains an adapter with test samples to adjust text features during the test phase. Adapting the text features in this way enhances text-to-image alignment prediction. Furthermore, we propose adopting the negative cache from TDA as an enhancement module, which further improves the performance of TAEA. Our approach outperforms the state-of-the-art TTA method of pre-trained VLMs by an average of 0.75% on the…
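To make the core idea concrete, the sketch below contrasts zero-shot prediction with frozen text features against prediction with text features passed through an adapter. It is an illustration only, not the authors' implementation: the residual form `t + alpha * W t`, the names `adapt_text_features`, `W`, and `alpha`, and the logit scale of 100 are all assumptions, and the test-time training of the adapter and the negative cache are not shown.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def adapt_text_features(text_feats, W, alpha=0.5):
    """Hypothetical residual adapter: t' = normalize(t + alpha * W t).

    With W set to zeros the text features are returned unchanged,
    which recovers the zero-shot classifier.
    """
    adapted = []
    for t in text_feats:
        delta = matvec(W, t)
        adapted.append(l2_normalize([a + alpha * d for a, d in zip(t, delta)]))
    return adapted

def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    e = [math.exp(x - m) for x in z]
    s = sum(e)
    return [x / s for x in e]

def predict(image_feat, text_feats, scale=100.0):
    """Class probabilities from scaled cosine similarity (scale is assumed)."""
    img = l2_normalize(image_feat)
    logits = [scale * sum(a * b for a, b in zip(img, t)) for t in text_feats]
    return softmax(logits)

# Two toy class embeddings and one toy image embedding (made-up values).
text = [l2_normalize([1.0, 0.0]), l2_normalize([0.0, 1.0])]
image = [0.6, 0.8]

zero_shot = predict(image, text)
W = [[0.0, 1.0], [0.0, 0.0]]  # arbitrary adapter weights, for illustration
adapted = predict(image, adapt_text_features(text, W))
```

In this framing, methods that only adjust logits would modify the output of `predict`, whereas the adapter changes `text_feats` before the similarity is computed.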

Code & Models

Repositories

BaoshunWq/clip_TAEA
none · Official

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Natural Language Processing Techniques

Methods: Softmax · Attention Is All You Need · Adapter · ADaptive gradient method with the OPTimal convergence rate