BaFTA: Backprop-Free Test-Time Adaptation For Zero-Shot Vision-Language Models

Xuefeng Hu; Ke Zhang; Min Sun; Albert Chen; Cheng-Hao Kuo; Ram Nevatia

arXiv:2406.11309 · cs.CV · June 19, 2024

TL;DR

BaFTA is a novel backpropagation-free method for test-time adaptation of vision-language models like CLIP, using online clustering and entropy-based reliability to improve zero-shot image classification without fine-tuning.

Contribution

The paper introduces a backpropagation-free algorithm that estimates class centroids via online clustering instead of fine-tuning text prompts, improving adaptation performance.
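As a rough illustration of estimating class centroids by online clustering, the sketch below starts each centroid at a class text embedding and nudges it with a running-mean step as unlabeled test embeddings arrive, with no gradients involved. It is a minimal sketch under stated assumptions, not the paper's algorithm: the embeddings are toy vectors rather than CLIP outputs, and the names `OnlineCentroids` and `update`, the hard nearest-centroid assignment, and the choice to count the text embedding as one sample are all hypothetical.

```python
import numpy as np


def l2_normalize(x, axis=-1):
    """Scale vectors to unit length along the given axis."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)


class OnlineCentroids:
    """Hypothetical helper: one running centroid per class, no backprop."""

    def __init__(self, text_embeddings):
        # Assumption: centroids are initialized from class text embeddings.
        self.centroids = l2_normalize(np.asarray(text_embeddings, dtype=np.float64))
        # Assumption: the initial text embedding counts as one observed sample.
        self.counts = np.ones(len(self.centroids))

    def update(self, image_embedding):
        """Assign one test embedding to its nearest centroid and update it."""
        z = l2_normalize(np.asarray(image_embedding, dtype=np.float64))
        sims = self.centroids @ z          # cosine similarity to each centroid
        k = int(np.argmax(sims))           # hard assignment (illustrative choice)
        self.counts[k] += 1
        # Running-mean step, then renormalize back onto the unit sphere.
        mean = self.centroids[k] + (z - self.centroids[k]) / self.counts[k]
        self.centroids[k] = l2_normalize(mean)
        return k


# Toy usage: three classes in an 8-dimensional embedding space.
model = OnlineCentroids(np.eye(3, 8))
rng = np.random.default_rng(0)
for _ in range(5):
    sample = np.eye(3, 8)[1] + 0.05 * rng.normal(size=8)  # noisy class-1 sample
    print("assigned class:", model.update(sample))
```

Because each update is a closed-form average, there is no learning rate to tune, which is the practical appeal of a backpropagation-free scheme.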

Findings

01. BaFTA outperforms existing methods in accuracy.

02. BaFTA is more efficient because it requires no backpropagation.

03. BaFTA combines multiple predictions, weighted by entropy-based reliability, for robustness.
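One way to picture combining several predictions by reliability is to down-weight uncertain (high-entropy) predictions before averaging. The sketch below is illustrative only: it uses Shannon entropy and weights of the form `exp(-H)`, both of which are assumptions, and the paper's actual reliability measure and weighting may differ. The function name `combine_predictions` is hypothetical.

```python
import numpy as np


def shannon_entropy(probs, eps=1e-12):
    """Shannon entropy (in nats) of each probability vector along the last axis."""
    probs = np.asarray(probs, dtype=np.float64)
    return -np.sum(probs * np.log(probs + eps), axis=-1)


def combine_predictions(prob_list):
    """Average predictions, giving lower-entropy (more confident) ones more weight.

    Assumption: reliability is modeled as exp(-entropy), normalized to sum to 1.
    """
    probs = np.asarray(prob_list, dtype=np.float64)   # shape: (n_predictions, n_classes)
    weights = np.exp(-shannon_entropy(probs))
    weights = weights / weights.sum()
    return weights @ probs                            # shape: (n_classes,)


# Toy usage: a confident prediction and an uninformative one.
confident = [0.9, 0.05, 0.05]
uncertain = [1 / 3, 1 / 3, 1 / 3]
combined = combine_predictions([confident, uncertain])
print("combined prediction:", combined)
```

Compared with a plain mean, this weighting pulls the combined distribution toward the confident prediction, so a single uninformative view has less influence on the final label.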

Abstract

Large-scale pretrained vision-language models like CLIP have demonstrated remarkable zero-shot image classification capabilities across diverse domains. To enhance CLIP's performance while preserving the zero-shot paradigm, various test-time prompt tuning methods have been introduced to refine class embeddings through unsupervised learning objectives during inference. However, these methods often encounter challenges in selecting appropriate learning rates to prevent collapsed training in the absence of validation data during test-time adaptation. In this study, we propose a novel backpropagation-free algorithm BaFTA for test-time adaptation of vision-language models. Instead of fine-tuning text prompts to refine class embeddings, our approach directly estimates class centroids using online clustering within a projected embedding space that aligns text and visual embeddings. We…

Taxonomy

Topics: Multimodal Machine Learning Applications · COVID-19 diagnosis using AI · Domain Adaptation and Few-Shot Learning

Methods: Contrastive Language-Image Pre-training