CapS-Adapter: Caption-based MultiModal Adapter in Zero-Shot Classification

Qijie Wang; Guandu Liu; Bin Wang

arXiv:2405.16591·cs.CV·November 8, 2024

TL;DR

CapS-Adapter introduces a caption-based multimodal support set method that leverages image and caption features to significantly improve zero-shot classification accuracy across diverse datasets without additional training.

Contribution

This work presents CapS-Adapter, a novel zero-shot classification approach using caption-based support sets to enhance generalization and performance over existing training-free methods.

Findings

01. Achieves 2.19% higher accuracy than previous state-of-the-art methods.
02. Demonstrates robust generalization across 19 benchmark datasets.
03. Effectively utilizes multimodal large models for support set construction.

Abstract

Recent advances in vision-language foundational models, such as CLIP, have demonstrated significant strides in zero-shot classification. However, the extensive parameterization of models like CLIP necessitates a resource-intensive fine-tuning process. In response, TIP-Adapter and SuS-X have introduced training-free methods aimed at bolstering the efficacy of downstream tasks. While these approaches incorporate support sets to maintain data distribution consistency between knowledge cache and test sets, they often fall short in terms of generalization on the test set, particularly when faced with test data exhibiting substantial distributional variations. In this work, we present CapS-Adapter, an innovative method that employs a caption-based support set, effectively harnessing both image and caption features to exceed existing state-of-the-art techniques in training-free scenarios.…
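The abstract refers to training-free methods that pair CLIP's zero-shot prediction with a support set stored in a knowledge cache. As a rough illustration of that general idea, the sketch below follows the widely described TIP-Adapter-style cache formulation, not CapS-Adapter's own method, whose details are not given in this listing. The function name `cache_logits`, the toy feature vectors, and the values of `alpha` and `beta` are all assumptions chosen for illustration.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cache_logits(query, text_feats, cache_keys, cache_labels, alpha=1.0, beta=5.5):
    """Hypothetical helper: training-free logits in the TIP-Adapter style.

    query        -- image feature of the test sample
    text_feats   -- one text feature per class (the zero-shot classifier)
    cache_keys   -- image features of the support set (the knowledge cache)
    cache_labels -- class index of each support sample
    alpha, beta  -- illustrative blending and sharpness hyperparameters
    """
    q = l2_normalize(query)

    # Zero-shot term: scaled cosine similarity to each class text feature.
    logits = [100.0 * dot(q, l2_normalize(t)) for t in text_feats]

    # Cache term: each support sample votes for its own class, weighted by
    # exp(-beta * (1 - cosine similarity)) between the query and the cached key.
    for key, label in zip(cache_keys, cache_labels):
        affinity = math.exp(-beta * (1.0 - dot(q, l2_normalize(key))))
        logits[label] += alpha * affinity

    return logits


# Toy usage: two classes, one cached support sample belonging to class 1.
example = cache_logits(
    query=[1.0, 1.0],
    text_feats=[[1.0, 0.0], [0.0, 1.0]],
    cache_keys=[[0.0, 1.0]],
    cache_labels=[1],
)
predicted_class = max(range(len(example)), key=lambda i: example[i])
```

In this toy case the text features alone leave the two classes tied, and the cached support sample breaks the tie. This kind of scheme depends on the support set resembling the test data, which is the generalization concern the abstract raises.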

Code & Models

Repositories

wluli/caps-adapter (PyTorch, official)

Taxonomy

Topics: Speech Recognition and Synthesis · Anomaly Detection Techniques and Applications · Natural Language Processing Techniques

Methods: Contrastive Language-Image Pre-training