A Hard-to-Beat Baseline for Training-free CLIP-based Adaptation

Zhengbo Wang; Jian Liang; Lijun Sheng; Ran He; Zilei Wang; Tieniu Tan

arXiv:2402.04087 · cs.CV · February 7, 2024 · 1 citation

TL;DR

This paper introduces a training-free method for CLIP adaptation based on Gaussian Discriminant Analysis (GDA) that matches or outperforms state-of-the-art techniques across a range of tasks.

Contribution

It revisits classical GDA and integrates it with CLIP's zero-shot classifier, enabling effective downstream classification without training or extra resources.
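One way to picture "integrating GDA with CLIP's zero-shot classifier" is to add the logits of the two heads. The sketch below is hypothetical: the weighted sum controlled by `alpha`, the `logit_scale` of 100, and applying the GDA head to L2-normalized image features are assumptions for illustration, not details stated on this page. `W` and `b` stand for a GDA weight matrix and bias estimated elsewhere.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit L2 norm, as CLIP does before cosine similarity."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def combined_logits(image_feats: np.ndarray, text_embeds: np.ndarray,
                    W: np.ndarray, b: np.ndarray,
                    alpha: float = 1.0, logit_scale: float = 100.0) -> np.ndarray:
    """Hypothetical fusion: CLIP zero-shot logits plus alpha-weighted GDA logits."""
    img = l2_normalize(image_feats)
    # Zero-shot head: scaled cosine similarity to class-name text embeddings, (N, K).
    zero_shot = logit_scale * img @ l2_normalize(text_embeds).T
    # GDA head: linear classifier x @ W^T + b on the same features, (N, K).
    gda = img @ W.T + b
    return zero_shot + alpha * gda


# Usage with random stand-ins for real CLIP features (toy sizes).
rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 16))              # 4 images, 16-dim features
texts = rng.normal(size=(3, 16))              # 3 class-name embeddings
W, b = rng.normal(size=(3, 16)), np.zeros(3)  # placeholder GDA head
logits = combined_logits(feats, texts, W, b, alpha=0.5)
preds = logits.argmax(axis=1)
```

Setting `alpha=0` recovers the plain zero-shot classifier, so under this assumed form the mixing weight could be chosen on a validation split without retraining either head.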

Findings

01 Outperforms or matches state-of-the-art methods on 17 datasets

02 Effective in few-shot, imbalanced, and out-of-distribution (OOD) tasks

03 Extends to base-to-new generalization and unsupervised learning

Abstract

Contrastive Language-Image Pretraining (CLIP) has gained popularity for its remarkable zero-shot capacity. Recent research has focused on developing efficient fine-tuning methods, such as prompt learning and adapters, to enhance CLIP's performance on downstream tasks. However, these methods still require additional training time and computational resources, which is undesirable for resource-limited devices. In this paper, we revisit a classical algorithm, Gaussian Discriminant Analysis (GDA), and apply it to the downstream classification of CLIP. Typically, GDA assumes that the features of each class follow Gaussian distributions with identical covariance. By leveraging Bayes' formula, the classifier can be expressed in terms of the class means and covariance, which can be estimated from the data without the need for training. To integrate knowledge from both visual and textual…
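The estimation step described in the abstract fits in a few lines. With a shared covariance, Bayes' formula gives a linear score per class: w_k = Σ⁻¹ μ_k and b_k = -½ μ_kᵀ Σ⁻¹ μ_k + log π_k, where μ_k is the mean of class k, Σ the shared covariance, and π_k the class prior. The NumPy sketch below is not the authors' implementation; the names `fit_gda` and `gda_logits`, the ridge term `shrinkage` (added so Σ stays invertible when there are fewer samples than feature dimensions), and the use of empirical class frequencies as priors are assumptions for illustration.

```python
import numpy as np


def fit_gda(features: np.ndarray, labels: np.ndarray, num_classes: int,
            shrinkage: float = 1e-4) -> tuple[np.ndarray, np.ndarray]:
    """Closed-form GDA: class means plus one shared covariance -> linear (W, b)."""
    n, d = features.shape
    # Class means mu_k, shape (K, D); assumes every class has at least one sample.
    means = np.stack([features[labels == k].mean(axis=0) for k in range(num_classes)])
    # Pooled (shared) covariance of the class-centred features.
    centred = features - means[labels]
    cov = centred.T @ centred / n
    # Ridge term (assumption) keeps the covariance invertible in the few-shot regime.
    precision = np.linalg.inv(cov + shrinkage * np.eye(d))
    # w_k = Sigma^{-1} mu_k; row k of means @ precision, since precision is symmetric.
    W = means @ precision
    # b_k = -1/2 mu_k^T Sigma^{-1} mu_k + log pi_k, with empirical priors pi_k.
    priors = np.bincount(labels, minlength=num_classes) / n
    b = -0.5 * np.einsum("kd,kd->k", W, means) + np.log(priors)
    return W, b


def gda_logits(features: np.ndarray, W: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Per-class scores x @ W^T + b; argmax over classes is the GDA prediction."""
    return features @ W.T + b


# Usage on synthetic two-class data standing in for CLIP image features.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-1.0, 1.0, (50, 8)), rng.normal(1.0, 1.0, (50, 8))])
y = np.repeat([0, 1], 50)
W, b = fit_gda(x, y, num_classes=2)
accuracy = (gda_logits(x, W, b).argmax(axis=1) == y).mean()
```

Because the covariance is shared, the quadratic term -½ xᵀ Σ⁻¹ x is identical for every class and cancels in the argmax, which is why the classifier reduces to a linear head that needs no gradient training.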

Peer Reviews

Decision · ICLR 2024 poster

Reviewer 01 · Rating 6: marginally above the acceptance threshold · Confidence 4

Strengths

- A simple and effective method.
- Easy to implement in a few lines of code.
- A novel idea of following Gaussian Discriminant Analysis to improve CLIP performance on downstream tasks.
- Provides ways to extend the approach to tasks like base-to-new generalization and unsupervised learning.
- Shows better results than other training-free methods and is competitive with training-based methods.

Weaknesses

Rather than listing weaknesses, I would like to discuss and clarify a few questions. Please see the Questions section.

Reviewer 02 · Rating 6: marginally above the acceptance threshold · Confidence 4

Strengths

- The incorporation of Gaussian Discriminant Analysis (GDA) into the CLIP model for training-free downstream tasks is interesting.
- The proposed method is simple yet effective.

Weaknesses

- Though simple yet effective, the overall novelty is limited. The proposed method simply treats the pre-trained CLIP model as a frozen feature extractor and incorporates the idea of Gaussian Discriminant Analysis (GDA) to estimate the classifier for downstream tasks without training. The key contribution might simply be the validation of the strong feature-extraction capability of the pre-trained CLIP model.
- It lacks more insightful analyses. While the paper introduces the integration

Reviewer 03 · Rating 6: marginally above the acceptance threshold · Confidence 5

Strengths

- The proposed baseline is very simple and does not require training, which is very important in few-shot transfer scenarios where a real-time response is often required.
- If this baseline really performs this well, it will be a good point for rethinking the whole field, which may in turn steer future research in the right direction.
- The paper is well written and easy to understand.

Weaknesses

- The most important weakness of this paper is a historical issue that stems from the whole literature: According to a paper also submitted to ICLR 2024 (Benchmarking Few-shot Transferability of Pre-trained Models with Improved Evaluation Protocols, https://openreview.net/pdf?id=O3Mej5jlda), the experiments conducted on the original few-shot transfer benchmark of CLIP are unreliable due to intrinsic design flaws, such as sampling uncertainty, unrealistic hyperparameter selection, etc. The perfor
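The sampling-uncertainty concern raised above can be made concrete: a few-shot score depends on which K examples per class happen to be drawn, so a single draw can mislead. A minimal sketch, assuming a hypothetical `evaluate(support_indices)` callback that fits a classifier on the support set and returns a test score, of repeating the draw over several seeds and reporting mean and standard deviation. `sample_k_shot`, `repeated_eval`, and `toy_evaluate` are illustrative names, not part of the paper or of the benchmark the reviewer cites.

```python
from collections.abc import Callable, Iterable

import numpy as np


def sample_k_shot(labels: np.ndarray, k: int, rng: np.random.Generator) -> np.ndarray:
    """Indices of k randomly chosen examples per class, drawn without replacement."""
    per_class = [rng.choice(np.flatnonzero(labels == c), size=k, replace=False)
                 for c in np.unique(labels)]
    return np.concatenate(per_class)


def repeated_eval(evaluate: Callable[[np.ndarray], float], labels: np.ndarray,
                  k: int, seeds: Iterable[int]) -> tuple[float, float]:
    """Run evaluate() on one K-shot draw per seed; return mean and std of the scores."""
    scores = np.array([evaluate(sample_k_shot(labels, k, np.random.default_rng(s)))
                       for s in seeds])
    return float(scores.mean()), float(scores.std())


labels = np.repeat(np.arange(3), 10)  # toy pool: 3 classes x 10 examples


def toy_evaluate(support: np.ndarray) -> float:
    # Hypothetical stand-in for "fit on the support set, score on a fixed test set".
    return float(np.mean(support % 2 == 0))


mean, std = repeated_eval(toy_evaluate, labels, k=4, seeds=range(5))
```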

Code & Models

Repositories

mrflogs/iclr24 · PyTorch · Official

Taxonomy

Topics: Advanced Vision and Imaging · Video Coding and Compression Technologies · Subtitles and Audiovisual Media

Methods: Contrastive Language-Image Pre-training