A Hard-to-Beat Baseline for Training-free CLIP-based Adaptation
Zhengbo Wang, Jian Liang, Lijun Sheng, Ran He, Zilei Wang, Tieniu Tan

TL;DR
This paper introduces a Gaussian Discriminant Analysis-based method for adapting CLIP without any additional training. It matches or outperforms state-of-the-art techniques across various tasks.
Contribution
It revisits classical GDA and integrates it with CLIP's zero-shot classifier, enabling effective downstream classification without training or extra resources.
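The contribution above only states that GDA is integrated with CLIP's zero-shot classifier. The sketch below assumes the integration is a weighted sum of the two sets of logits. The names `fit_gda`, `combined_logits`, `alpha` and `shrinkage`, the logit scale of 100, and the small shrinkage toward the identity are illustrative assumptions, not the authors' code. The features are synthetic stand-ins for L2-normalized CLIP embeddings. Accuracy is measured on the same few-shot samples, so it is only a smoke check and not a benchmark result.

```python
import numpy as np


def fit_gda(features, labels, num_classes, shrinkage=1e-4):
    """Estimate a linear GDA classifier in closed form (no gradient training).

    Assumes every class in range(num_classes) has at least one sample.
    """
    dim = features.shape[1]
    # Class means mu_k.
    means = np.stack([features[labels == k].mean(axis=0) for k in range(num_classes)])
    # Pooled within-class covariance, shared by all classes (the GDA assumption).
    centered = features - means[labels]
    cov = centered.T @ centered / features.shape[0]
    # Hypothetical shrinkage toward the identity keeps the inverse well-conditioned.
    cov = (1.0 - shrinkage) * cov + shrinkage * np.eye(dim)
    precision = np.linalg.inv(cov)
    # Row k is (Sigma^{-1} mu_k)^T because Sigma is symmetric.
    weights = means @ precision
    priors = np.bincount(labels, minlength=num_classes) / labels.size
    # b_k = log pi_k - 1/2 mu_k^T Sigma^{-1} mu_k
    bias = -0.5 * np.sum(weights * means, axis=1) + np.log(priors)
    return weights, bias


def combined_logits(image_feats, text_embeds, gda_weights, gda_bias, alpha=1.0):
    """Assumed ensemble: CLIP zero-shot logits plus alpha times GDA logits."""
    zero_shot = 100.0 * image_feats @ text_embeds.T  # CLIP-style scaled cosine similarity
    gda = image_feats @ gda_weights.T + gda_bias
    return zero_shot + alpha * gda


# Synthetic stand-ins for L2-normalized CLIP features (not real CLIP outputs).
rng = np.random.default_rng(0)
num_classes, dim, shots = 3, 16, 8
centers = rng.normal(size=(num_classes, dim))
centers /= np.linalg.norm(centers, axis=1, keepdims=True)
labels = np.repeat(np.arange(num_classes), shots)
feats = centers[labels] + 0.05 * rng.normal(size=(labels.size, dim))
feats /= np.linalg.norm(feats, axis=1, keepdims=True)
text_embeds = centers  # pretend these are normalized class-prompt text embeddings

weights, bias = fit_gda(feats, labels, num_classes)
logits = combined_logits(feats, text_embeds, weights, bias, alpha=1.0)
accuracy = float((logits.argmax(axis=1) == labels).mean())
print(f"accuracy on the few-shot samples: {accuracy:.2f}")
```

Setting `alpha=0.0` recovers plain zero-shot CLIP. In this sketch, `alpha` is the only knob that balances textual and visual knowledge, and it would have to be chosen on held-out data.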
Findings
Outperforms state-of-the-art on 17 datasets
Effective in few-shot, imbalanced, and OOD tasks
Extends to base-to-new generalization and unsupervised learning
Abstract
Contrastive Language-Image Pretraining (CLIP) has gained popularity for its remarkable zero-shot capacity. Recent research has focused on developing efficient fine-tuning methods, such as prompt learning and adapters, to enhance CLIP's performance in downstream tasks. However, these methods still require additional training time and computational resources, which is undesirable for devices with limited resources. In this paper, we revisit a classical algorithm, Gaussian Discriminant Analysis (GDA), and apply it to the downstream classification of CLIP. Typically, GDA assumes that features of each class follow Gaussian distributions with identical covariance. By leveraging Bayes' formula, the classifier can be expressed in terms of the class means and covariance, which can be estimated from the data without the need for training. To integrate knowledge from both visual and textual…
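The Bayes step in the abstract can be written out as standard shared-covariance GDA. The notation ($\mu_k$ for class means, $\Sigma$ for the shared covariance, $\pi_k$ for class priors, $N_k$ and $N$ for sample counts) is ours and not necessarily the paper's. Because every class uses the same $\Sigma$, the quadratic term $x^\top\Sigma^{-1}x$ is identical across classes and drops out, which leaves a linear classifier:

```latex
\begin{aligned}
\log p(y = k \mid x)
  &= -\tfrac{1}{2}(x - \mu_k)^\top \Sigma^{-1} (x - \mu_k) + \log \pi_k + c(x) \\
  &= x^\top \Sigma^{-1} \mu_k - \tfrac{1}{2}\mu_k^\top \Sigma^{-1} \mu_k + \log \pi_k + c'(x), \\
w_k &= \Sigma^{-1}\mu_k, \qquad b_k = \log \pi_k - \tfrac{1}{2}\mu_k^\top \Sigma^{-1}\mu_k, \\
\hat{\mu}_k &= \frac{1}{N_k}\sum_{i:\,y_i = k} x_i, \qquad
\hat{\Sigma} = \frac{1}{N}\sum_{k}\sum_{i:\,y_i = k}(x_i - \hat{\mu}_k)(x_i - \hat{\mu}_k)^\top .
\end{aligned}
```

Neither $c(x)$ nor $c'(x) = c(x) - \tfrac{1}{2}x^\top\Sigma^{-1}x$ depends on $k$. Prediction therefore reduces to $\arg\max_k (w_k^\top x + b_k)$, and every quantity in it is estimated directly from labeled features.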
Peer Reviews
Decision: ICLR 2024 poster
- A simple and effective method.
- Easy to implement in a few lines of code.
- A novel idea: following Gaussian Discriminant Analysis to improve CLIP performance on downstream tasks.
- Provides ways to extend the approach to tasks such as base-to-new generalization and unsupervised learning.
- Shows better results than other training-free methods and is competitive with training-based methods.
Rather than listing weaknesses, I would like to discuss and clarify a few questions. Please see the Questions section.
- The incorporation of Gaussian Discriminant Analysis (GDA) into the CLIP model for training-free downstream tasks is interesting.
- The proposed method is simple yet effective.
- Though simple and effective, the method's overall novelty is limited. It simply treats the pre-trained CLIP model as a frozen feature extractor and applies the idea of Gaussian Discriminant Analysis (GDA) to estimate the classifier for downstream tasks without training. The key contribution may amount to validating the strong feature-extraction capability of the pre-trained CLIP model.
- It lacks more insightful analyses. While the paper introduces the integration
- The proposed baseline is very simple and requires no training, which is very important in few-shot transfer scenarios where real-time response is often required.
- If this baseline truly performs this well, it offers a good reason to rethink the whole field, which may in turn steer future research in the right direction.
- The paper is well-written and easy to understand.
- The most important weakness of this paper is an issue inherited from the literature as a whole: according to a paper also submitted to ICLR 2024 (Benchmarking Few-shot Transferability of Pre-trained Models with Improved Evaluation Protocols, https://openreview.net/pdf?id=O3Mej5jlda), experiments on the original few-shot transfer benchmark of CLIP are unreliable due to intrinsic design flaws, such as sampling uncertainty, unrealistic hyperparameter selection, etc. The perfor
Taxonomy
Topics: Advanced Vision and Imaging · Video Coding and Compression Technologies · Subtitles and Audiovisual Media
Methods: Contrastive Language-Image Pre-training
