Rethinking Few Shot CLIP Benchmarks: A Critical Analysis in the Inductive Setting
Alexey Kravets, Da Chen, Vinay P. Namboodiri

TL;DR
This paper critically examines the evaluation of CLIP in few-shot learning, introduces an unlearning-based pipeline for true inductive benchmarking, and proposes an improved classification method that outperforms existing baselines across diverse datasets.
Contribution
The paper shows that standard few-shot benchmarks are inadequate for measuring CLIP's inductive generalization, introduces an unlearning-based pipeline that yields true inductive baselines, and presents a new classification method achieving state-of-the-art results.
Findings
Existing methods drop by 55% on average (13 baselines, multiple datasets) in the true inductive setting
The unlearning technique, validated against oracle baselines, gives a more accurate baseline evaluation
Proposed method outperforms 13 recent baselines across datasets
Abstract
CLIP is a foundational model with transferable classification performance in the few-shot setting. Several methods have improved CLIP's performance using few-shot examples. However, all of these techniques have so far been benchmarked on standard few-shot datasets. We argue that this mode of evaluation does not give a true indication of inductive generalization ability from few-shot examples. Because the CLIP model has already seen most of these datasets, the resulting setting can be termed partially transductive. To solve this, we propose a pipeline that uses an unlearning technique to obtain true inductive baselines. In this new inductive setting, the methods show a significant drop in performance (-55% on average among 13 baselines with multiple datasets). We validate the unlearning technique using oracle baselines. An improved few-shot classification technique is proposed that…
