Mining Open Semantics from CLIP: A Relation Transition Perspective for Few-Shot Learning
Cilin Yan, Haochen Wang, Xiaolong Jiang, Yao Hu, Xu Tang, Guoliang Kang, Efstratios Gavves

TL;DR
This paper introduces a novel method to enhance few-shot learning by mining open semantics from CLIP's rich image-text relationships, using a transformer to model relation transitions for improved classification accuracy.
Contribution
The paper proposes a relation transition approach leveraging open semantics in CLIP, with a learnable [CLASS] token, to improve few-shot classification performance.
Findings
Outperforms previous state-of-the-art methods on eleven datasets
Effectively models open semantics for better few-shot adaptation
Utilizes a transformer with learnable tokens for relation transition
Abstract
Contrastive Vision-Language Pre-training (CLIP) demonstrates impressive zero-shot capability. The key to improving the adaptation of CLIP to downstream tasks with few exemplars lies in how to effectively model and transfer the useful knowledge embedded in CLIP. Previous work typically mines this knowledge from the limited visual samples and closed-set semantics (i.e., within the target category set of the downstream task). However, the aligned CLIP image/text encoders contain abundant relationships between visual features and almost infinite open semantics, which may benefit few-shot learning but remain unexplored. In this paper, we propose to mine open semantics as anchors to perform a relation transition from the image-anchor relationship to the image-target relationship to make predictions. Specifically, we adopt a transformer module which takes the visual feature as "Query", the text features of…
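The relation transition idea can be illustrated with a minimal, non-learned sketch. The abstract excerpt is cut off before it says what the transformer uses as "Key" and "Value", so the assignment below is an assumption made for illustration only: anchor text features act as keys, and anchor-to-target similarities act as values. The function names, the temperature of 0.07, and the use of plain NumPy are likewise hypothetical, and the learnable projections and [CLASS] token mentioned in the summary are left out.

```python
import numpy as np


def l2_normalize(x, axis=-1):
    """Scale vectors to unit length so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def relation_transition(image_feats, anchor_text_feats, target_text_feats,
                        temperature=0.07):
    """Single-head attention sketch with no learned weights.

    ASSUMPTION (not confirmed by the truncated abstract):
      query = image features,
      key   = text features of open-semantic anchors,
      value = anchor-to-target cosine similarities.

    Shapes: image_feats (B, D), anchor_text_feats (A, D),
    target_text_feats (C, D). Returns (logits, image_anchor) with
    shapes (B, C) and (B, A).
    """
    img = l2_normalize(image_feats)
    anc = l2_normalize(anchor_text_feats)
    tgt = l2_normalize(target_text_feats)

    # Image-anchor relationship: attention weights over the anchors.
    image_anchor = softmax(img @ anc.T / temperature, axis=-1)  # (B, A)

    # Anchor-target relationship: cosine similarity between text features.
    anchor_target = anc @ tgt.T  # (A, C)

    # Transition: weight each anchor's relation to the targets by how
    # strongly the image relates to that anchor.
    logits = image_anchor @ anchor_target  # (B, C)
    return logits, image_anchor
```

Because each row of the attention weights sums to one, every output logit is a weighted average of anchor-target cosine similarities. In a trained version of this sketch, the query, key, and value would pass through learned projections before this step.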
Taxonomy
Topics: Interpreting and Communication in Healthcare · Natural Language Processing Techniques
Methods: Sparse Evolutionary Training · Contrastive Language-Image Pre-training
