Learning to Adapt Category Consistent Meta-Feature of CLIP for Few-Shot Classification
Jiaying Shi, Xuetong Xue, Shenghui Xu

TL;DR
This paper introduces MF-Adapter, a method that combines CLIP's local low-level features with its high-level semantic features to improve few-shot image classification. The authors report that it outperforms existing CLIP-based few-shot methods.
Contribution
The paper proposes the Meta-Feature Adaption method (MF-Adapter), which combines local and high-level features, together with a Meta-Feature Unit (MF-Unit) intended to improve intra-class generalization in few-shot learning.
Findings
MF-Adapter outperforms state-of-the-art CLIP-based few-shot methods.
Local features are more consistent across unseen samples than high-level features.
The method shows strong results on challenging visual classification tasks.
Abstract
Recent CLIP-based methods have shown promising zero-shot and few-shot performance on image classification tasks. Existing approaches such as CoOp and Tip-Adapter focus only on high-level visual features that are fully aligned with textual features representing the "summary" of the image. However, the goal of few-shot learning is to classify unseen images of the same category given only a few labeled samples. Notably, in contrast to high-level representations, low-level local representations (LRs) are more consistent between seen and unseen samples. Based on this observation, we propose the Meta-Feature Adaption method (MF-Adapter), which combines the complementary strengths of both LRs and high-level semantic representations. Specifically, we introduce the Meta-Feature Unit (MF-Unit), which is a simple yet effective local similarity metric to measure category-consistent local context in an…
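The abstract is cut off before the MF-Unit is defined, so the following is only a generic illustration of what a local similarity metric between two images could look like, not the paper's formulation. It assumes each image is already represented as a list of patch-level feature vectors; the function names and the "best match per patch, then average" rule are hypothetical choices made for this sketch.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def local_similarity(query_patches, support_patches):
    """Hypothetical local similarity score (an assumption, not the MF-Unit).

    For every patch feature of the query image, take its best cosine match
    among the patch features of a labeled support image, then average those
    best matches over all query patches.
    """
    if not query_patches or not support_patches:
        raise ValueError("both images need at least one patch feature")
    best_matches = [
        max(cosine(q, s) for s in support_patches) for q in query_patches
    ]
    return sum(best_matches) / len(best_matches)


# Toy 2-D patch features, invented purely for illustration.
same_class = local_similarity([[1.0, 0.0], [0.0, 1.0]], [[0.0, 2.0], [3.0, 0.0]])
other_class = local_similarity([[1.0, 0.0]], [[0.0, 1.0]])
```

In this toy setup the score is high when every query patch has a close counterpart somewhere in the support image, regardless of patch order, which is one simple way to compare local content rather than a single global embedding.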
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning
Methods: Sparse Evolutionary Training · Contrastive Language-Image Pre-training · Context Optimization · Focus
