Beyond Class Tokens: LLM-guided Dominant Property Mining for Few-shot Classification
Wei Zhuo, Runjie Luo, Wufeng Xue, Linlin Shen

TL;DR
This paper introduces BCT-CLIP, a novel few-shot learning method that leverages large language models to identify and utilize dominating visual properties for improved class discrimination beyond simple class token alignment.
Contribution
The paper proposes a new approach that incorporates dominating property mining using LLMs and contrastive learning, enhancing few-shot classification performance over existing methods.
Findings
Outperforms existing methods on 11 datasets
Effectively captures class-specific visual properties
Improves discriminative visual representation learning
Abstract
Few-shot Learning (FSL), which endeavors to develop the generalization ability for recognizing novel classes using only a few images, faces significant challenges due to data scarcity. Recent CLIP-like methods based on contrastive language-image pre-training mitigate the issue by leveraging the textual representation of the class name for unseen image discovery. Despite the achieved success, simply aligning visual representations to class name embeddings would compromise the visual diversity for novel class discrimination. To this end, we propose a novel FSL method, BCT-CLIP, that explores dominating properties via contrastive learning beyond simply using class tokens. By leveraging LLM-based prior knowledge, our method pushes forward FSL with comprehensive structural image representations, including both global category representation and the patch-aware…
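To make the baseline that the abstract argues against concrete, here is a minimal sketch of class-token alignment in a CLIP-like classifier: an image embedding is scored against one text embedding per class name by cosine similarity, and the highest-scoring class wins. This is not BCT-CLIP's method. The function names, the temperature value, and the 3-dimensional toy embeddings are all assumptions made for illustration; a real system would obtain the embeddings from pretrained image and text encoders.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def cosine_logits(image_feat, class_text_feats, temperature=0.01):
    """Cosine similarity between one image embedding and each class-name
    embedding, divided by a temperature (0.01 is an assumed value)."""
    img = l2_normalize(image_feat)
    logits = []
    for text_feat in class_text_feats:
        txt = l2_normalize(text_feat)
        logits.append(sum(a * b for a, b in zip(img, txt)) / temperature)
    return logits

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(image_feat, class_text_feats, temperature=0.01):
    """Return (predicted class index, class probabilities)."""
    probs = softmax(cosine_logits(image_feat, class_text_feats, temperature))
    label = max(range(len(probs)), key=probs.__getitem__)
    return label, probs

# Toy stand-ins for encoder outputs: one embedding per class name.
class_text_feats = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
# Toy image embedding that lies closest to the third class embedding.
image_feat = [0.1, 0.2, 0.9]

label, probs = classify(image_feat, class_text_feats)
print(label, [round(p, 4) for p in probs])
```

Each class is represented here by a single vector, so every image of a class is pulled toward the same point regardless of which visual properties it shows. That is the limitation the abstract describes as compromising visual diversity.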
