Adaptive Prompt Tuning: Vision Guided Prompt Tuning with Cross-Attention for Fine-Grained Few-Shot Learning
Eric Brouwer, Jan Erik van Woerden, Gertjan Burghouts, Matias Valdenegro-Toro, Marco Zullich

TL;DR
This paper introduces an adaptive prompt tuning method guided by visual inputs using cross-attention, significantly improving few-shot fine-grained classification performance by dynamically aligning textual and visual features.
Contribution
It proposes a novel adaptive prompt tuning approach with cross-attention for dynamic text prompt refinement, surpassing static prompt methods in fine-grained visual tasks.
Findings
Significant accuracy improvements on the CUBirds, Oxford Flowers, and FGVC Aircraft datasets.
Enhanced model reliability through Monte-Carlo Dropout for uncertainty estimation.
Outperforms existing static prompt tuning techniques in few-shot scenarios.
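The Monte-Carlo Dropout finding can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the toy linear classifier, the `weights` and `features` values, the dropout rate, and the number of passes are all hypothetical. The idea shown is only the general technique: keep dropout active at inference, run several stochastic forward passes, average the class probabilities, and read the entropy of that average as an uncertainty score.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def stochastic_forward(features, weights, drop_prob, rng):
    """One forward pass of a toy linear classifier with dropout left ON."""
    keep = 1.0 - drop_prob
    # Inverted dropout: zero each feature with probability drop_prob,
    # rescale the survivors so the expected activation is unchanged.
    dropped = [f / keep if rng.random() < keep else 0.0 for f in features]
    logits = [sum(w * x for w, x in zip(row, dropped)) for row in weights]
    return softmax(logits)


def mc_dropout_predict(features, weights, drop_prob=0.5, passes=50, seed=0):
    """Average several stochastic passes; return mean probs and their entropy."""
    rng = random.Random(seed)
    samples = [stochastic_forward(features, weights, drop_prob, rng)
               for _ in range(passes)]
    n_classes = len(weights)
    mean = [sum(s[c] for s in samples) / passes for c in range(n_classes)]
    # Predictive entropy: higher values indicate a less certain prediction.
    entropy = -sum(p * math.log(p) for p in mean if p > 0.0)
    return mean, entropy


# Hypothetical inputs: 4 features, 3 classes (illustration only).
features = [0.9, -0.3, 0.5, 1.2]
weights = [[1.0, 0.0, 0.5, 0.2],
           [-0.5, 1.0, 0.0, 0.3],
           [0.1, 0.2, -1.0, 0.4]]
mean_probs, entropy = mc_dropout_predict(features, weights)
print(mean_probs, entropy)
```

The entropy is bounded above by the log of the number of classes, so it can be normalized to [0, 1] if a threshold for flagging uncertain predictions is needed.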
Abstract
Few-shot, fine-grained classification in computer vision poses significant challenges due to the need to differentiate subtle class distinctions with limited data. This paper presents a novel method that enhances the Contrastive Language-Image Pre-Training (CLIP) model through adaptive prompt tuning, guided by real-time visual inputs. Unlike existing techniques such as Context Optimization (CoOp) and Visual Prompt Tuning (VPT), which are constrained by static prompts or visual token reliance, the proposed approach leverages a cross-attention mechanism to dynamically refine text prompts for the image at hand. This enables an image-specific alignment of textual features with image patches extracted from the Vision Transformer, making the model more effective for datasets with high intra-class variance and low inter-class differences. The method is evaluated on several datasets, including…
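The cross-attention step described in the abstract can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's architecture: it uses a single head, omits the learned query/key/value projections (treated as identity), adds a residual connection as one plausible design, and fills `prompt_tokens` and `patch_features` with random placeholder values. Text prompt tokens act as queries, and image patch features act as keys and values, so each prompt token is refined by the patches it attends to.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(prompt_tokens, patch_features):
    """Refine prompt tokens with image patches via scaled dot-product attention.

    Queries come from the text prompt; keys and values come from the patches.
    Learned projections are omitted (assumption, for brevity).
    """
    dim = len(prompt_tokens[0])
    scale = 1.0 / math.sqrt(dim)
    refined, attention = [], []
    for q in prompt_tokens:
        # Similarity of this prompt token to every image patch.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k))
                  for k in patch_features]
        weights = softmax(scores)
        # Attention-weighted mixture of patch features.
        context = [sum(w * v[d] for w, v in zip(weights, patch_features))
                   for d in range(dim)]
        # Residual connection (assumed): original token plus visual context.
        refined.append([qi + ci for qi, ci in zip(q, context)])
        attention.append(weights)
    return refined, attention


# Hypothetical shapes: 4 learnable context tokens, 16 image patches, width 8.
rng = random.Random(0)
prompt_tokens = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(4)]
patch_features = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(16)]
refined, attn = cross_attention(prompt_tokens, patch_features)
print(len(refined), len(refined[0]), len(attn[0]))
```

Because the refined tokens depend on the patches of the current image, the resulting prompt is image-specific, in contrast to a static prompt that is shared across all inputs.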
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Machine Learning and ELM
Methods: Attention Is All You Need · Linear Layer · Vision Transformer · Dropout · Multi-Head Attention · Adam · Layer Normalization · Position-Wise Feed-Forward Layer · Label Smoothing · Residual Connection
