Explicit Uncertainty Modeling for Active CLIP Adaptation with Dual Prompt Tuning
Qian-Wei Wang, Yaguang Song, Shu-Tao Xia

TL;DR
This paper introduces an explicit uncertainty modeling approach for active CLIP adaptation based on dual-prompt tuning, which improves both sample selection and classification performance when the annotation budget is limited.
Contribution
It proposes a dual-prompt tuning framework that explicitly models uncertainty from the model perspective, enhancing active learning for CLIP-based image classification.
Findings
Outperforms existing active learning methods under the same annotation budget
Improves classification reliability through positive prompt tuning
Provides a principled uncertainty signal for sample selection
Abstract
Pre-trained vision-language models such as CLIP exhibit strong transferability, yet adapting them to downstream image classification tasks under limited annotation budgets remains challenging. In active learning settings, the model must select the most informative samples for annotation from a large pool of unlabeled data. Existing approaches typically estimate uncertainty via entropy-based criteria or representation clustering, without explicitly modeling uncertainty from the model perspective. In this work, we propose a robust uncertainty modeling framework for active CLIP adaptation based on dual-prompt tuning. We introduce two learnable prompts in the textual branch of CLIP. The positive prompt enhances the discriminability of task-specific textual embeddings corresponding to light-weight tuned visual embeddings, improving classification reliability. Meanwhile, the negative prompt…
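For context, the entropy-based criterion that the abstract names as a typical existing approach can be sketched as follows. This is a minimal illustration of that baseline only, not the paper's dual-prompt method, whose negative-prompt details are truncated above. The function names and the toy logits are hypothetical; in practice the logits would be CLIP image-text similarity scores for each unlabeled image.

```python
import math

def softmax(logits):
    """Convert raw class scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats; higher means a less certain prediction."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def select_most_uncertain(pool_logits, budget):
    """Return indices of the `budget` unlabeled samples with highest entropy."""
    scores = [entropy(softmax(logits)) for logits in pool_logits]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:budget]

# Toy unlabeled pool: one row of class scores per image (made-up values).
pool_logits = [
    [9.0, 0.5, 0.2],  # confident prediction, low entropy
    [2.0, 1.9, 2.1],  # near-uniform prediction, high entropy
    [4.0, 3.5, 0.1],  # ambiguous between two classes
]

selected = select_most_uncertain(pool_logits, budget=2)
print(selected)
```

With these toy scores, the near-uniform and two-way-ambiguous samples are sent for annotation, while the confident one is left in the pool. The score depends only on the predicted class distribution, which is the limitation the abstract points to when it argues for modeling uncertainty explicitly from the model perspective.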
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques
