CAPT: Confusion-Aware Prompt Tuning for Reducing Vision-Language Misalignment
Maoyuan Shao, Yutong Gao, Xinyang Huang, Chuang Zhu, Lijuan Sun, Guoshun Nan

TL;DR
CAPT introduces a confusion-aware prompt tuning framework for vision-language models, explicitly modeling and reducing class confusion to improve discriminability and generalization across multiple datasets.
Contribution
The paper proposes a framework built from a Confusion Bank, a Semantic Confusion Miner (SEM), a Sample Confusion Miner (SAM), and a multi-granularity expert, which together address systematic misclassifications in vision-language models.
Findings
Significantly reduces confusion-induced errors.
Improves discriminability and generalization on 11 datasets.
Resolves over 50% of confusable sample pairs.
Abstract
Vision-language models like CLIP have achieved remarkable progress in cross-modal representation learning, yet suffer from systematic misclassifications among visually and semantically similar categories. We observe that such confusion patterns are not random but persistently occur between specific category pairs, revealing the model's intrinsic bias and limited fine-grained discriminative ability. To address this, we propose CAPT, a Confusion-Aware Prompt Tuning framework that enables models to learn from their own misalignment. Specifically, we construct a Confusion Bank to explicitly model stable confusion relationships across categories and misclassified samples. On this basis, we introduce a Semantic Confusion Miner (SEM) to capture global inter-class confusion through semantic difference and commonality prompts, and a Sample Confusion Miner (SAM) to retrieve representative…
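The abstract says the Confusion Bank models "stable confusion relationships across categories and misclassified samples", but it does not give its data structure. The sketch below is a hypothetical illustration of that idea, not the authors' implementation. It assumes that a confusion is a (true class, predicted class) pair, that "stable" means the pair is confused in at least `min_epochs` of the recorded epochs, and that the bank stores the IDs of the misclassified samples for each pair. The names `ConfusionBank`, `record_epoch`, `stable_pairs`, and `min_epochs` are all invented for this example.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Set, Tuple

Pair = Tuple[int, int]  # (true class, predicted class); hypothetical encoding


class ConfusionBank:
    """Hypothetical sketch: tracks which class pairs are confused, how often,
    in how many epochs, and which samples were misclassified for each pair."""

    def __init__(self) -> None:
        self.total: Counter = Counter()       # total confusions per pair over all epochs
        self.epoch_hits: Counter = Counter()  # number of epochs in which each pair occurred
        self.samples: Dict[Pair, Set[int]] = defaultdict(set)  # misclassified sample IDs

    def record_epoch(
        self,
        sample_ids: Sequence[int],
        labels: Sequence[int],
        preds: Sequence[int],
    ) -> None:
        """Record one epoch of predictions; only errors (label != pred) are stored."""
        seen: Set[Pair] = set()
        for sid, y, p in zip(sample_ids, labels, preds):
            if y != p:
                pair = (y, p)
                self.total[pair] += 1
                self.samples[pair].add(sid)
                seen.add(pair)
        for pair in seen:  # count each pair at most once per epoch
            self.epoch_hits[pair] += 1

    def stable_pairs(self, min_epochs: int) -> List[Tuple[Pair, int]]:
        """Pairs confused in at least `min_epochs` epochs, most frequent first
        (ties broken by pair order so the result is deterministic)."""
        stable = [
            (pair, count)
            for pair, count in self.total.items()
            if self.epoch_hits[pair] >= min_epochs
        ]
        stable.sort(key=lambda item: (-item[1], item[0]))
        return stable


if __name__ == "__main__":
    bank = ConfusionBank()
    bank.record_epoch([0, 1, 2, 3], [0, 0, 1, 2], [1, 1, 1, 0])
    bank.record_epoch([0, 1, 2, 3], [0, 0, 1, 2], [1, 0, 1, 1])
    print(bank.stable_pairs(min_epochs=2))  # pair (0, 1) recurs in both epochs
    print(sorted(bank.samples[(0, 1)]))
```

In this toy run only the pair (0, 1) is confused in both epochs, so it is the only pair returned as stable. That separates a persistent confusion from one-off errors such as (2, 0). A real system would feed such pairs and their stored samples to components like SEM and SAM, but how CAPT actually does this is not specified in the text above.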
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Topic Modeling
