Robust Prompt Tuning for Vision-Language Models with Mild Semantic Noise
Yansheng Gao, Yufei Zheng, Shengsheng Wang

TL;DR
This paper introduces ANPrompt, a novel prompt tuning framework for vision-language models that actively incorporates weak semantic noise and stabilizes visual semantics, significantly improving robustness and generalization to unseen categories.
Contribution
The paper proposes ANPrompt, which integrates weak semantic noise and a noise-resistant visual prompt, along with a new loss function, to enhance robustness and generalization in prompt tuning.
Findings
Outperforms existing methods on 11 benchmarks.
Enhances robustness to semantic noise.
Improves generalization to unseen categories.
Abstract
Prompt tuning has shown promising results, but its robustness and generalization to unseen categories remain limited. Through our experiments, we demonstrate that the complete removal of semantic noise is a key factor restricting robustness. Existing methods typically suppress or filter out semantic noise in the prompt space, inadvertently hindering the model's robustness and its ability to generalize to unseen categories. To address this, we propose ANPrompt, a robust prompt tuning framework that actively incorporates weak semantic noise. By clustering weakly perturbed features into noise prompts and integrating them with learnable tokens in both the text and vision encoders, ANPrompt ensures controlled exposure to semantic variations. To enhance the visual pathway, we introduce the Noise-Resistant Visual Prompt Prototype (NRVPP), which stabilizes visual semantics under weak…
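The noise-prompt construction described in the abstract (weakly perturbed features clustered into noise prompts, then combined with learnable tokens) can be sketched roughly as follows. This is a minimal illustrative sketch, not the authors' implementation. The abstract does not specify how features are perturbed, which clustering algorithm is used, or how the noise prompts are merged with the learnable tokens. The small additive Gaussian perturbation (`sigma`), plain k-means, the assumption that features already share the token embedding dimension, the token ordering, and all function names here are hypothetical.

```python
import numpy as np


def weak_perturb(features, sigma, rng):
    # Hypothetical stand-in for "weak semantic noise": small additive Gaussian noise.
    return features + sigma * rng.standard_normal(features.shape)


def kmeans(x, k, iters, rng):
    # Plain Lloyd's k-means; centers start at k distinct random samples.
    centers = x[rng.choice(len(x), size=k, replace=False)].copy()
    labels = np.zeros(len(x), dtype=int)
    for _ in range(iters):
        # Squared Euclidean distance from every point to every center: shape (n, k).
        d = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        for j in range(k):
            members = x[labels == j]
            if len(members) > 0:  # leave a center unchanged if its cluster is empty
                centers[j] = members.mean(0)
    return centers, labels


def build_noise_prompts(features, num_prompts, sigma=0.05, iters=20, seed=0):
    # Hypothetical helper: perturb features weakly, then use cluster centers as noise prompts.
    rng = np.random.default_rng(seed)
    perturbed = weak_perturb(features, sigma, rng)
    centers, _ = kmeans(perturbed, num_prompts, iters, rng)
    return centers


def assemble_prompt(learnable_tokens, noise_prompts, class_token_embeds):
    # Hypothetical ordering: [learnable context | noise prompts | class-name tokens],
    # concatenated along the token axis. Assumes all parts share one embedding dim.
    return np.concatenate([learnable_tokens, noise_prompts, class_token_embeds], axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    feats = rng.standard_normal((100, 16))          # 100 toy feature vectors, dim 16
    noise = build_noise_prompts(feats, num_prompts=4)
    ctx = np.zeros((4, 16))                         # 4 learnable context tokens (untrained)
    cls = rng.standard_normal((2, 16))              # 2 toy class-name token embeddings
    seq = assemble_prompt(ctx, noise, cls)
    print(seq.shape)  # (10, 16)
```

Using cluster centers rather than raw perturbed features keeps the number of extra tokens fixed and small, which is one plausible reading of "controlled exposure to semantic variations"; the paper itself should be consulted for the actual design.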
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Natural Language Processing Techniques
