Neutral-Reference Prompting for Vision-Language Models
Senmao Tian, Xiang Wei, Shunli Zhang

TL;DR
This paper introduces NeRP, a prompting correction method that uses neutral prompts and reference images to improve vision-language models' recognition of unseen classes without sacrificing accuracy on known classes.
Contribution
NeRP is a novel, plug-and-play prompting correction strategy that improves unseen class recognition in VLMs without changing model parameters.
Findings
NeRP significantly boosts accuracy on unseen classes across multiple benchmarks.
NeRP maintains high performance on known classes while improving generalization.
Extensive experiments validate NeRP's effectiveness across various backbones and data settings.
Abstract
Efficient transfer learning of vision-language models (VLMs) commonly suffers from a Base-New Trade-off (BNT): improving performance on unseen (new) classes often degrades accuracy on known (base) classes. Addressing how to boost recognition of unseen classes without sacrificing known-class performance remains a central challenge. Existing work often simplistically attributes the BNT to overfitting on known classes. We observe an interesting phenomenon: VLMs frequently exhibit asymmetric confusion on certain downstream data, i.e., samples of class A are systematically mispredicted as class B, while the reverse confusion (B to A) rarely occurs. For known classes, this kind of bias can be mitigated by tuning using a cross-entropy loss, but for unseen classes, such pretraining-induced bias persists and harms generalization. Motivated by this, we propose NeRP, a plug-and-play prompting…
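The asymmetric confusion described in the abstract (class A is often mispredicted as class B, while B is rarely mispredicted as A) can be checked on any set of predictions. The following is a minimal sketch of such a check, not the NeRP method itself. The function name `asymmetric_confusions`, the `min_gap` threshold, and the toy labels are assumptions made for illustration and do not come from the paper.

```python
from collections import Counter


def asymmetric_confusions(y_true, y_pred, min_gap=1):
    """Find class pairs (a, b) where a is mispredicted as b at least
    `min_gap` more times than b is mispredicted as a.

    Returns a list of (a, b, a_to_b, b_to_a) tuples, largest gap first.
    """
    # Count only the errors, keyed by (true class, predicted class).
    errors = Counter((t, p) for t, p in zip(y_true, y_pred) if t != p)

    results = []
    for (a, b), a_to_b in errors.items():
        b_to_a = errors.get((b, a), 0)  # reverse confusion, 0 if never seen
        if a_to_b - b_to_a >= min_gap:
            results.append((a, b, a_to_b, b_to_a))

    results.sort(key=lambda r: r[2] - r[3], reverse=True)
    return results


# Toy labels (hypothetical, for illustration only).
y_true = ["cat", "cat", "cat", "cat", "lynx", "lynx", "lynx", "dog"]
y_pred = ["lynx", "lynx", "lynx", "cat", "lynx", "lynx", "cat", "dog"]

pairs = asymmetric_confusions(y_true, y_pred, min_gap=2)
for a, b, a_to_b, b_to_a in pairs:
    print(f"{a} -> {b}: {a_to_b} errors, reverse {b}->{a}: {b_to_a}")
```

In this toy data, "cat" is mispredicted as "lynx" three times while the reverse happens once, so the pair is flagged. Raw count differences are used here for simplicity; with imbalanced classes, per-class error rates would likely be a better basis for comparison.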
