Dynamic Rank Adaptation for Vision-Language Models
Jiahui Wang, Qin Xu, Bo Jiang, Bin Luo

TL;DR
This paper introduces Dynamic Rank Adaptation (DRA), an adapter-based method for fine-tuning vision-language models that allocates adaptation ranks dynamically according to feature importance, improving recognition of unseen classes while preserving the model's general knowledge.
Contribution
DRA assigns adaptation ranks according to token importance, rather than treating all tokens equally as existing prompt-based and adapter-based methods do, with the aim of improving generalization to new classes in vision-language models.
Findings
DRA outperforms existing methods on benchmarks for new class recognition.
DRA improves cross-dataset and domain generalization performance.
DRA effectively balances adaptation and preservation of general knowledge.
Abstract
Pre-trained large vision-language models (VLMs) like CLIP demonstrate impressive generalization ability. Existing prompt-based and adapter-based works have made significant progress in fine-tuning VLMs but still face the challenges of maintaining strong generalization abilities, particularly towards unseen new classes. This limitation partly arises from these methods treating all tokens of the image and text encoder equally, which can lead to overfitting on less informative features (e.g., background noise, template words) and degrade the general representations that are crucial for novel concept recognition. To address this issue, we propose Dynamic Rank Adaptation (DRA), a novel adapter variant method, designed specifically to enhance new class generalization. DRA dynamically allocates adaptation ranks based on the importance of features during training to preserve general knowledge.…
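The abstract excerpt is truncated and does not say how importance is scored or how ranks are assigned, so the following is only a minimal sketch of the general idea of importance-dependent rank allocation, not the authors' implementation. It assumes that each token already has a scalar importance score, that more important tokens receive a higher rank, and that a rank r is realised by using only the first r components of a shared low-rank adapter. The function names `allocate_ranks`, `low_rank_update`, and `adapt_tokens` are hypothetical.

```python
import math


def allocate_ranks(importance, r_min, r_max):
    """Map per-token importance scores to integer ranks in [r_min, r_max].

    Assumption (not from the paper): ranks grow linearly with the
    min-max normalised importance score.
    """
    lo, hi = min(importance), max(importance)
    span = hi - lo
    ranks = []
    for score in importance:
        # If all scores are equal, fall back to the full rank for every token.
        t = (score - lo) / span if span > 0 else 1.0
        ranks.append(r_min + math.floor(t * (r_max - r_min) + 0.5))
    return ranks


def low_rank_update(x, A, B, rank):
    """Return B[:, :rank] @ (A[:rank, :] @ x) for one token vector x.

    A is the down-projection with shape (r_max, d);
    B is the up-projection with shape (d, r_max).
    """
    hidden = [sum(a * xi for a, xi in zip(A[k], x)) for k in range(rank)]
    return [sum(B[j][k] * hidden[k] for k in range(rank)) for j in range(len(B))]


def adapt_tokens(tokens, importance, A, B, r_min=1):
    """Add a residual low-rank update to each token, using a per-token rank."""
    ranks = allocate_ranks(importance, r_min, len(A))
    adapted = []
    for x, r in zip(tokens, ranks):
        delta = low_rank_update(x, A, B, r)
        adapted.append([xi + di for xi, di in zip(x, delta)])
    return adapted, ranks


# Toy example: three 3-dimensional tokens with increasing importance.
tokens = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
importance = [0.0, 0.5, 1.0]
A = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # illustrative weights
B = [[0.5, 0.0, 0.0], [0.0, 0.5, 0.0], [0.0, 0.0, 0.5]]  # illustrative weights
adapted, ranks = adapt_tokens(tokens, importance, A, B)
print(ranks)
print(adapted)
```

In this toy setup the least important token is modified along a single adapter direction and the most important token along all three, which illustrates how a low-importance token (for example, background or template words) can be left closer to its pre-trained representation.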
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis
