DeAR: Fine-Grained VLM Adaptation by Decomposing Attention Head Roles
Yiming Ma, Hongkun Yang, Lionel Z. Wang, Bin Chen, Weizhi Xian, Jianzhi Teng

TL;DR
DeAR introduces a fine-grained approach to adapting vision-language models (VLMs) by decomposing attention head roles, improving task-specific adaptation while preserving zero-shot generalization.
Contribution
The paper proposes a novel method that classifies attention heads into functional roles and controls how learnable tokens interact with original tokens at the level of those heads, improving VLM adaptation without sacrificing zero-shot generalization.
Findings
Outperforms previous methods on fifteen datasets
Balances task adaptation and zero-shot generalization effectively
Classifies attention heads into Attribute, Generalization, and Mixed roles
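The difference between a layer-centric view (shallow layers general, deep layers task-specific) and a head-level view can be made concrete with a small sketch. It assumes a toy encoder whose heads have already been labeled with the three roles listed above. The role map, the `PROMPTS_ALLOWED` policy (including the arbitrary choice to allow prompts in Mixed heads), and the helper names are hypothetical illustrations. They are not DeAR's classification procedure or adaptation rule, which this summary does not describe.

```python
from enum import Enum


class HeadRole(Enum):
    """The three head roles named in the summary."""
    ATTRIBUTE = "attribute"
    GENERALIZATION = "generalization"
    MIXED = "mixed"


# Hypothetical policy (not taken from the paper): whether learnable prompt
# tokens may interact with a head of the given role.
PROMPTS_ALLOWED = {
    HeadRole.ATTRIBUTE: True,
    HeadRole.MIXED: True,
    HeadRole.GENERALIZATION: False,
}


def head_centric_mask(roles):
    """Per-(layer, head) permission derived from each head's role."""
    return [[PROMPTS_ALLOWED[role] for role in layer] for layer in roles]


def layer_centric_mask(num_layers, num_heads, first_prompted_layer):
    """Layer-centric baseline: every head in deep layers gets prompts, none in shallow ones."""
    return [[layer >= first_prompted_layer] * num_heads for layer in range(num_layers)]


A, G, M = HeadRole.ATTRIBUTE, HeadRole.GENERALIZATION, HeadRole.MIXED
# Hypothetical role map for a toy 3-layer, 4-head encoder.
roles = [
    [G, G, A, M],
    [G, A, M, G],
    [A, G, A, A],
]
head_mask = head_centric_mask(roles)

# Try every possible layer cutoff (including "no layer prompted").
print(any(layer_centric_mask(3, 4, k) == head_mask for k in range(4)))  # False
```

Because every toy layer contains both protected and prompted heads, no single layer cutoff reproduces the per-head permissions. A head-level view is what makes this distinction expressible.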
Abstract
Prompt learning is a dominant paradigm for adapting pre-trained Vision-Language Models (VLMs) to downstream tasks. However, existing methods often rely on a simplistic, layer-centric view, assuming that shallow layers capture general features while deep layers handle task-specific knowledge. This assumption results in uncontrolled interactions between learnable tokens and original tokens. As a result, task-specific knowledge can degrade the model's core generalization, creating a trade-off between task adaptation and the preservation of zero-shot generalization. To address this, we challenge the layer-centric view and propose DeAR, a framework that achieves fine-grained VLM adaptation by Decomposing Attention head Roles. We posit that the functional specialization within VLMs occurs not between layers, but at the finer-grained level of individual attention heads…
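The abstract traces the trade-off to uncontrolled interactions between learnable and original tokens. One generic way to control such interactions per head, rather than per layer, is an attention mask that is switched on only in selected heads. The NumPy sketch below illustrates that idea and is not DeAR's published mechanism. It assumes that the sequence lists original tokens first and learnable prompt tokens last. The names `head_masked_attention` and `block_heads` are hypothetical.

```python
import numpy as np


def softmax(scores):
    # Numerically stable softmax over the last axis; -inf entries get weight 0.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    return weights / weights.sum(axis=-1, keepdims=True)


def head_masked_attention(q, k, v, n_orig, block_heads):
    """Scaled dot-product attention with a hypothetical per-head prompt mask.

    q, k, v: arrays of shape (heads, tokens, dim). The first n_orig tokens are
    original tokens; the remaining tokens are learnable prompt tokens.
    block_heads: bool array of shape (heads,). Where True, original-token
    queries in that head may not attend to prompt-token keys.
    Assumes n_orig >= 1 so every masked row keeps at least one finite score.
    """
    _, tokens, dim = q.shape
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(dim)  # (heads, tokens, tokens)
    positions = np.arange(tokens)
    # pair_blocked[i, j]: query i is an original token and key j is a prompt token.
    pair_blocked = (positions[:, None] < n_orig) & (positions[None, :] >= n_orig)
    mask = block_heads[:, None, None] & pair_blocked[None, :, :]
    scores = np.where(mask, -np.inf, scores)
    return softmax(scores) @ v


rng = np.random.default_rng(0)
heads, n_orig, n_prompt, dim = 4, 5, 2, 8
q, k, v = (rng.standard_normal((heads, n_orig + n_prompt, dim)) for _ in range(3))

# Hypothetical assignment: shield heads 0 and 2 from the prompt tokens.
block = np.array([True, False, True, False])
out = head_masked_attention(q, k, v, n_orig, block)

# Reference: the same heads run on the original tokens alone, with no prompts.
ref = head_masked_attention(
    q[:, :n_orig], k[:, :n_orig], v[:, :n_orig], n_orig, np.zeros(heads, dtype=bool)
)
print(np.allclose(out[block, :n_orig], ref[block]))  # True: shielded heads unchanged
```

In the shielded heads, the original tokens produce the same outputs (up to floating-point rounding) that they would produce with no prompt tokens present. The unshielded heads remain free to absorb task-specific information from the prompts.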
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications
