DeAR: Fine-Grained VLM Adaptation by Decomposing Attention Head Roles

Yiming Ma; Hongkun Yang; Lionel Z. Wang; Bin Chen; Weizhi Xian; Jianzhi Teng

arXiv:2603.01111·cs.CV·March 10, 2026

TL;DR

DeAR introduces a fine-grained approach to adapt vision-language models by decomposing attention head roles, improving task-specific adaptation while preserving zero-shot generalization.

Contribution

The paper proposes a novel method that classifies attention heads into functional roles and controls their interactions, enhancing VLM adaptation without sacrificing generalization.

Findings

1. Outperforms previous methods on fifteen datasets
2. Balances task adaptation and zero-shot generalization effectively
3. Classifies attention heads into Attribute, Generalization, and Mixed roles

Abstract

Prompt learning is a dominant paradigm for adapting pre-trained Vision-Language Models (VLMs) to downstream tasks. However, existing methods often rely on a simplistic, layer-centric view, assuming that shallow layers capture general features while deep layers handle task-specific knowledge. This assumption results in uncontrolled interactions between learnable tokens and original tokens. As a result, task-specific knowledge can degrade the model's core generalization, creating a trade-off between task adaptation and the preservation of zero-shot generalization. To address this, we challenge the layer-centric view and propose DeAR, a framework that achieves fine-grained VLM adaptation by Decomposing Attention head Roles. We posit that the functional specialization within VLMs occurs not between layers, but at the finer-grained level of individual attention heads…
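
The idea of controlling interactions between learnable and original tokens on a per-head basis can be illustrated with a small NumPy sketch. This is not the paper's implementation: the abstract is truncated here and does not state how heads are classified or which interactions are restricted. The masking rule below (original-token queries in Generalization heads cannot attend to learnable tokens), the function names `build_head_masks` and `masked_attention`, and the token ordering are all assumptions made purely for illustration.

```python
import numpy as np

# The three role names come from the summary above; everything else
# (which head gets which role, and the masking rule) is hypothetical.
ROLES = ("attribute", "generalization", "mixed")


def build_head_masks(n_orig, n_learn, head_roles):
    """Build a boolean mask of shape (heads, tokens, tokens).

    mask[h, q, k] is True when query token q may attend to key token k
    in head h. Tokens are assumed to be ordered as
    [original tokens..., learnable tokens...].
    """
    n = n_orig + n_learn
    masks = np.ones((len(head_roles), n, n), dtype=bool)
    for h, role in enumerate(head_roles):
        if role not in ROLES:
            raise ValueError(f"unknown role: {role!r}")
        if role == "generalization":
            # Assumed rule: original-token queries cannot see learnable keys,
            # so these heads behave on original tokens as they did before
            # any learnable tokens were added.
            masks[h, :n_orig, n_orig:] = False
    return masks


def masked_attention(q, k, v, masks):
    """Scaled dot-product attention with a per-head boolean mask.

    q, k, v have shape (heads, tokens, dim).
    Returns (output, attention_weights).
    """
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])
    scores = np.where(masks, scores, -np.inf)
    # Subtract the row maximum for numerical stability; every row keeps at
    # least one unmasked entry, so the maximum is always finite.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights


# Usage: 3 heads, 3 original tokens, 2 learnable tokens.
rng = np.random.default_rng(0)
head_roles = ["attribute", "generalization", "mixed"]
masks = build_head_masks(n_orig=3, n_learn=2, head_roles=head_roles)
q, k, v = (rng.normal(size=(3, 5, 4)) for _ in range(3))
out, weights = masked_attention(q, k, v, masks)
```

In this sketch only the head labelled "generalization" has its original-to-learnable attention weights forced to zero; the other heads attend freely. A real system would derive the role labels from the model rather than hard-coding them.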

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications