A-VL: Adaptive Attention for Large Vision-Language Models

Junyang Zhang; Mu Yuan; Ruiguang Zhong; Puhan Luo; Huiyou Zhan; Ningkang Zhang; Chengchen Hu; Xiangyang Li

arXiv:2409.14846 · cs.AI · February 10, 2025

TL;DR

A-VL is an adaptive attention mechanism for large vision-language models that manages attention separately for visual and language inputs, reducing memory and computation during inference while maintaining task performance.

Contribution

The paper presents A-VL, a tailored adaptive attention method for LVLMs that manages visual and language attention separately, improving efficiency during inference.

Findings

1. Reduces memory and computation in LVLMs
2. Maintains performance across multiple tasks and datasets
3. Outperforms existing adaptive attention methods

Abstract

The Large Vision-Language Model (LVLM) integrates computer vision and natural language processing techniques, offering substantial application potential. However, these models demand extensive resources during inference. Adaptive attention techniques can dynamically reduce computational redundancy and thus improve efficiency. Although current adaptive attention methods significantly reduce the memory requirements of Transformer-based language models, they are not tailored for LVLMs. We observe that LVLMs generate responses from both remote image tokens and local text tokens, and different modalities have different attention patterns. This observation inspires us to manage the attention for each modality separately. Specifically, for visual input, we store the cache of potentially useful information but only compute the most critical parts. For language input, we care more about local…
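
The per-modality idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. Everything named here is an assumption made for illustration: the function `adaptive_attention`, the `importance` scores (assumed to be supplied by the caller, for example from earlier decoding steps), and the parameters `top_k_image` and `text_window`. Because the abstract is truncated at the language-input part, the "keep only the most recent text tokens" rule below is an assumed stand-in, not the paper's stated policy.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def adaptive_attention(query, keys, values, is_image, importance,
                       top_k_image, text_window):
    """Single-query attention over a modality-aware subset of cached tokens.

    Hypothetical interface, for illustration only:
      is_image[i]   -- True if cached position i holds an image token
      importance[i] -- caller-supplied relevance score for position i
      top_k_image   -- how many image tokens take part in the computation
      text_window   -- how many of the most recent text tokens are used
    """
    image_pos = [i for i, flag in enumerate(is_image) if flag]
    text_pos = [i for i, flag in enumerate(is_image) if not flag]

    # Image tokens: the full cache stays in `keys`/`values`, but only the
    # highest-scoring positions are used in this step's computation.
    ranked = sorted(image_pos, key=lambda i: importance[i], reverse=True)
    kept_image = ranked[:top_k_image]

    # Text tokens: assumed locality rule, keep only the most recent ones.
    kept_text = text_pos[-text_window:] if text_window > 0 else []

    positions = sorted(kept_image + kept_text)
    if not positions:
        raise ValueError("no tokens selected for attention")

    # Standard scaled dot-product attention restricted to `positions`.
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([dot(query, keys[i]) * scale for i in positions])
    dim = len(values[0])
    output = [sum(w * values[i][d] for w, i in zip(weights, positions))
              for d in range(dim)]
    return output, positions


# Usage: four cached image tokens followed by three text tokens.
keys = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [1.0, 1.0],
        [0.2, 0.1], [0.3, 0.3], [0.9, 0.4]]
values = [[float(i), 1.0] for i in range(7)]
is_image = [True, True, True, True, False, False, False]
importance = [0.1, 0.9, 0.3, 0.8, 0.0, 0.0, 0.0]
output, used = adaptive_attention([1.0, 0.0], keys, values, is_image,
                                  importance, top_k_image=2, text_window=2)
```

In this sketch the image-token cache is never shrunk, only the set of positions entering the softmax is, which mirrors the abstract's distinction between what is stored and what is computed for visual input.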

Code & Models

Repositories

jyzzz98/A-VL-Adaptive-Attention-for-Large-Vision-Language-Models · jax · Official

Videos

A-VL: Adaptive Attention for Large Vision-Language Models · underline

Taxonomy

Topics: Multimodal Machine Learning Applications · COVID-19 diagnosis using AI · Domain Adaptation and Few-Shot Learning

Methods: Softmax · Attention Is All You Need