Agent Attention: On the Integration of Softmax and Linear Attention

Dongchen Han; Tianzhu Ye; Yizeng Han; Zhuofan Xia; Siyuan Pan; Pengfei Wan; Shiji Song; Gao Huang

arXiv:2312.08874 · cs.CV · July 16, 2024 · 21 cites

TL;DR

This paper introduces Agent Attention, a novel attention mechanism that balances efficiency and expressiveness by integrating softmax and linear attention. It reports efficiency gains with comparable or better accuracy across vision tasks, with the largest benefits in high-resolution scenarios.

Contribution

The paper proposes Agent Attention, a new attention paradigm that combines the strengths of softmax and linear attention, improving efficiency while maintaining global context modeling.
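Following the abstract's description of agent tokens that first aggregate from $K$ and $V$ and then broadcast back to $Q$, the combination can be written as below. This is a sketch under our own assumptions, not the paper's exact definition: $Q \in \mathbb{R}^{N \times d}$ holds the $N$ query tokens, $K, V \in \mathbb{R}^{N \times d}$ the keys and values, $A \in \mathbb{R}^{n \times d}$ the $n \ll N$ agent tokens, $\sigma(\cdot)$ is a row-wise softmax, and the $1/\sqrt{d}$ scaling is omitted.

$$
\begin{aligned}
V_A &= \sigma\!\left(A K^{\top}\right) V && \text{(agent aggregation)}\\
O &= \sigma\!\left(Q A^{\top}\right) V_A = \sigma\!\left(Q A^{\top}\right)\sigma\!\left(A K^{\top}\right) V && \text{(agent broadcast)}\\
&= \phi_q(Q)\,\phi_k(K)^{\top} V, \qquad \phi_q(Q) = \sigma\!\left(Q A^{\top}\right),\quad \phi_k(K) = \sigma\!\left(A K^{\top}\right)^{\top}
\end{aligned}
$$

The last line shows where linear attention enters: $\phi_k(K)^{\top} V = \sigma(A K^{\top}) V$ is only an $n \times d$ matrix, so for a fixed $n$ the cost grows linearly in $N$, while each factor is still a softmax over a small set of tokens.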

Findings

01. Agent Attention is more computationally efficient than conventional softmax attention.

02. It achieves comparable or better accuracy on vision tasks.

03. It significantly accelerates image generation in high-resolution scenarios.
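To make the efficiency finding concrete, the sketch below counts multiply-adds for the matrix products only. Assumptions (ours, not the paper's): self-attention with $N$ keys equal to $N$ queries, $n$ agent tokens, feature width $d$, and the softmax, agent construction, and any extra modules in the published design are ignored. Under these assumptions the ratio is $N / (2n)$; the example sizes ($N = 4096$, i.e. a hypothetical 64×64 feature map, with $n = 64$ and $d = 64$) are illustrative only.

```python
def softmax_attention_macs(num_tokens: int, dim: int) -> int:
    """Multiply-adds for softmax(Q K^T) V with N queries and N keys (softmax ignored)."""
    # Q K^T: N x N scores, each a length-d dot product; then (N x N) @ (N x d).
    return 2 * num_tokens * num_tokens * dim


def agent_attention_macs(num_tokens: int, num_agents: int, dim: int) -> int:
    """Multiply-adds for the two-step agent attention (agent construction ignored)."""
    # Aggregation: A K^T is n x N, then (n x N) @ (N x d).
    aggregate = 2 * num_agents * num_tokens * dim
    # Broadcast: Q A^T is N x n, then (N x n) @ (n x d).
    broadcast = 2 * num_tokens * num_agents * dim
    return aggregate + broadcast


N, n, d = 4096, 64, 64
print(softmax_attention_macs(N, d))   # 2147483648
print(agent_attention_macs(N, n, d))  # 67108864
print(softmax_attention_macs(N, d) // agent_attention_macs(N, n, d))  # 32
```

Because the softmax cost grows with $N^2$ and the agent cost with $N \cdot n$, the gap widens as resolution (and hence $N$) increases, which is consistent with the high-resolution finding.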

Abstract

The attention module is the key component in Transformers. While the global attention mechanism offers high expressiveness, its excessive computational cost restricts its applicability in various scenarios. In this paper, we propose a novel attention paradigm, Agent Attention, to strike a favorable balance between computational efficiency and representation power. Specifically, Agent Attention, denoted as a quadruple $(Q, A, K, V)$, introduces an additional set of agent tokens $A$ into the conventional attention module. The agent tokens first act as the agent for the query tokens $Q$ to aggregate information from $K$ and $V$, and then broadcast the information back to $Q$. Given that the number of agent tokens can be designed to be much smaller than the number of query tokens, Agent Attention is significantly more efficient than the widely adopted softmax attention, while preserving…
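The aggregate-then-broadcast description can be sketched in plain Python as two ordinary softmax attention calls. This is a minimal illustration under stated assumptions, not the authors' implementation: `mean_pool_agents` is a hypothetical helper that builds agent tokens by averaging contiguous chunks of queries, and the published module may add components that the truncated abstract does not describe.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b for list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def softmax_rows(m: Matrix) -> Matrix:
    """Row-wise softmax, shifted by each row's max for numerical stability."""
    out = []
    for row in m:
        top = max(row)
        exps = [math.exp(x - top) for x in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def softmax_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Scaled dot-product attention: softmax(q k^T / sqrt(d)) v."""
    scale = 1.0 / math.sqrt(len(q[0]))
    scores = [[s * scale for s in row] for row in matmul(q, transpose(k))]
    return matmul(softmax_rows(scores), v)


def mean_pool_agents(q: Matrix, n: int) -> Matrix:
    """Hypothetical agent construction: average n equal, contiguous chunks of queries."""
    if n <= 0 or len(q) % n != 0:
        raise ValueError("this sketch assumes n divides the number of queries")
    size = len(q) // n
    return [[sum(col) / size for col in zip(*q[i * size:(i + 1) * size])] for i in range(n)]


def agent_attention(q: Matrix, a: Matrix, k: Matrix, v: Matrix) -> Matrix:
    # Agent aggregation: the n agents attend to all N keys/values -> n x d_v.
    v_agent = softmax_attention(a, k, v)
    # Agent broadcast: the N queries attend to the n agents and read v_agent -> N x d_v.
    return softmax_attention(q, a, v_agent)


if __name__ == "__main__":
    random.seed(0)
    N, n, d = 8, 2, 4
    q, k, v = ([[random.gauss(0.0, 1.0) for _ in range(d)] for _ in range(N)] for _ in range(3))
    out = agent_attention(q, mean_pool_agents(q, n), k, v)
    print(len(out), len(out[0]))  # 8 4
```

No score matrix in this sketch is larger than $N \times n$ or $n \times N$, which is where the savings over the $N \times N$ softmax map come from. Both steps use row-stochastic weights, so every output row is a convex combination of the rows of $V$.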

Taxonomy

Topics: Advanced Neural Network Applications · Brain Tumor Detection and Classification · Domain Adaptation and Few-Shot Learning

Methods: Sparse Evolutionary Training · Diffusion · Softmax