Flowformer: Linearizing Transformers with Conservation Flows

Haixu Wu; Jialong Wu; Jiehui Xu; Jianmin Wang; Mingsheng Long

arXiv:2202.06258 · cs.LG · June 17, 2022 · 31 cites

TL;DR

Flowformer introduces a novel linear attention mechanism based on flow network theory, enabling efficient processing of long sequences across various domains without relying on specific inductive biases.

Contribution

This paper presents Flow-Attention, a flow conservation-based linear attention mechanism that enhances transformer scalability and generality.

Findings

01 Achieves linear complexity in attention computation.

02 Performs well across long sequences, vision, NLP, and reinforcement learning.

03 Does not depend on inductive biases like locality.
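As a rough illustration of where linear complexity comes from in kernel-style attention generally, the sketch below computes the same output two ways: once by building the full query-key similarity matrix, and once by multiplying keys with values first. It is not taken from the paper or its repository. The feature map `feature_map` (elu(x) + 1), the function names, and the toy shapes are all assumptions chosen for illustration.

```python
import numpy as np

def feature_map(x):
    # Assumed non-negative feature map, elu(x) + 1; chosen only for illustration.
    return np.where(x > 0, x + 1.0, np.exp(x))

def quadratic_attention(q, k, v):
    # Builds the full (n x m) similarity matrix: cost grows with n * m.
    scores = feature_map(q) @ feature_map(k).T
    weights = scores / scores.sum(axis=1, keepdims=True)
    return weights @ v

def linear_attention(q, k, v):
    # Same result by associativity: (phi(Q) phi(K)^T) V == phi(Q) (phi(K)^T V).
    phi_q, phi_k = feature_map(q), feature_map(k)
    kv = phi_k.T @ v                         # (d, d_v) summary of keys and values
    normalizer = phi_q @ phi_k.sum(axis=0)   # row sums, without the n x m matrix
    return (phi_q @ kv) / normalizer[:, None]

# Toy inputs: 6 queries, 8 keys/values, feature size 4, value size 3.
rng = np.random.default_rng(0)
q = rng.normal(size=(6, 4))
k = rng.normal(size=(8, 4))
v = rng.normal(size=(8, 3))

out_quadratic = quadratic_attention(q, k, v)
out_linear = linear_attention(q, k, v)
```

In the second function no intermediate array has a shape that depends on both sequence lengths at once, which is why the cost grows linearly with the number of tokens for a fixed feature size.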

Abstract

Transformers based on the attention mechanism have achieved impressive success in various areas. However, the attention mechanism has quadratic complexity, significantly impeding Transformers from dealing with numerous tokens and scaling up to bigger models. Previous methods mainly utilize the similarity decomposition and the associativity of matrix multiplication to devise linear-time attention mechanisms. They avoid degeneration of attention to a trivial distribution by reintroducing inductive biases such as locality, at the expense of model generality and expressiveness. In this paper, we linearize Transformers free from specific inductive biases based on the flow network theory. We cast attention as the information flow aggregated from the sources (values) to the sinks (results) through the learned flow capacities (attentions). Within this framework, we apply the…
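The flow view described in the abstract can be made concrete with a small numerical sketch. It treats the pairwise similarity between a query and a key as the flow capacity between a sink and a source, and computes each sink's incoming flow and each source's outgoing flow without forming the full capacity matrix. This is an illustrative sketch under stated assumptions, not the paper's full mechanism: the sigmoid feature map, the names `flows` and `capacity`, and the simple rescaling used to fix each source's outgoing flow to 1 are all assumptions made here.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def flows(phi_q, phi_k):
    # Incoming flow of sink i: sum_j phi_q[i] . phi_k[j] = phi_q[i] . (sum_j phi_k[j]).
    incoming = phi_q @ phi_k.sum(axis=0)
    # Outgoing flow of source j: sum_i phi_q[i] . phi_k[j] = phi_k[j] . (sum_i phi_q[i]).
    outgoing = phi_k @ phi_q.sum(axis=0)
    return incoming, outgoing

# Toy inputs: 5 sinks (queries), 7 sources (keys), feature size 4.
rng = np.random.default_rng(0)
phi_q = sigmoid(rng.normal(size=(5, 4)))   # assumed non-negative feature map
phi_k = sigmoid(rng.normal(size=(7, 4)))

incoming, outgoing = flows(phi_q, phi_k)

# Quadratic reference, built only to check the linear-time flows above.
capacity = phi_q @ phi_k.T                 # (5, 7) capacity matrix

# Assumed way to impose conservation: rescale every source so that its
# outgoing flow equals 1, then recompute the flow arriving at each sink.
phi_k_conserved = phi_k / outgoing[:, None]
conserved_incoming = phi_q @ phi_k_conserved.sum(axis=0)
conserved_capacity = phi_q @ phi_k_conserved.T   # reference only
```

After the rescaling, every column of `conserved_capacity` sums to 1, so the total flow arriving at the sinks equals the number of sources. With a fixed total, one sink can only receive more flow if others receive less.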

Code & Models

Repositories

thuml/Flowformer · jax · Official

Taxonomy

Topics: Neural Networks and Reservoir Computing · Neural Networks and Applications · Music and Audio Processing