Flowformer: Linearizing Transformers with Conservation Flows
Haixu Wu, Jialong Wu, Jiehui Xu, Jianmin Wang, Mingsheng Long

TL;DR
Flowformer introduces a linear-complexity attention mechanism derived from flow network theory, which processes long sequences efficiently across domains without relying on specific inductive biases such as locality.
Contribution
This paper presents Flow-Attention, a linear attention mechanism built on flow conservation that improves the scalability and generality of Transformers.
Findings
Computes attention with complexity linear in sequence length.
Performs well on long-sequence modeling, vision, NLP, and reinforcement learning tasks.
Does not depend on inductive biases such as locality.
Abstract
Transformers based on the attention mechanism have achieved impressive success in various areas. However, the attention mechanism has a quadratic complexity, significantly impeding Transformers from dealing with numerous tokens and scaling up to bigger models. Previous methods mainly utilize the similarity decomposition and the associativity of matrix multiplication to devise linear-time attention mechanisms. They avoid degeneration of attention to a trivial distribution by reintroducing inductive biases such as the locality, thereby at the expense of model generality and expressiveness. In this paper, we linearize Transformers free from specific inductive biases based on the flow network theory. We cast attention as the information flow aggregated from the sources (values) to the sinks (results) through the learned flow capacities (attentions). Within this framework, we apply the…
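The "similarity decomposition" and "associativity of matrix multiplication" that the abstract attributes to previous linear-time methods can be illustrated with a minimal NumPy sketch. This shows generic kernel-style linear attention, not Flow-Attention itself. The feature map `feature_map` (here `elu(x) + 1`) and all function names are assumptions chosen for illustration, not taken from the paper.

```python
import numpy as np

def feature_map(x):
    # Hypothetical positive feature map, elu(x) + 1, so all similarities are > 0.
    return np.where(x > 0, x + 1.0, np.exp(x))

def quadratic_attention(q, k, v):
    # Builds the full (n, m) similarity matrix: O(n * m) time and memory.
    phi_q, phi_k = feature_map(q), feature_map(k)
    scores = phi_q @ phi_k.T
    weights = scores / scores.sum(axis=1, keepdims=True)
    return weights @ v

def linear_attention(q, k, v):
    # Uses associativity: (phi_q @ phi_k.T) @ v == phi_q @ (phi_k.T @ v).
    # The (n, m) matrix is never formed; cost is linear in n and m.
    phi_q, phi_k = feature_map(q), feature_map(k)
    kv = phi_k.T @ v                 # (d, d_v) summary of keys and values
    k_sum = phi_k.sum(axis=0)        # (d,) used for row normalization
    normalizer = phi_q @ k_sum       # (n,) equals the row sums of the scores
    return (phi_q @ kv) / normalizer[:, None]

rng = np.random.default_rng(0)
q = rng.normal(size=(6, 4))   # 6 queries of dimension 4
k = rng.normal(size=(8, 4))   # 8 keys of dimension 4
v = rng.normal(size=(8, 3))   # 8 values of dimension 3

out_quadratic = quadratic_attention(q, k, v)
out_linear = linear_attention(q, k, v)
print(np.allclose(out_quadratic, out_linear))
```

Both functions compute the same output; only the order of the matrix products differs, which is what removes the quadratic term. According to the abstract, earlier methods of this kind reintroduce inductive biases such as locality to keep the attention from degenerating to a trivial distribution, whereas Flowformer addresses this through flow conservation instead.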
Taxonomy
Topics: Neural Networks and Reservoir Computing · Neural Networks and Applications · Music and Audio Processing
