Short-Long Convolutions Help Hardware-Efficient Linear Attention to Focus on Long Sequences

Zicheng Liu, Siyuan Li, Li Wang, Zedong Wang, Yunfan Liu, and Stan Z. Li

arXiv:2406.08128·cs.LG·June 17, 2024

Open Access

TL;DR

This paper introduces CHELA, a hybrid approach combining short-long convolutions with hardware-efficient linear attention, enabling effective long sequence processing with real linear complexity and improved stability.

Contribution

It proposes a novel hybrid model that replaces SSMs with convolutions and implements linear attention efficiently for long sequences, addressing hardware and stability issues.

Findings

1. Outperforms existing methods on the Long Range Arena benchmark
2. Achieves real linear complexity in long sequence processing
3. Demonstrates effectiveness on language modeling tasks

Abstract

To mitigate the computational complexity of the self-attention mechanism on long sequences, linear attention uses computation tricks to achieve linear complexity, while state space models (SSMs) popularize the practice of processing sequences with a non-data-dependent memory pattern, i.e., emphasizing the near and neglecting the distant. Recent studies have shown the advantages of combining the two. However, the efficiency of linear attention remains only theoretical in a causal setting, and SSMs require various designed constraints to operate effectively on specific data. Therefore, to unveil the true power of the hybrid design, two issues need to be addressed: (1) a hardware-efficient implementation of linear attention and (2) stabilization of SSMs. To achieve this, we leverage the ideas of tiling and hierarchy to propose CHELA…
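The "computation trick" mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is generic kernelized linear attention, not CHELA's implementation. The `elu(x) + 1` feature map and all function names are assumptions chosen for illustration. Reordering the matrix product avoids the n × n score matrix, while the causal variant falls back to a step-by-step running state, which is one commonly cited reason the linear cost is hard to realize on parallel hardware.

```python
import numpy as np

def feature_map(x):
    # elu(x) + 1: a positive feature map often used in linear attention
    # (an assumption here; the source does not specify one).
    return np.where(x > 0, x + 1.0, np.exp(x))

def quadratic_attention(q, k, v):
    # Materializes the full (n, n) score matrix: O(n^2 * d) time.
    scores = feature_map(q) @ feature_map(k).T
    return (scores @ v) / scores.sum(axis=1, keepdims=True)

def linear_attention(q, k, v):
    # Same result via associativity: phi(Q) @ (phi(K).T @ V), O(n * d^2) time.
    fq, fk = feature_map(q), feature_map(k)
    kv = fk.T @ v           # (d, d_v) summary of all keys and values
    z = fk.sum(axis=0)      # (d,) normalizer
    return (fq @ kv) / (fq @ z)[:, None]

def causal_linear_attention(q, k, v):
    # Causal variant: position t only sees keys/values up to t, so the
    # summary state is accumulated sequentially, one step at a time.
    fq, fk = feature_map(q), feature_map(k)
    kv = np.zeros((q.shape[1], v.shape[1]))
    z = np.zeros(q.shape[1])
    out = np.empty_like(v)
    for t in range(q.shape[0]):
        kv += np.outer(fk[t], v[t])
        z += fk[t]
        out[t] = (fq[t] @ kv) / (fq[t] @ z)
    return out
```

The non-causal and reordered versions agree numerically; the causal loop matches a lower-triangular-masked version of the quadratic form. Hardware-efficient schemes typically process the sequence in tiles rather than one position at a time, but the details of CHELA's tiling are not given in the text above.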

Taxonomy

Topics: Computational Physics and Python Applications · Algorithms and Data Compression · Advanced Data Compression Techniques