S$^3$Attention: Improving Long Sequence Attention with Smoothed Skeleton Sketching

Xue Wang; Tian Zhou; Jianqing Zhu; Jialin Liu; Kun Yuan; Tao Yao; Wotao Yin; Rong Jin; HanQin Cai

arXiv:2408.08567 · cs.LG · September 18, 2024

TL;DR

S$^3$Attention introduces a novel smoothing and sketching approach to reduce the complexity of attention mechanisms, enabling efficient long sequence processing with improved accuracy over existing methods.

Contribution

The paper proposes S$^3$Attention, a new attention structure that balances information preservation and computational efficiency using smoothing and matrix sketching techniques.
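The smoothing-then-sketching idea can be illustrated with a minimal NumPy sketch. It is not the authors' implementation (the official repository has that). Here a centered moving average stands in for the smoothing step, and uniform random sampling of key/value positions (controlled by the hypothetical `num_landmarks` parameter) stands in for the paper's skeleton sketching. Queries attend only to the sampled, smoothed positions, so the score matrix is n × s instead of n × n.

```python
import numpy as np


def moving_average(x: np.ndarray, window: int) -> np.ndarray:
    """Smooth x of shape (n, d) along the sequence axis with a centered box filter.

    Assumption: a simple moving average stands in for the paper's smoothing block.
    """
    left = window // 2
    # Edge-pad so that 'valid' convolution returns exactly n rows.
    padded = np.pad(x, ((left, window - 1 - left), (0, 0)), mode="edge")
    kernel = np.ones(window) / window
    cols = [np.convolve(padded[:, j], kernel, mode="valid") for j in range(x.shape[1])]
    return np.stack(cols, axis=1)


def softmax(z: np.ndarray) -> np.ndarray:
    """Row-wise softmax, shifted by the row max for numerical stability."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def full_attention(q, k, v):
    """Vanilla softmax attention: builds an (n, n) score matrix."""
    return softmax(q @ k.T / np.sqrt(q.shape[1])) @ v


def sketched_attention(q, k, v, num_landmarks, window=3, seed=0):
    """Hypothetical smoothed-sketch attention with an (n, s) score matrix, s = num_landmarks."""
    rng = np.random.default_rng(seed)
    k_smooth = moving_average(k, window)
    v_smooth = moving_average(v, window)
    # Assumption: uniform sampling without replacement picks the landmark positions.
    idx = np.sort(rng.choice(k.shape[0], size=num_landmarks, replace=False))
    scores = q @ k_smooth[idx].T / np.sqrt(q.shape[1])
    return softmax(scores) @ v_smooth[idx]


rng = np.random.default_rng(1)
q, k, v = (rng.standard_normal((16, 8)) for _ in range(3))
print(sketched_attention(q, k, v, num_landmarks=4).shape)  # (16, 8)
```

With `window=1` and `num_landmarks` equal to the sequence length, the sketch reduces exactly to `full_attention`, which makes a convenient sanity check. Smoothing before sampling lets each sampled position carry an average of its neighbours rather than one raw token; that is one plausible intuition for pairing the two steps, while the paper's actual smoothing block and sketching rule are defined in its text and code.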

Findings

01. Outperforms vanilla Attention on the Long Range Arena datasets
02. Achieves superior results on six time-series forecasting tasks
03. Maintains linear complexity with effective noise reduction
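The "linear complexity" claim can be made concrete by counting multiply-adds for the query-key score matrix alone. This back-of-the-envelope sketch assumes a fixed number of sampled positions `s` and a head dimension `d` (both illustrative values, not taken from the paper) and ignores every other step of the layer.

```python
def full_score_macs(n: int, d: int) -> int:
    """Multiply-adds to form Q @ K.T in vanilla attention: n * n * d (quadratic in n)."""
    return n * n * d


def sketched_score_macs(n: int, d: int, s: int) -> int:
    """Multiply-adds when each of n queries scores only s sampled keys: n * s * d."""
    return n * s * d


d, s = 64, 256  # Assumed head dimension and sketch size, for illustration only.
for n in (1_024, 4_096, 16_384):
    full = full_score_macs(n, d)
    sketched = sketched_score_macs(n, d, s)
    # The ratio equals n / s, so the saving grows with sequence length.
    print(f"n={n:>6}: full={full:,} sketched={sketched:,} ratio={full // sketched}")
```

Doubling n doubles the sketched cost but quadruples the full cost; with s held fixed, the sketched cost grows linearly in n.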

Abstract

Attention-based models have achieved many remarkable breakthroughs in numerous applications. However, the quadratic complexity of Attention makes vanilla Attention-based models hard to apply to long sequence tasks. Various improved Attention structures have been proposed to reduce the computation cost by inducing low-rankness and approximating the whole sequence by sub-sequences. The most challenging part of those approaches is maintaining the proper balance between information preservation and computation reduction: the longer the sub-sequences used, the better information is preserved, but at the price of introducing more noise and computational cost. In this paper, we propose a smoothed skeleton sketching based Attention structure, coined S$^3$Attention, which significantly improves upon the previous attempts to negotiate this trade-off. S$^3$Attention has two mechanisms to effectively…
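"Approximating the whole sequence by sub-sequences" can be illustrated with a classical skeleton (CUR) decomposition, which rebuilds a matrix from a few of its rows and columns. This NumPy sketch is a generic textbook construction, not the paper's algorithm; the uniform random index choice and the helper names `skeleton_approx` and `relative_error` are assumptions. It shows the information-preservation side of the trade-off: with fewer sampled rows and columns than the matrix rank the reconstruction is lossy, and once s reaches the rank it becomes exact up to floating-point error.

```python
import numpy as np


def skeleton_approx(a: np.ndarray, rows: np.ndarray, cols: np.ndarray) -> np.ndarray:
    """Skeleton (CUR) approximation A ~= C @ pinv(W) @ R from selected rows and columns."""
    c = a[:, cols]               # sampled columns, shape (m, s)
    r = a[rows, :]               # sampled rows, shape (s, n)
    w = a[np.ix_(rows, cols)]    # their intersection, shape (s, s)
    return c @ np.linalg.pinv(w) @ r


def relative_error(a: np.ndarray, s: int, seed: int = 0) -> float:
    """Relative Frobenius-norm error of a skeleton built from s random rows and s random columns."""
    rng = np.random.default_rng(seed)
    rows = rng.choice(a.shape[0], size=s, replace=False)
    cols = rng.choice(a.shape[1], size=s, replace=False)
    approx = skeleton_approx(a, rows, cols)
    return float(np.linalg.norm(a - approx) / np.linalg.norm(a))


rng = np.random.default_rng(0)
# A rank-8 test matrix built from Gaussian factors.
a = rng.standard_normal((128, 8)) @ rng.standard_normal((8, 128))
for s in (2, 4, 8):
    print(s, relative_error(a, s))
```

A larger s preserves more of the matrix but makes C, R and the pseudo-inverse more expensive to form, which mirrors the cost side of the balance the abstract describes.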

Code & Models

Repositories

wxie9/s3attention (PyTorch, official)

Taxonomy

Methods: Softmax · Attention Is All You Need