TL;DR
S$^3$Attention introduces a novel smoothing and sketching approach to reduce the complexity of attention mechanisms, enabling efficient long sequence processing with improved accuracy over existing methods.
Contribution
The paper proposes S$^3$Attention, a new attention structure that balances information preservation and computational efficiency using smoothing and matrix sketching techniques.
Findings
Outperforms vanilla Attention on Long Range Arena datasets
Achieves superior results in six time-series forecasting tasks
Maintains linear complexity with effective noise reduction
Abstract
Attention-based models have achieved remarkable breakthroughs in numerous applications. However, the quadratic complexity of Attention makes vanilla Attention-based models hard to apply to long-sequence tasks. Various improved Attention structures have been proposed to reduce the computation cost by inducing low-rankness and approximating the whole sequence by sub-sequences. The most challenging part of those approaches is maintaining the proper balance between information preservation and computation reduction: the longer the sub-sequences used, the better information is preserved, but at the price of introducing more noise and computational cost. In this paper, we propose a smoothed skeleton sketching based Attention structure, coined S$^3$Attention, which significantly improves upon previous attempts to negotiate this trade-off. S$^3$Attention has two mechanisms to effectively…
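The trade-off the abstract describes can be illustrated with a minimal NumPy sketch. This is not the S$^3$Attention algorithm (the truncated abstract does not specify its two mechanisms); it only contrasts vanilla Attention, which builds an n × n score matrix, with a generic approximation that attends to a sub-sequence of m keys and values. The function names and the uniform random sampling are assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def vanilla_attention(Q, K, V):
    # Score matrix has shape (n, n): quadratic in sequence length.
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    return softmax(scores) @ V

def subsampled_attention(Q, K, V, m, rng):
    # Hypothetical illustration: keep m of the n key/value rows
    # (chosen uniformly at random, an assumption), so the score
    # matrix has shape (n, m): linear in n for fixed m.
    idx = rng.choice(K.shape[0], size=m, replace=False)
    scores = Q @ K[idx].T / np.sqrt(Q.shape[-1])
    return softmax(scores) @ V[idx]

rng = np.random.default_rng(0)
n, d, m = 512, 16, 64
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))

full = vanilla_attention(Q, K, V)
approx = subsampled_attention(Q, K, V, m, rng)

print("score entries, vanilla:", n * n, "subsampled:", n * m)
print("mean abs difference:", np.abs(full - approx).mean())
```

Raising m toward n shrinks the approximation error but restores the quadratic cost, which is the balance between information preservation and computation reduction that the abstract refers to.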
Taxonomy
Methods: Softmax · Attention Is All You Need
