Multi Resolution Analysis (MRA) for Approximate Self-Attention

Zhanpeng Zeng; Sourav Pal; Jeffery Kline; Glenn M Fung; Vikas Singh

arXiv:2207.10284 · cs.LG · July 22, 2022 · 1 citation

TL;DR

This paper applies classical multiresolution analysis (wavelets) to approximate self-attention in Transformers, and reports that the resulting method outperforms most efficient self-attention methods on both short and long sequences.

Contribution

The paper revisits classical MRA concepts such as wavelets and adapts them to approximate the self-attention matrix in Transformers.

Findings

01. Outperforms most efficient self-attention methods
02. Effective for both short and long sequences
03. Shows an excellent performance profile across most criteria of interest

Abstract

Transformers have emerged as a preferred model for many tasks in natural language processing and vision. Recent efforts on training and deploying Transformers more efficiently have identified many strategies to approximate the self-attention matrix, a key module in a Transformer architecture. Effective ideas include various prespecified sparsity patterns, low-rank basis expansions and combinations thereof. In this paper, we revisit classical Multiresolution Analysis (MRA) concepts such as Wavelets, whose potential value in this setting remains underexplored thus far. We show that simple approximations based on empirical feedback and design choices informed by modern hardware and implementation challenges eventually yield an MRA-based approach for self-attention with an excellent performance profile across most criteria of interest. We undertake an extensive set of experiments and…
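
To make the idea of a multiresolution approximation concrete, here is a minimal sketch in plain Python. It is not the authors' algorithm: it only illustrates the general two-level idea of describing a matrix of attention scores by coarse block averages and restoring exact entries in a few selected blocks. The function names (`attention_scores`, `two_level_approx`, `frobenius_error`), the block size, and the rule of refining the blocks with the largest mean are all assumptions made for illustration, and the softmax row normalization is omitted.

```python
import math

def attention_scores(Q, K):
    """Unnormalized scores exp(q_i . k_j / sqrt(d)); row normalization is omitted."""
    d = len(Q[0])
    return [[math.exp(sum(a * b for a, b in zip(q, k)) / math.sqrt(d)) for k in K]
            for q in Q]

def two_level_approx(A, block, keep):
    """Coarse level: replace each block x block tile of the square matrix A by its mean.
    Fine level: restore exact entries in the `keep` tiles with the largest mean.
    (The selection rule is an illustrative assumption, not taken from the paper.)"""
    n = len(A)
    assert n % block == 0, "matrix size must be a multiple of the block size"
    approx = [[0.0] * n for _ in range(n)]
    tiles = []
    for bi in range(0, n, block):
        for bj in range(0, n, block):
            cells = [(i, j) for i in range(bi, bi + block)
                     for j in range(bj, bj + block)]
            mean = sum(A[i][j] for i, j in cells) / len(cells)
            tiles.append((mean, bi, bj))
            for i, j in cells:
                approx[i][j] = mean
    # Refine only the tiles that carry the most mass at the coarse level.
    for _, bi, bj in sorted(tiles, reverse=True)[:keep]:
        for i in range(bi, bi + block):
            for j in range(bj, bj + block):
                approx[i][j] = A[i][j]
    return approx

def frobenius_error(A, B):
    """Frobenius norm of A - B."""
    return math.sqrt(sum((a - b) ** 2 for ra, rb in zip(A, B) for a, b in zip(ra, rb)))

# Toy inputs: 8 tokens with 4 features each (made-up values).
Q = [[math.sin(i + j) for j in range(4)] for i in range(8)]
K = [[math.cos(i * j) for j in range(4)] for i in range(8)]
A = attention_scores(Q, K)

coarse = two_level_approx(A, block=2, keep=0)    # block averages only
mixed = two_level_approx(A, block=2, keep=4)     # 4 of 16 tiles kept exact
full = two_level_approx(A, block=2, keep=16)     # every tile exact

print(frobenius_error(A, coarse), frobenius_error(A, mixed), frobenius_error(A, full))
```

Refining more tiles can only lower the error in this sketch, and refining all of them reproduces the matrix exactly; the number of refined tiles is the knob that trades accuracy against cost.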

Taxonomy

Topics: Blind Source Separation Techniques · Sparse and Compressive Sensing Techniques · Image and Signal Denoising Methods

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Absolute Position Encodings · Dropout · Byte Pair Encoding · Position-Wise Feed-Forward Layer · Layer Normalization · Adam · Residual Connection