WavSpA: Wavelet Space Attention for Boosting Transformers' Long Sequence Learning Ability

Yufan Zhuang; Zihan Wang; Fangbo Tao; Jingbo Shang

arXiv:2210.01989 · cs.CL · May 24, 2023 · 1 citation

TL;DR

WavSpA introduces a wavelet-based attention mechanism for Transformers, capturing both position and frequency information efficiently, leading to improved long sequence learning and reasoning extrapolation.

Contribution

The paper proposes Wavelet Space Attention (WavSpA), which learns attention in a learnable wavelet coefficient space rather than in Fourier space, with the aim of improving long-sequence modeling in Transformers.

Findings

01

WavSpA outperforms Fourier-based methods on Long Range Arena tasks.

02

Learning in wavelet space enhances Transformers' reasoning over long distances.

03

The wavelet transform runs in linear time and projects inputs onto multi-resolution bases, capturing both position and frequency information.

Abstract

The Transformer and its variants are fundamental neural architectures in deep learning. Recent works show that learning attention in the Fourier space can improve the long sequence learning capability of Transformers. We argue that the wavelet transform should be a better choice because it captures both position and frequency information with linear time complexity. Therefore, in this paper, we systematically study the synergy between the wavelet transform and Transformers. We propose Wavelet Space Attention (WavSpA), which facilitates attention learning in a learnable wavelet coefficient space and replaces the attention in Transformers by (1) applying a forward wavelet transform to project the input sequences onto multi-resolution bases, (2) conducting attention learning in the wavelet coefficient space, and (3) reconstructing the representation in input space via a backward wavelet transform. Extensive…
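The three steps in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation (the linked repository uses JAX), and it rests on several simplifying assumptions: a fixed one-level Haar wavelet stands in for the learnable wavelets described in the paper, the attention uses identity query/key/value projections instead of learned ones, and attention is computed jointly over all coefficients. The function names `haar_forward`, `haar_inverse`, `self_attention`, and `wavelet_space_attention` are hypothetical and chosen only for this example.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_forward(x):
    """One-level Haar transform along the sequence axis (assumed fixed wavelet).

    x is a list of n token vectors (n even), each a list of d floats.
    Returns (approx, detail), each a list of n/2 vectors.
    """
    approx, detail = [], []
    for i in range(0, len(x), 2):
        a, b = x[i], x[i + 1]
        approx.append([(u + v) / SQRT2 for u, v in zip(a, b)])
        detail.append([(u - v) / SQRT2 for u, v in zip(a, b)])
    return approx, detail


def haar_inverse(approx, detail):
    """Invert haar_forward, interleaving the reconstructed token pairs."""
    x = []
    for s, t in zip(approx, detail):
        x.append([(u + v) / SQRT2 for u, v in zip(s, t)])
        x.append([(u - v) / SQRT2 for u, v in zip(s, t)])
    return x


def self_attention(h):
    """Scaled dot-product self-attention with identity Q/K/V (a simplification)."""
    d = len(h[0])
    out = []
    for q in h:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in h]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        out.append([sum(w * v[j] for w, v in zip(weights, h)) / z for j in range(d)])
    return out


def wavelet_space_attention(x):
    """Hypothetical end-to-end sketch of the three steps in the abstract."""
    approx, detail = haar_forward(x)           # (1) forward wavelet transform
    coeffs = self_attention(approx + detail)   # (2) attention over coefficients
    half = len(approx)
    return haar_inverse(coeffs[:half], coeffs[half:])  # (3) backward transform


x = [[1.0, 0.0], [0.0, 1.0], [2.0, 1.0], [1.0, 2.0]]
y = wavelet_space_attention(x)
print(len(y), len(y[0]))
```

Each Haar pass touches every token once, so the transform itself is linear in sequence length; the attention step in this sketch is still quadratic in the number of coefficients. The output has the same shape as the input, which is what allows a block like this to stand in for a standard attention layer.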

Code & Models

Repositories

EvanZhuang/wavspa
jax

Taxonomy

Topics: Neural Networks and Applications · Neural Networks and Reservoir Computing · Domain Adaptation and Few-Shot Learning

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Residual Connection · Label Smoothing · Softmax · Byte Pair Encoding · Adam · Dense Connections · Absolute Position Encodings