Fourier Transformer: Fast Long Range Modeling by Removing Sequence Redundancy with FFT Operator
Ziwei He, Meng Yang, Minwei Feng, Jingcheng Yin, Xinbing Wang, Jingwen Leng, Zhouhan Lin

TL;DR
Fourier Transformer reduces the cost of long-range sequence modeling by using the FFT operator to compute a Discrete Cosine Transform (DCT) of hidden sequences and removing redundant components. The result is faster, more memory-efficient transformers that can still inherit pretrained weights.
Contribution
The paper proposes a simple method that uses the FFT to progressively remove redundancy from hidden sequences, significantly improving efficiency while remaining compatible with pretrained model weights.
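To make "removing redundancy" concrete, the sketch below shortens a hidden sequence of shape (seq_len, d_model) by keeping only its lowest-frequency DCT coefficients along the sequence axis and inverting at the shorter length. It then counts attention-score entries before and after. This is an illustrative assumption, not the authors' implementation. The helper names `dct_matrix`, `shorten_sequence`, and `attention_score_entries` are hypothetical. For readability the DCT is built as an explicit orthonormal matrix rather than via the FFT, and both the truncation rule and the `sqrt(keep / n)` rescaling are choices made here.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis of shape (n, n); row k is frequency k."""
    k = np.arange(n)[:, None]  # frequency index
    i = np.arange(n)[None, :]  # sequence position
    mat = np.sqrt(2.0 / n) * np.cos(np.pi * k * (2 * i + 1) / (2 * n))
    mat[0, :] /= np.sqrt(2.0)  # extra scaling on the DC row makes it orthonormal
    return mat


def shorten_sequence(hidden: np.ndarray, keep: int) -> np.ndarray:
    """Hypothetical helper: shrink (seq_len, d_model) hidden states to `keep` steps.

    Assumption: redundancy is removed by keeping the `keep` lowest-frequency
    DCT coefficients along the sequence axis and inverting at length `keep`.
    """
    n = hidden.shape[0]
    coeffs = dct_matrix(n) @ hidden  # DCT along the sequence axis
    low = coeffs[:keep]  # discard the high-frequency rows
    # The inverse of an orthonormal DCT is its transpose; the sqrt(keep / n)
    # factor keeps a constant sequence at the same value after shortening.
    return np.sqrt(keep / n) * (dct_matrix(keep).T @ low)


def attention_score_entries(seq_len: int) -> int:
    """Entries in one head's seq_len x seq_len attention score matrix."""
    return seq_len * seq_len


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    hidden = rng.standard_normal((1024, 8))
    short = shorten_sequence(hidden, keep=256)
    print(short.shape)  # (256, 8)
    print(attention_score_entries(1024) // attention_score_entries(256))  # 16
```

Because the score matrix grows with the square of the sequence length, shortening 1024 steps to 256 reduces its size by a factor of 16 in this toy setting.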
Findings
Achieves state-of-the-art results on long-range benchmarks
Reduces computational costs and memory usage
Outperforms standard BART on seq-to-seq tasks
Abstract
The transformer model is known to be computationally demanding and prohibitively costly for long sequences, because the self-attention module has quadratic time and space complexity with respect to sequence length. Many researchers have focused on designing new forms of self-attention or introducing new parameters to overcome this limitation; however, a large portion of these approaches prevent the model from inheriting weights from large pretrained models. In this work, we address the transformer's inefficiency from another perspective. We propose Fourier Transformer, a simple yet effective approach that progressively removes redundancies in the hidden sequence, using the ready-made Fast Fourier Transform (FFT) operator to perform the Discrete Cosine Transform (DCT). Fourier Transformer significantly reduces computational costs while retaining the ability to inherit from various large…
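The abstract's point about reusing the "ready-made" FFT operator can be illustrated with Makhoul's reordering trick. This trick computes an unnormalized DCT-II with a single length-N FFT in O(N log N) time instead of the O(N^2) direct sum. The NumPy sketch below applies it along the sequence axis of a 1-D or 2-D array and checks it against the direct definition. `dct2_fft` and `dct2_direct` are hypothetical names, and the paper's own implementation may differ in normalization and framework.

```python
import numpy as np


def dct2_fft(x: np.ndarray) -> np.ndarray:
    """Unnormalized DCT-II along axis 0, computed with one length-N FFT.

    X[k] = sum_n x[n] * cos(pi * k * (2n + 1) / (2N))
    Makhoul's reordering: even-indexed samples first, then the
    odd-indexed samples in reverse order.
    """
    n = x.shape[0]
    v = np.concatenate([x[::2], x[1::2][::-1]], axis=0)
    spectrum = np.fft.fft(v, axis=0)
    k = np.arange(n)
    # Twiddle factor exp(-i*pi*k/(2N)), broadcast over any feature axis.
    twiddle = np.exp(-1j * np.pi * k / (2 * n)).reshape((n,) + (1,) * (x.ndim - 1))
    return np.real(twiddle * spectrum)


def dct2_direct(x: np.ndarray) -> np.ndarray:
    """O(N^2) reference implementation of the same DCT-II definition."""
    n = x.shape[0]
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * i + 1) / (2 * n))
    return basis @ x


if __name__ == "__main__":
    hidden = np.random.default_rng(1).standard_normal((128, 16))
    print(np.allclose(dct2_fft(hidden), dct2_direct(hidden)))  # True
```

For comparison, `scipy.fft.dct(x, type=2, axis=0)` with its default normalization returns exactly twice these values.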
Taxonomy
Topics: Advanced Neural Network Applications · Ferroelectric and Negative Capacitance Devices
Methods: Multi-Head Attention · Absolute Position Encodings · Softmax · Layer Normalization · Byte Pair Encoding · Dropout · Attention Is All You Need · Linear Layer · Label Smoothing
