Fourier Transformer: Fast Long Range Modeling by Removing Sequence Redundancy with FFT Operator

Ziwei He; Meng Yang; Minwei Feng; Jingcheng Yin; Xinbing Wang; Jingwen Leng; Zhouhan Lin

arXiv:2305.15099 · cs.CL · May 19, 2025 · 1 cites

TL;DR

Fourier Transformer introduces a novel approach to reduce the computational complexity of long-range sequence modeling by leveraging FFT-based transformations, enabling faster and more memory-efficient transformers that inherit pretrained weights.

Contribution

It proposes a simple method using FFT to remove redundancies in sequences, significantly improving efficiency while maintaining compatibility with pretrained models.

Findings

1. Achieves state-of-the-art results on long-range benchmarks
2. Reduces computational costs and memory usage
3. Outperforms standard BART on seq-to-seq tasks

Abstract

The transformer model is known to be computationally demanding, and prohibitively costly for long sequences, because the self-attention module has quadratic time and space complexity with respect to sequence length. Many researchers have focused on designing new forms of self-attention or introducing new parameters to overcome this limitation; however, a large portion of these approaches prevent the model from inheriting weights from large pretrained models. In this work, we address the transformer's inefficiency from another perspective. We propose Fourier Transformer, a simple yet effective approach that progressively removes redundancies in the hidden sequence, using the ready-made Fast Fourier Transform (FFT) operator to perform the Discrete Cosine Transformation (DCT). Fourier Transformer is able to significantly reduce computational costs while retaining the ability to inherit from various large…
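The idea of computing a DCT with an FFT and then dropping high-frequency content to shorten a sequence can be illustrated with a small standalone sketch. This is not the authors' implementation: the function names (`fft`, `dct_via_fft`, `downsample`), the even/odd reordering trick used to obtain the DCT from a single FFT, the unnormalized DCT-II convention, and the choice to reconstruct at the shorter length are all assumptions made for illustration. It operates on a single scalar sequence; in a model the same operation would presumably be applied along the sequence axis for each hidden dimension.

```python
import cmath
import math


def fft(values):
    """Recursive radix-2 FFT; the length must be a power of two."""
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(values[0])]
    even = fft(values[0::2])
    odd = fft(values[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def dct_via_fft(x):
    """Unnormalized DCT-II: X[k] = sum_n x[n] * cos(pi * (2n + 1) * k / (2N)).

    Uses one length-N FFT on a reordered copy of x (even-indexed samples
    in order, followed by odd-indexed samples in reverse).
    """
    n = len(x)
    v = list(x[0::2]) + list(x[1::2])[::-1]
    spectrum = fft(v)
    return [
        (cmath.exp(-1j * math.pi * k / (2 * n)) * spectrum[k]).real
        for k in range(n)
    ]


def downsample(x, keep):
    """Shorten x to `keep` samples by discarding high-frequency DCT terms.

    Hypothetical helper: keeps the first `keep` DCT coefficients and
    evaluates the inverse transform on a grid of `keep` points. The 1/N
    scaling preserves the mean of the original sequence.
    """
    n = len(x)
    coeffs = dct_via_fft(x)[:keep]
    out = []
    for m in range(keep):
        total = coeffs[0]
        for k in range(1, keep):
            angle = math.pi * (2 * m + 1) * k / (2 * keep)
            total += 2 * coeffs[k] * math.cos(angle)
        out.append(total / n)
    return out


if __name__ == "__main__":
    hidden = [1.0, 2.0, 0.5, -1.0, 3.0, 0.0, 2.5, 1.5]
    print(downsample(hidden, 4))  # a length-4 low-frequency summary
```

With `keep` equal to the input length, `downsample` reduces to an exact inverse DCT and returns the original sequence; smaller values of `keep` give a shorter sequence, so a subsequent self-attention layer would operate on fewer positions.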

Code & Models

Repositories

lumia-group/fouriertransformer (PyTorch, official)

Taxonomy

Topics: Advanced Neural Network Applications · Ferroelectric and Negative Capacitance Devices

Methods: Multi-Head Attention · Absolute Position Encodings · Softmax · Layer Normalization · Byte Pair Encoding · Dropout · Attention Is All You Need · Linear Layer · Label Smoothing