SPECTRE: An FFT-Based Efficient Drop-In Replacement to Self-Attention for Long Contexts

Jacob Fein-Ashley; Neelesh Gupta; Rajgopal Kannan; Viktor Prasanna

arXiv:2502.18394 · cs.LG · May 20, 2025

TL;DR

SPECTRE introduces an FFT-based self-attention replacement that significantly improves efficiency for long-context transformers, enabling faster processing of tens of thousands of tokens with minimal parameter overhead.

Contribution

It proposes an FFT-based attention mechanism that reduces per-layer complexity from quadratic, $O(L^{2})$, to log-linear, $O(L \log L)$, enabling long-context processing without specialized hardware.
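To put the two growth rates given in the abstract, $O(L^{2})$ and $O(L \log L)$, in perspective, the following back-of-the-envelope sketch compares idealized operation counts at a few sequence lengths. The function names are hypothetical, constant factors are ignored, and the ratio is not a predicted speedup: measured wall-clock gains (such as the 7× figure against FlashAttention-2) depend on kernels, memory traffic, and hardware.

```python
import math

def quadratic_cost(seq_len: int) -> int:
    """Idealized pairwise-interaction count, growing as L^2."""
    return seq_len * seq_len

def fft_cost(seq_len: int) -> float:
    """Idealized FFT-style count, growing as L * log2(L)."""
    return seq_len * math.log2(seq_len)

if __name__ == "__main__":
    # 131_072 = 2**17, i.e. a "128k-token" context.
    for seq_len in (1_024, 16_384, 131_072):
        ratio = quadratic_cost(seq_len) / fft_cost(seq_len)
        print(f"L={seq_len:>7}: L^2 / (L log2 L) = {ratio:,.0f}")
```

The ratio simplifies to $L / \log_2 L$, so the asymptotic gap widens as the context grows.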

Findings

01

Operates up to 7× faster than FlashAttention-2 on 128k-token contexts

02

Matches or exceeds baseline performance on language and vision tasks

03

Adds less than 6% to the base model's parameter count

Abstract

Long-context transformers face significant efficiency challenges due to the quadratic cost of self-attention. However, many modern applications, from multi-turn dialogue to high-resolution vision, require contexts spanning tens of thousands of tokens. We introduce SPECTRE, a method that replaces each attention head with a fast real FFT, a content-adaptive spectral gate, and an inverse FFT, reducing per-layer complexity from $O(L^{2})$ to $O(L \log L)$ while preserving the surrounding architecture. We extend this efficiency to autoregressive generation through our Prefix-FFT cache and enhance local feature representation with an optional wavelet module that adds negligible computational overhead. Our experiments demonstrate that SPECTRE operates up to 7$\times$ faster than FlashAttention-2 on 128k-token contexts while matching or exceeding baseline performance on PG-19 language…
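The three-stage pipeline named in the abstract (real FFT, content-adaptive spectral gate, inverse FFT) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The gate here is an assumption made for illustration: a sigmoid over a linear projection of the mean-pooled tokens, with the hypothetical parameters `w_gate` and `b_gate`. The sketch is non-causal and omits the Prefix-FFT cache and the wavelet module.

```python
import numpy as np

def spectral_mix(x, w_gate, b_gate):
    """Mix tokens along the sequence axis: rFFT -> gate -> inverse rFFT.

    x:      (L, d) real token features for one head
    w_gate: (d, F) hypothetical gate projection, F = L // 2 + 1 frequency bins
    b_gate: (F,)   hypothetical gate bias
    """
    seq_len = x.shape[0]
    # Real FFT over the sequence axis: (L, d) -> (F, d), complex-valued.
    x_freq = np.fft.rfft(x, axis=0)
    # Assumed content summary: mean over tokens, shape (d,).
    summary = x.mean(axis=0)
    # Assumed gate: one sigmoid value in (0, 1) per frequency bin.
    gate = 1.0 / (1.0 + np.exp(-(summary @ w_gate + b_gate)))
    # Scale each frequency bin, then return to the token domain.
    gated = x_freq * gate[:, None]
    return np.fft.irfft(gated, n=seq_len, axis=0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    L, d = 8, 4
    x = rng.standard_normal((L, d))
    n_bins = L // 2 + 1
    # Zero weights give a constant gate of 0.5, so the output is 0.5 * x.
    y = spectral_mix(x, np.zeros((d, n_bins)), np.zeros(n_bins))
    print(y.shape, np.allclose(y, 0.5 * x))
```

Both transforms cost $O(L \log L)$ per feature channel and the gate is applied elementwise, so no $L \times L$ interaction matrix is ever formed.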

Taxonomy

Topics: Advanced Neural Network Applications · Multimodal Machine Learning Applications · Natural Language Processing Techniques

Methods: Softmax · Attention Is All You Need · Balanced Selection · modReLU