Scaling Attention to Very Long Sequences in Linear Time with Wavelet-Enhanced Random Spectral Attention (WERSA)

Vincenzo Dentamaro

arXiv:2507.08637 · cs.LG · July 14, 2025

TL;DR

WERSA introduces a linear-time attention mechanism using wavelet-enhanced spectral features, enabling efficient processing of very long sequences across vision and NLP tasks with improved accuracy and reduced computational costs.

Contribution

The paper presents WERSA, a novel linear-time attention method combining spectral features and wavelets, outperforming existing mechanisms on multiple benchmarks.
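To make the wavelet half of that combination concrete, here is a minimal sketch of a multi-level Haar decomposition in plain Python. It is not taken from the WERSA code: the function names `haar_step` and `haar_decompose` and the sample signal are hypothetical, and the paper's actual wavelet stage (including how learnable parameters weight each scale) may differ. The sketch only shows why a multi-resolution Haar transform fits a linear-time budget: each level halves the signal, so the total work is proportional to n.

```python
import math

def haar_step(signal):
    """One level of the orthonormal Haar transform on an even-length list."""
    s = 1.0 / math.sqrt(2.0)
    # Pairwise scaled sums give the coarse approximation,
    # pairwise scaled differences give the detail at this scale.
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail

def haar_decompose(signal, levels):
    """Return (coarsest approximation, list of detail bands, finest first)."""
    approx = list(signal)
    details = []
    for _ in range(levels):
        if len(approx) < 2 or len(approx) % 2:
            raise ValueError("signal length must be divisible by 2**levels")
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx, details

# Illustrative input only (an assumption, not data from the paper).
signal = [4.0, 4.0, 2.0, 0.0, 1.0, 1.0, 1.0, 1.0]
approx, details = haar_decompose(signal, levels=3)
```

Because the transform is orthonormal, the squared coefficients across all bands sum to the squared norm of the input, so a model that reweights bands is reweighting how much each scale contributes.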

Findings

01 WERSA achieves 1.2% higher accuracy on ArXiv classification.

02 WERSA reduces training time by 81% and FLOPS by 73.4%.

03 WERSA handles extremely long sequences where quadratic methods fail.
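One plausible reason quadratic methods struggle at extreme lengths is the size of the n x n score matrix when it is materialized. The arithmetic below is a rough illustration, not a measurement from the paper: the sequence lengths, the 4-byte float size, and the feature and head dimensions `m` and `d` are all assumptions chosen for the example. Memory-efficient kernels avoid storing the full matrix, but their compute still grows quadratically with n.

```python
def attention_score_bytes(n, bytes_per_value=4):
    """Bytes for one dense n x n score matrix (one head, one example)."""
    return n * n * bytes_per_value

def linear_summary_bytes(m, d, bytes_per_value=4):
    """Bytes for an m x d key-value summary plus an m-vector normalizer.

    This is the state a kernelized linear attention keeps; it does not
    depend on the sequence length n.
    """
    return (m * d + m) * bytes_per_value

GIB = 2 ** 30

# Hypothetical sequence lengths, chosen only to show the growth rate.
for n in (1_024, 16_384, 131_072):
    print(f"n={n:>7}: dense scores = {attention_score_bytes(n) / GIB:.4f} GiB")

# Hypothetical feature count m and head dimension d.
print(f"linear summary = {linear_summary_bytes(m=256, d=64)} bytes")
```

Doubling n quadruples the dense score matrix, while the linear summary stays the same size.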

Abstract

Transformer models are computationally costly on long sequences because standard attention has quadratic $O(n^2)$ time complexity. We introduce Wavelet-Enhanced Random Spectral Attention (WERSA), a novel mechanism with linear $O(n)$ time complexity that enables long-sequence processing without a performance trade-off. WERSA combines content-adaptive random spectral features with multi-resolution Haar wavelets and learnable parameters to selectively attend to informative scales of the data while preserving linear efficiency. Large-scale comparisons on a single GPU, across various benchmarks (vision, NLP, hierarchical reasoning) and attention mechanisms (such as Multi-Headed Attention, FlashAttention-2, FNet, Linformer, Performer, and Waveformer), show consistent advantages for WERSA. It achieves the best accuracy in all tests. On ArXiv classification, WERSA…
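The linear-time claim rests on a general trick shared by random-feature attention methods: if the attention kernel factors through a feature map phi, the product phi(Q) (phi(K)^T V) can be computed without forming the n x n score matrix. The NumPy sketch below illustrates that reordering with generic positive random features. It is an assumption-laden illustration, not the WERSA algorithm: it omits the wavelet stage, the content-adaptive filtering, and the learnable parameters, and the names `positive_random_features`, `linear_attention`, and `quadratic_reference` are hypothetical.

```python
import numpy as np

def positive_random_features(x, w):
    """Map rows of x (n, d) to positive features (n, m) using projections w (m, d)."""
    proj = x @ w.T                                          # (n, m)
    half_sq_norm = 0.5 * np.sum(x * x, axis=1, keepdims=True)
    return np.exp(proj - half_sq_norm) / np.sqrt(w.shape[0])

def linear_attention(q, k, v, w):
    """Kernelized attention computed as phi(Q) @ (phi(K).T @ V).

    Cost is O(n * m * d): linear in the sequence length n.
    """
    phi_q = positive_random_features(q, w)
    phi_k = positive_random_features(k, w)
    kv = phi_k.T @ v                 # (m, d_v) summary of keys and values
    z = phi_k.sum(axis=0)            # (m,) normalizer summary
    return (phi_q @ kv) / (phi_q @ z)[:, None]

def quadratic_reference(q, k, v, w):
    """Same kernel, but materializing the (n, n) score matrix for comparison."""
    phi_q = positive_random_features(q, w)
    phi_k = positive_random_features(k, w)
    scores = phi_q @ phi_k.T         # (n, n)
    return (scores @ v) / scores.sum(axis=1, keepdims=True)

# Toy sizes chosen for illustration only.
rng = np.random.default_rng(0)
n, d, m = 64, 8, 32
q = 0.5 * rng.standard_normal((n, d))
k = 0.5 * rng.standard_normal((n, d))
v = rng.standard_normal((n, d))
w = rng.standard_normal((m, d))      # random projection directions

out = linear_attention(q, k, v, w)
```

Both functions return the same values up to floating-point error, because they differ only in the order of the matrix products. The linear version never allocates anything of size n x n, which is what makes very long sequences feasible.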

Code & Models

Repositories

https://huggingface.co/vincenzodentamaro/wersa (official)

Models

🤗 vincenzodentamaro/wersa (model)

Taxonomy

Methods: Wavelet-Enhanced Random Spectral Attention