Beyond Self Attention: A Subquadratic Fourier Wavelet Transformer with Multi Modal Fusion
Andrew Kiruluta, Andreas Lemos, Eric Lundy

TL;DR
This paper introduces a novel Fourier Wavelet Transformer with multi-modal fusion that replaces traditional attention, achieving subquadratic complexity and improved expressiveness for tasks like abstractive summarization.
Contribution
It presents a new spectral-based transformer model with Fourier Wavelet attention, integrating frequency and time transforms, and extends it to multi-modal data with enhanced efficiency.
Findings
Achieves subquadratic time and memory complexity.
Improves model expressiveness over traditional transformers.
Effective on the PubMed abstractive summarization task.
Abstract
We revisit the use of spectral techniques to replace the attention mechanism in Transformers through Fourier-Transform-based token mixing, and present a comprehensive and novel reformulation of this technique in next-generation transformer models. We provide expanded literature context, detailed mathematical formulations of Fourier mixing and causal masking, and introduce a novel MultiDomain Fourier Wavelet Attention (MDFWA) that integrates frequency- and time-localized transforms to capture both global and local dependencies efficiently. We derive complexity bounds and gradient formulas, and show that MDFWA achieves subquadratic time and memory cost while improving expressive power. We validate our design on an abstractive summarization task using the PubMed dataset, enhancing the proposed approach with learned frequency bases, adaptive scale selection, and multi-modal extensions.
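For readers unfamiliar with Fourier-based token mixing, the following is a minimal sketch of the generic idea, in the style of FNet: the attention sublayer is replaced by a 2D discrete Fourier transform over the sequence and hidden dimensions, keeping only the real part. It is not the MDFWA layer described in the abstract, which adds wavelet transforms, causal masking, and learned frequency bases that this page does not specify. The function names `fourier_mix` and `naive_dft_mix` are hypothetical and chosen for illustration only.

```python
import numpy as np


def fourier_mix(x: np.ndarray) -> np.ndarray:
    """Mix tokens with a 2D FFT over (seq_len, d_model) and keep the real part.

    The FFT along the sequence axis costs O(N log N) in sequence length N,
    compared with the O(N^2) pairwise scores of self-attention.
    """
    return np.fft.fft2(x).real


def naive_dft_mix(x: np.ndarray) -> np.ndarray:
    """Reference version using explicit DFT matrices (quadratic cost).

    Included only to show what the FFT computes; not meant for real use.
    """
    n, d = x.shape
    # DFT matrix over the sequence axis: entry (k, j) = exp(-2*pi*i*k*j / n)
    f_seq = np.exp(-2j * np.pi * np.outer(np.arange(n), np.arange(n)) / n)
    # DFT matrix over the hidden axis
    f_hid = np.exp(-2j * np.pi * np.outer(np.arange(d), np.arange(d)) / d)
    return (f_seq @ x @ f_hid).real


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    tokens = rng.standard_normal((8, 4))  # 8 tokens, hidden size 4 (toy sizes)
    mixed = fourier_mix(tokens)
    print(mixed.shape)
    print(np.allclose(mixed, naive_dft_mix(tokens)))
```

Every output position depends on every input token, which is how a parameter-free transform can stand in for global attention. Note that this plain transform is non-causal; the causal masking mentioned in the abstract would need additional machinery that is not shown here.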
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Linear Warmup With Linear Decay · Layer Normalization · AdamW · WordPiece · Dense Connections
