Beyond Self Attention: A Subquadratic Fourier Wavelet Transformer with Multi Modal Fusion

Andrew Kiruluta; Andreas Lemos; Eric Lundy

arXiv:2111.15473·cs.CL·April 24, 2025

Open Access

TL;DR

This paper introduces a novel Fourier Wavelet Transformer with multi-modal fusion that replaces standard self-attention with spectral token mixing. It achieves subquadratic complexity and improved expressiveness, and is evaluated on abstractive summarization.

Contribution

It presents a new spectral Transformer model with Fourier Wavelet attention that integrates frequency-domain and time-localized transforms, and extends it to multi-modal data with improved efficiency.
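As a rough illustration of combining a frequency transform with a time-localized transform, the NumPy sketch below pairs an FNet-style global Fourier branch with a one-level Haar wavelet branch and sums them with a residual. It is not the paper's MDFWA. The function names, the per-channel sub-band gains `approx_gain` and `detail_gain`, and the additive fusion are all hypothetical choices made for illustration.

```python
import numpy as np


def fourier_mix(x: np.ndarray) -> np.ndarray:
    """FNet-style global mixing: 2-D DFT over (sequence, hidden), keep the real part.

    x has shape (n, d); cost is O(n d log(n d)).
    """
    return np.fft.fft2(x).real  # fft2 transforms the last two axes by default


def haar_dwt(x: np.ndarray):
    """One-level orthonormal Haar transform along the sequence axis (n must be even)."""
    even, odd = x[0::2], x[1::2]
    approx = (even + odd) / np.sqrt(2.0)  # coarse, low-frequency sub-band
    detail = (even - odd) / np.sqrt(2.0)  # local, high-frequency sub-band
    return approx, detail


def haar_idwt(approx: np.ndarray, detail: np.ndarray) -> np.ndarray:
    """Exact inverse of haar_dwt for (n/2, d) sub-bands."""
    half, d = approx.shape
    x = np.empty((2 * half, d), dtype=approx.dtype)
    x[0::2] = (approx + detail) / np.sqrt(2.0)
    x[1::2] = (approx - detail) / np.sqrt(2.0)
    return x


def haar_mix(x: np.ndarray, approx_gain: np.ndarray, detail_gain: np.ndarray) -> np.ndarray:
    """Local mixing: rescale each Haar sub-band per channel, then reconstruct."""
    a, dt = haar_dwt(x)
    return haar_idwt(a * approx_gain, dt * detail_gain)


def multi_domain_mix(x: np.ndarray, approx_gain: np.ndarray, detail_gain: np.ndarray) -> np.ndarray:
    """Hypothetical fusion: residual + global Fourier branch + local wavelet branch."""
    return x + fourier_mix(x) + haar_mix(x, approx_gain, detail_gain)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((8, 4))  # 8 tokens, 4 channels
    gains = np.ones(4)
    print(multi_domain_mix(x, gains, gains).shape)  # (8, 4)
```

With both gains set to ones, `haar_mix` returns its input unchanged, which is a quick check that the Haar pair is invertible. Learning the gains would let the local branch amplify or suppress fine-scale detail, while the Fourier branch mixes every token with every other token in O(n d log(n d)) time.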

Findings

01 Achieves subquadratic time and memory complexity.

02 Improves model expressiveness over standard Transformers.

03 Validated on abstractive summarization of the PubMed dataset.
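For orientation, the standard per-layer costs of the building blocks involved are summarized below, with sequence length n and model width d. These are textbook bounds for each operation, not the paper's derived bounds for MDFWA, and constant factors are ignored.

```latex
\begin{aligned}
\text{Self-attention:}\quad & \mathcal{O}(n^{2} d)\ \text{time}, && \mathcal{O}(n^{2})\ \text{memory (score matrix, per head)}\\
\text{FFT mixing along the sequence:}\quad & \mathcal{O}(n d \log n)\ \text{time}, && \mathcal{O}(n d)\ \text{memory}\\
\text{$L$-level Haar DWT:}\quad & \mathcal{O}(n d)\ \text{time}, && \mathcal{O}(n d)\ \text{memory}
\end{aligned}
```

The Haar bound is linear because each level halves the sequence, so all levels together touch at most 2n positions per channel. Ignoring constants, the advantage of FFT mixing over computing attention scores grows as n / log2 n; for n = 4096 that is 4096 / 12 ≈ 341.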

Abstract

We revisit the use of spectral techniques to replace the attention mechanism in Transformers through Fourier-transform-based token mixing, and present a comprehensive and novel reformulation of this technique for next-generation Transformer models. We provide expanded literature context and detailed mathematical formulations of Fourier mixing and causal masking, and introduce a novel Multi-Domain Fourier Wavelet Attention (MDFWA) that integrates frequency- and time-localized transforms to capture both global and local dependencies efficiently. We derive complexity bounds and gradient formulas, and show that MDFWA achieves subquadratic time and memory cost while improving expressive power. We validate our design on an abstractive summarization task using the PubMed dataset, enhancing the proposed approach with learned frequency bases, adaptive scale selection, and multi-modal extensions.
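Plain Fourier token mixing is not causal, since every output of a DFT over the sequence depends on every position. One standard way to obtain causal spectral mixing, shown here as an assumption rather than the paper's own formulation, is FFT convolution with a per-channel filter, zero-padded to length 2n so that circular convolution becomes linear convolution and no output can see future tokens. The filter `h` stands in for a hypothetical learned parameter; `causal_fft_mix` and the reference loop `causal_mix_naive` are illustrative names.

```python
import numpy as np


def causal_fft_mix(x: np.ndarray, h: np.ndarray) -> np.ndarray:
    """Causal token mixing via FFT convolution in O(n d log n).

    x: (n, d) token features; h: (n, d) per-channel filter (hypothetical learned parameter).
    Returns y with y[t] = sum_{s <= t} h[t - s] * x[s], so no output depends on future tokens.
    """
    n = x.shape[0]
    m = 2 * n  # zero-pad to 2n: the linear convolution (length 2n - 1) then fits without wraparound
    X = np.fft.rfft(x, n=m, axis=0)
    H = np.fft.rfft(h, n=m, axis=0)
    y = np.fft.irfft(X * H, n=m, axis=0)
    return y[:n]  # first n outputs are exactly the causal part


def causal_mix_naive(x: np.ndarray, h: np.ndarray) -> np.ndarray:
    """Direct O(n^2 d) evaluation of the same causal sum, for checking."""
    n, d = x.shape
    y = np.zeros((n, d))
    for t in range(n):
        for s in range(t + 1):  # only past and current positions contribute
            y[t] += h[t - s] * x[s]
    return y


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.standard_normal((16, 4))
    h = rng.standard_normal((16, 4))
    print(np.allclose(causal_fft_mix(x, h), causal_mix_naive(x, h)))  # True
```

The zero-padding step is what enforces the causal mask: without it, circular wraparound would leak late tokens into early outputs. Perturbing a token at position t leaves every output before t unchanged, which is an easy property to test.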

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Linear Warmup With Linear Decay · Layer Normalization · AdamW · WordPiece · Dense Connections