Graph Laplacian Wavelet Transformer via Learnable Spectral Decomposition

Andrew Kiruluta; Eric Lundy; Priscilla Burity

arXiv:2505.07862·cs.CL·May 14, 2025

TL;DR

The paper presents the Graph Wavelet Transformer, a new model that replaces quadratic self-attention with a learnable spectral decomposition using graph wavelets, improving efficiency and interpretability for structured language tasks.

Contribution

It introduces a novel graph wavelet transform-based architecture that offers an efficient, interpretable alternative to traditional self-attention mechanisms in sequence modeling.

Findings

01. Spectral decomposition provides an efficient alternative to quadratic self-attention.

02. The model is interpretable and effective for graph-structured sequence tasks.

03. Learnable spectral methods improve computational efficiency.

Abstract

Existing sequence-to-sequence models for structured language tasks rely heavily on the dot-product self-attention mechanism, which incurs computation and memory costs that grow quadratically with the input length N. We introduce the Graph Wavelet Transformer (GWT), a novel architecture that replaces this bottleneck with a learnable, multi-scale wavelet transform defined over an explicit graph Laplacian derived from syntactic or semantic parses. Our analysis shows that multi-scale spectral decomposition offers an interpretable, efficient, and expressive alternative to quadratic self-attention for graph-structured sequence modeling.
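
To make the idea in the abstract concrete, the following is a minimal sketch of multi-scale spectral filtering over a graph Laplacian. It is not the authors' implementation. The heat-kernel filter exp(-s * lambda), the symmetric normalized Laplacian, the toy parse edges, and the names `normalized_laplacian` and `wavelet_mix` are all assumptions chosen for illustration. The scales are fixed constants here, whereas a learnable variant would treat them as trainable parameters. The sketch also uses a full eigendecomposition for clarity, which is cubic in N and therefore says nothing about the efficiency claims.

```python
import numpy as np

def normalized_laplacian(num_nodes, edges):
    """Symmetric normalized Laplacian L = I - D^{-1/2} A D^{-1/2} of an undirected graph."""
    A = np.zeros((num_nodes, num_nodes))
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    deg = A.sum(axis=1)
    d_inv_sqrt = np.zeros_like(deg)
    nonzero = deg > 0
    d_inv_sqrt[nonzero] = deg[nonzero] ** -0.5
    return np.eye(num_nodes) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

def wavelet_mix(X, L, scales):
    """Filter token features X (N x d) with an assumed heat kernel at each scale."""
    lam, U = np.linalg.eigh(L)   # graph Fourier basis (eigenvectors of L)
    X_hat = U.T @ X              # forward graph Fourier transform
    # Scale each spectral coefficient by exp(-s * lambda), then transform back.
    return [U @ (np.exp(-s * lam)[:, None] * X_hat) for s in scales]

# Hypothetical 5-token sentence whose parse links tokens along these edges.
edges = [(0, 1), (1, 2), (1, 3), (3, 4)]
L = normalized_laplacian(5, edges)

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 8))      # one 8-dimensional feature vector per token
scales = np.array([0.0, 0.5, 2.0])   # fixed here; assumed learnable in a real model
outputs = wavelet_mix(X, L, scales)
```

With scale 0 the filter is the identity and returns the input features unchanged. Larger scales suppress high-frequency components, so each token's output blends information from progressively wider neighborhoods of the parse graph.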

Taxonomy

Methods: Attention Is All You Need · Linear Layer · Multi-Head Attention · Dense Connections · Dropout · Layer Normalization · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Softmax · Absolute Position Encodings