Graph Laplacian Wavelet Transformer via Learnable Spectral Decomposition
Andrew Kiruluta, Eric Lundy, Priscilla Burity

TL;DR
The paper presents the Graph Wavelet Transformer, a new model that replaces quadratic self-attention with a learnable spectral decomposition using graph wavelets, improving efficiency and interpretability for structured language tasks.
Contribution
It introduces a novel architecture based on the graph wavelet transform, offering an efficient, interpretable alternative to standard self-attention in sequence modeling.
Findings
Spectral decomposition provides an efficient alternative to quadratic self-attention.
The model is interpretable and effective for graph-structured sequence tasks.
Learnable spectral methods improve computational efficiency.
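A common way for spectral graph wavelets to avoid an explicit eigendecomposition is to approximate the wavelet kernel with a Chebyshev polynomial in the Laplacian L. Each filter then costs a fixed number of multiplications by the (sparse) Laplacian. The source does not say whether GWT uses this approximation. The sketch below only illustrates why spectral filtering on a parse graph can be cheap. It makes several assumptions: an illustrative band-pass kernel g(x) = x·exp(-x), the Gershgorin bound λ_max ≤ 2·(max degree) for L = D - A, and a dense NumPy array for brevity where a real implementation would store L sparsely. All function names and the toy parse arcs are hypothetical.

```python
import numpy as np

def wavelet_kernel(x):
    # Assumed band-pass generating function: g(0) = 0 and g decays for large x.
    return x * np.exp(-x)

def cheb_coeffs(g, lmax, order, num_points=64):
    """Chebyshev coefficients of g on [0, lmax] via Gauss-Chebyshev quadrature."""
    theta = np.pi * (np.arange(num_points) + 0.5) / num_points
    lam = lmax / 2.0 * (np.cos(theta) + 1.0)  # map [-1, 1] back to [0, lmax]
    vals = g(lam)
    return np.array([2.0 / num_points * np.sum(vals * np.cos(k * theta))
                     for k in range(order + 1)])

def cheb_wavelet(L, X, scale, order=20):
    """Approximate U diag(g(scale * lam)) U^T X without computing U (hypothetical helper)."""
    lmax = 2.0 * L.diagonal().max()  # Gershgorin bound on the largest eigenvalue
    c = cheb_coeffs(lambda lam: wavelet_kernel(scale * lam), lmax, order)

    def L_tilde(V):
        # Shifted Laplacian with spectrum in [-1, 1]; one multiplication by L
        # (sparse in practice, dense here for brevity).
        return (2.0 / lmax) * (L @ V) - V

    T_prev, T_curr = X, L_tilde(X)  # T_0(L~) X and T_1(L~) X
    out = 0.5 * c[0] * T_prev + c[1] * T_curr
    for k in range(2, order + 1):
        # Three-term recurrence: T_k = 2 L~ T_{k-1} - T_{k-2}
        T_prev, T_curr = T_curr, 2.0 * L_tilde(T_curr) - T_prev
        out = out + c[k] * T_curr
    return out

# Toy 5-token parse (hypothetical arcs), combinatorial Laplacian L = D - A.
edges = [(1, 0), (1, 2), (2, 3), (2, 4)]
N = 5
A = np.zeros((N, N))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
L = np.diag(A.sum(axis=1)) - A
X = np.random.default_rng(0).normal(size=(N, 4))

approx = cheb_wavelet(L, X, scale=1.0)

# Reference: exact spectral filter through the eigendecomposition.
lam, U = np.linalg.eigh(L)
exact = U @ (wavelet_kernel(1.0 * lam)[:, None] * (U.T @ X))
print(np.max(np.abs(approx - exact)))  # small approximation error
```

With L stored sparsely, each of the `order` steps costs O((N + |E|)·d) for d-dimensional features. A dependency tree over N tokens has |E| = N - 1 edges, so for a fixed polynomial order the filter scales linearly in N, compared with the N × N score matrix of dot-product attention.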
Abstract
Existing sequence-to-sequence models for structured language tasks rely heavily on dot-product self-attention, which incurs quadratic complexity, O(N^2), in both computation and memory with respect to the input length N. We introduce the Graph Wavelet Transformer (GWT), a novel architecture that replaces this bottleneck with a learnable, multi-scale wavelet transform defined over an explicit graph Laplacian derived from syntactic or semantic parses. Our analysis shows that multi-scale spectral decomposition offers an interpretable, efficient, and expressive alternative to quadratic self-attention for graph-structured sequence modeling.
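Below is a minimal NumPy sketch of the core idea in the abstract. It rests on several assumptions:

- The parse is given as a hypothetical list of (head, dependent) token-index arcs.
- The Laplacian is the combinatorial L = D - A.
- The wavelet generating function is the band-pass kernel g(x) = x·exp(-x), evaluated at a few scales s.

The source does not give GWT's actual kernel, normalization, or how its scales are parameterized. In a trained model the scales would be parameters updated by gradient descent; here they are fixed numbers. All function names are illustrative.

```python
import numpy as np

def laplacian_from_edges(num_nodes, edges):
    """Combinatorial graph Laplacian L = D - A of an undirected graph.

    `edges` is a hypothetical list of (head, dependent) token indices,
    e.g. arcs from a dependency parse; arc direction is ignored.
    """
    A = np.zeros((num_nodes, num_nodes))
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    return np.diag(A.sum(axis=1)) - A

def spectral_wavelet_transform(L, X, scales):
    """Multi-scale wavelet coefficients of node features X (N x d).

    For each scale s, returns U diag(g(s * lam)) U^T X, where L = U diag(lam) U^T
    and g(x) = x * exp(-x) is an assumed band-pass kernel. `scales` stands in
    for parameters that would be learnable in an actual model.
    """
    lam, U = np.linalg.eigh(L)       # L is symmetric, so eigh applies
    X_hat = U.T @ X                  # graph Fourier transform of the features
    outs = []
    for s in scales:
        g = (s * lam) * np.exp(-s * lam)      # kernel response per eigenvalue
        outs.append(U @ (g[:, None] * X_hat))  # filter, then inverse transform
    return np.stack(outs)            # shape: (num_scales, N, d)

# Toy 5-token sentence with hypothetical parse arcs.
edges = [(1, 0), (1, 2), (2, 3), (2, 4)]
L = laplacian_from_edges(5, edges)
X = np.random.default_rng(0).normal(size=(5, 8))
W = spectral_wavelet_transform(L, X, scales=[0.5, 1.0, 2.0])
print(W.shape)  # (3, 5, 8)
```

Because g(0) = 0, every wavelet channel removes the component of X that is constant across a connected parse. Each scale therefore responds to variation between syntactically linked tokens over a different neighborhood size. The full eigendecomposition costs O(N^3) per graph, so this exact form is meant for clarity rather than speed.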
Taxonomy
Methods: Attention Is All You Need · Linear Layer · Multi-Head Attention · Dense Connections · Dropout · Layer Normalization · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Softmax · Absolute Position Encodings
