SchoenbAt: Rethinking Attention with Polynomial basis
Yuhan Guo, Lizhong Ding, Yuwan Yang, Xuewei Guo

TL;DR
SchoenbAt introduces a polynomial basis approach for kernelized attention, leveraging Schoenberg's theorem and random Maclaurin features to improve efficiency while maintaining accuracy in sequence modeling tasks.
Contribution
The paper proposes SchoenbAt, a novel polynomial basis-based kernelized attention method using Schoenberg's theorem, expanding beyond Fourier bases and providing theoretical and empirical validation.
Findings
SchoenbAt achieves faster computation compared to existing methods.
It maintains competitive accuracy on real-world datasets.
Theoretical analysis confirms unbiasedness and error bounds.
Abstract
Kernelized attention extends the attention mechanism by modeling sequence correlations through kernel functions, making significant progress in optimizing attention. Under the guarantee of harmonic analysis theory, kernel functions can be expanded with basis functions, inspiring random feature-based approaches that enhance the efficiency of kernelized attention while maintaining predictive performance. However, current random feature-based works are limited to Fourier basis expansions under Bochner's theorem. We propose Schoenberg's theorem-based attention (SchoenbAt), which approximates dot-product kernelized attention with the polynomial basis under Schoenberg's theorem via random Maclaurin features and applies a two-stage regularization to constrain the input space and restore the output scale, acting as a drop-in replacement for dot-product kernelized attention. Our theoretical…
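The random Maclaurin idea mentioned in the abstract can be illustrated with a minimal NumPy sketch. It assumes the exponential dot-product kernel exp(<x, y>), whose Maclaurin coefficients are 1/n!, and follows the generic random Maclaurin construction (random degree, Rademacher projections). The names `make_feature_map` and `linear_attention` are hypothetical, the inputs are simply rescaled to a small norm, and the paper's two-stage regularization is not reproduced here, so this should be read as an illustration of the general technique rather than the authors' implementation.

```python
import math
import numpy as np


def make_feature_map(d, num_features, rng):
    """Build phi such that phi(x) @ phi(y) estimates exp(<x, y>) without bias.

    Generic random Maclaurin construction (an assumption, not the paper's code):
    each feature draws a degree N with P[N = n] = 2^-(n+1), then N Rademacher
    vectors w_1..w_N, and evaluates sqrt(a_N * 2^(N+1)) * prod_j <w_j, x>,
    where a_N = 1/N! is the N-th Maclaurin coefficient of exp.
    """
    # geometric(0.5) is supported on {1, 2, ...}; subtracting 1 gives P[N = n] = 2^-(n+1).
    degrees = rng.geometric(0.5, size=num_features) - 1
    # One (N, d) matrix of +/-1 entries per feature; N may be 0 (constant feature).
    projections = [rng.integers(0, 2, size=(int(n), d)) * 2.0 - 1.0 for n in degrees]
    scales = np.array(
        [math.sqrt(2.0 ** (int(n) + 1) / math.factorial(int(n))) for n in degrees]
    )

    def phi(X):
        # The product over an empty axis is 1, which handles degree-0 features.
        cols = [np.prod(X @ W.T, axis=1) for W in projections]
        return np.stack(cols, axis=1) * scales / math.sqrt(num_features)

    return phi


def linear_attention(Q, K, V, phi):
    """Kernelized attention computed in the feature space.

    phi(K).T @ V is formed first, so no (len_q x len_k) score matrix is built.
    """
    q_feats, k_feats = phi(Q), phi(K)
    numer = q_feats @ (k_feats.T @ V)
    denom = q_feats @ k_feats.sum(axis=0)
    return numer / denom[:, None]


rng = np.random.default_rng(0)
d, num_features = 4, 10000
phi = make_feature_map(d, num_features, rng)


def small_norm_rows(n):
    # Rows rescaled to norm 0.5: a stand-in for constraining the input space.
    X = rng.normal(size=(n, d))
    return 0.5 * X / np.linalg.norm(X, axis=1, keepdims=True)


x, y = small_norm_rows(1), small_norm_rows(1)
approx = (phi(x) @ phi(y).T)[0, 0]
exact = math.exp((x @ y.T)[0, 0])

Q, K = small_norm_rows(6), small_norm_rows(6)
V = rng.normal(size=(6, 3))
out = linear_attention(Q, K, V, phi)
print(approx, exact, out.shape)
```

The same `phi` (same degrees and projections) must be applied to queries and keys, which is why the random draws are fixed once in `make_feature_map`. Because individual features can be negative, the normalizer in `linear_attention` is only well behaved when inputs are kept small, which is one plausible motivation for regularizing the input space.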
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Advanced Neural Network Applications · Generative Adversarial Networks and Image Synthesis
Methods: Softmax · Attention Is All You Need · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
