SchoenbAt: Rethinking Attention with Polynomial basis

Yuhan Guo; Lizhong Ding; Yuwan Yang; Xuewei Guo

arXiv:2505.12252 · cs.LG · May 20, 2025


TL;DR

SchoenbAt introduces a polynomial basis approach for kernelized attention, leveraging Schoenberg's theorem and random Maclaurin features to improve efficiency while maintaining accuracy in sequence modeling tasks.
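Random Maclaurin features are what turn a polynomial-basis expansion into a random-feature method. The sketch below follows the standard construction: sample a degree N with probability 2^-(N+1), then N Rademacher vectors, and multiply their projections. It is an illustration under stated assumptions, not code from the SchoenbAt repository: the function names, the sampling rate p = 2, and the example kernels are hypothetical choices. For f(t) = Σ a_n t^n with a_n ≥ 0, `estimate_kernel` returns an unbiased Monte Carlo estimate of f(<x, y>).

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

# One random Maclaurin feature: (scale, [w_1, ..., w_N]) with Rademacher vectors w_j.
Feature = Tuple[float, List[List[int]]]


def sample_maclaurin_features(coeff: Callable[[int], float], dim: int,
                              num_features: int, seed: int = 0) -> List[Feature]:
    """Sample features for f(t) = sum_n coeff(n) * t**n; requires coeff(n) >= 0."""
    rng = random.Random(seed)
    features: List[Feature] = []
    for _ in range(num_features):
        n = 0
        while rng.random() < 0.5:  # degree N with P(N = n) = 2^-(n+1)
            n += 1
        # Importance weight a_N / P(N = n) = a_N * 2^(N+1), split as a square root
        # so that the product of the x-side and y-side features carries it once.
        scale = math.sqrt(coeff(n) * 2.0 ** (n + 1))
        ws = [[rng.choice((-1, 1)) for _ in range(dim)] for _ in range(n)]
        features.append((scale, ws))
    return features


def feature_map(x: Sequence[float], features: List[Feature]) -> List[float]:
    """phi(x) with E[phi(x) . phi(y)] = f(<x, y>) over the random draws."""
    norm = 1.0 / math.sqrt(len(features))
    out = []
    for scale, ws in features:
        prod = 1.0  # empty product when N = 0
        for w in ws:
            prod *= sum(wi * xi for wi, xi in zip(w, x))
        out.append(norm * scale * prod)
    return out


def estimate_kernel(x: Sequence[float], y: Sequence[float], features: List[Feature]) -> float:
    """Monte Carlo estimate of f(<x, y>) using the same random features for x and y."""
    return sum(a * b for a, b in zip(feature_map(x, features), feature_map(y, features)))


if __name__ == "__main__":
    x, y = [1.0, 0.0, 0.0, 0.0], [0.6, 0.8, 0.0, 0.0]  # unit vectors with <x, y> = 0.6
    exp_feats = sample_maclaurin_features(lambda n: 1.0 / math.factorial(n), dim=4, num_features=20000)
    print(estimate_kernel(x, y, exp_feats), math.exp(0.6))
```

With `coeff(n) = 1/n!` the estimate targets exp(<x, y>); with coefficients [1, 2, 1] it targets the polynomial kernel (1 + <x, y>)^2. The estimation error shrinks roughly as 1/sqrt(num_features).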

Contribution

The paper proposes SchoenbAt, a kernelized attention method that expands the kernel in a polynomial basis under Schoenberg's theorem, rather than in the Fourier basis used by earlier random feature-based methods, and supports it with both theoretical analysis and experiments.
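For reference, the two classical characterizations can be written as follows (standard textbook forms; the paper may state them with different conditions). Bochner's theorem represents a continuous positive-definite shift-invariant kernel on R^d in the Fourier basis; Schoenberg's theorem represents a dot-product kernel that is positive definite on the unit sphere in every dimension in the monomial (polynomial) basis. For example, exp(<x, y>) corresponds to a_n = 1/n!.

```latex
% Bochner: shift-invariant kernels on R^d, Fourier basis (sampled by random Fourier features)
k(\mathbf{x}-\mathbf{y}) = \int_{\mathbb{R}^d} e^{\,i\,\boldsymbol{\omega}^\top(\mathbf{x}-\mathbf{y})}\, d\mu(\boldsymbol{\omega}),
\qquad \mu \text{ a finite nonnegative measure}

% Schoenberg: dot-product kernels on the unit sphere (all dimensions), polynomial basis
% (sampled by random Maclaurin features)
K(\mathbf{x},\mathbf{y}) = f(\langle \mathbf{x},\mathbf{y}\rangle)
  = \sum_{n=0}^{\infty} a_n \langle \mathbf{x},\mathbf{y}\rangle^{n},
\qquad a_n \ge 0,\quad \sum_{n=0}^{\infty} a_n < \infty
```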

Findings

1. SchoenbAt computes attention faster than the existing methods it is compared against.

2. It maintains competitive accuracy on real-world datasets.

3. Theoretical analysis shows that its approximation is unbiased and derives bounds on the approximation error.
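The unbiasedness claim rests on a short calculation. The version below is the standard argument for random Maclaurin features with sampling rate p = 2 (an assumption here); the paper's own proofs and error bounds are not reproduced.

```latex
% Random Maclaurin feature for f(t) = \sum_n a_n t^n with a_n \ge 0
\Pr[N = n] = 2^{-(n+1)}, \qquad
\mathbf{w}_1,\dots,\mathbf{w}_N \overset{\text{iid}}{\sim} \mathrm{Unif}\{\pm 1\}^d, \qquad
Z(\mathbf{x}) = \sqrt{a_N\, 2^{N+1}}\, \prod_{j=1}^{N} \mathbf{w}_j^\top \mathbf{x}

% Using E[w w^T] = I, so E[(w^T x)(w^T y)] = <x, y>:
\mathbb{E}\big[Z(\mathbf{x})\,Z(\mathbf{y})\big]
  = \sum_{n=0}^{\infty} 2^{-(n+1)}\, a_n\, 2^{n+1}
    \prod_{j=1}^{n} \mathbb{E}\big[\mathbf{x}^\top \mathbf{w}_j \mathbf{w}_j^\top \mathbf{y}\big]
  = \sum_{n=0}^{\infty} a_n \langle \mathbf{x},\mathbf{y}\rangle^{n}
  = f(\langle \mathbf{x},\mathbf{y}\rangle)

% Averaging D independent copies keeps the estimator unbiased; its variance scales as 1/D
\hat{K}(\mathbf{x},\mathbf{y}) = \frac{1}{D}\sum_{i=1}^{D} Z_i(\mathbf{x})\,Z_i(\mathbf{y}),
\qquad \mathbb{E}\big[\hat{K}(\mathbf{x},\mathbf{y})\big] = f(\langle \mathbf{x},\mathbf{y}\rangle)
```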

Abstract

Kernelized attention extends the attention mechanism by modeling sequence correlations through kernel functions and has made significant progress in optimizing attention. Harmonic analysis guarantees that kernel functions can be expanded in basis functions, which has inspired random feature-based approaches that improve the efficiency of kernelized attention while maintaining predictive performance. However, current random feature-based works are limited to Fourier basis expansions under Bochner's theorem. We propose Schoenberg's theorem-based attention (SchoenbAt), which approximates dot-product kernelized attention with the polynomial basis under Schoenberg's theorem via random Maclaurin features and applies a two-stage regularization to constrain the input space and restore the output scale, acting as a drop-in replacement for dot-product kernelized attention. Our theoretical…
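To make the "drop-in replacement" idea concrete, here is a hedged sketch, not the authors' implementation (that lives in the official PyTorch repository). It approximates attention with the exponential dot-product kernel, exp(<q, k>) = Σ <q, k>^n / n!, using random Maclaurin features, and compares the result with the exact computation. For each random feature the key-side sums are built once and shared by all queries, so the cost grows linearly in the number of queries and keys rather than with their product. The `rescale` helper (unit-normalize, then divide by d^(1/4)) is a hypothetical stand-in for an input constraint; the paper's two-stage regularization is not reproduced here.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def exact_exp_attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Quadratic-cost reference: out_i = sum_j exp(<q_i, k_j>) v_j / sum_j exp(<q_i, k_j>)."""
    outs = []
    for q in queries:
        w = [math.exp(sum(a * b for a, b in zip(q, k))) for k in keys]
        z = sum(w)
        outs.append([sum(wj * v[c] for wj, v in zip(w, values)) / z for c in range(len(values[0]))])
    return outs


def _project(ws: List[List[int]], x: List[float]) -> float:
    """prod_j <w_j, x>; the empty product (degree N = 0) is 1."""
    out = 1.0
    for w in ws:
        out *= sum(wi * xi for wi, xi in zip(w, x))
    return out


def rmf_attention(queries: Matrix, keys: Matrix, values: Matrix,
                  num_features: int = 40000, seed: int = 0) -> Matrix:
    """Approximate exact_exp_attention with random Maclaurin features of exp(t) = sum_n t^n / n!."""
    rng = random.Random(seed)
    dim, dv = len(keys[0]), len(values[0])
    num = [[0.0] * dv for _ in queries]
    den = [0.0] * len(queries)
    for _ in range(num_features):
        n = 0
        while rng.random() < 0.5:  # degree N with P(N = n) = 2^-(n+1)
            n += 1
        weight = 2.0 ** (n + 1) / math.factorial(n)  # a_n / P(N = n) with a_n = 1/n!
        ws = [[rng.choice((-1, 1)) for _ in range(dim)] for _ in range(n)]
        # Key-side sums for this feature, computed once and shared by all queries.
        fk = [_project(ws, k) for k in keys]
        s = sum(fk)
        sv = [sum(f * v[c] for f, v in zip(fk, values)) for c in range(dv)]
        for i, q in enumerate(queries):
            fq = weight * _project(ws, q)
            den[i] += fq * s
            for c in range(dv):
                num[i][c] += fq * sv[c]
    # The 1/num_features normalization is omitted because it cancels in num / den.
    return [[x / z for x in row] for row, z in zip(num, den)]


def rescale(x: List[float], dim: int) -> List[float]:
    """Hypothetical input constraint: unit-normalize, then divide by dim**0.25,
    so that <q, k> equals cosine(q, k) / sqrt(dim)."""
    norm = math.sqrt(sum(xi * xi for xi in x))
    return [xi / (norm * dim ** 0.25) for xi in x]


if __name__ == "__main__":
    d = 4
    qs = [rescale(q, d) for q in [[1.0, 0.5, -0.3, 0.2], [-0.4, 1.0, 0.8, 0.1]]]
    ks = [rescale(k, d) for k in [[0.9, -0.2, 0.4, 0.0], [0.1, 0.7, -0.5, 0.3],
                                  [-0.6, 0.2, 0.9, -0.1], [0.3, 0.3, 0.3, 0.9]]]
    vals = [[1.0, -0.5], [0.2, 0.8], [-0.7, 0.1], [0.4, -0.9]]
    print("exact :", exact_exp_attention(qs, ks, vals))
    print("approx:", rmf_attention(qs, ks, vals))
```

Keeping query and key norms small keeps exp(<q, k>) well away from zero, so the random-feature denominator stays stable. Unconstrained inputs can make the estimate noisy, which is one motivation for constraining the input space.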


Code & Models

Repositories

1911416-GuoYuhan/SchoenbAt
PyTorch · Official


Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Advanced Neural Network Applications · Generative Adversarial Networks and Image Synthesis

Methods: Softmax · Attention Is All You Need · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings