PADRe: A Unifying Polynomial Attention Drop-in Replacement for Efficient Vision Transformer
Pierre-David Letourneau, Manish Kumar Singh, Hsin-Pai Cheng, Shizhong Han, Yunxiao Shi, Dalton Jones, Matthew Harper Langston, Hong Cai, Fatih Porikli

TL;DR
PADRe introduces a polynomial-based attention mechanism that unifies various efficient attention methods, offering faster computation with comparable accuracy across vision tasks.
Contribution
It proposes a unifying polynomial attention framework that replaces traditional self-attention, improving efficiency and encompassing recent alternative attention mechanisms.
Findings
PADRe is 11x to 43x faster than standard self-attention.
Maintains similar accuracy to traditional self-attention.
Effective across diverse vision tasks like classification and detection.
Abstract
We present Polynomial Attention Drop-in Replacement (PADRe), a novel and unifying framework designed to replace the conventional self-attention mechanism in transformer models. Notably, several recent alternative attention mechanisms, including Hyena, Mamba, SimA, Conv2Former, and Castling-ViT, can be viewed as specific instances of our PADRe framework. PADRe leverages polynomial functions and draws upon established results from approximation theory, enhancing computational efficiency without compromising accuracy. PADRe's key components include multiplicative nonlinearities, which we implement using straightforward, hardware-friendly operations such as Hadamard products, incurring only linear computational and memory costs. PADRe further avoids the need for using complex functions such as Softmax, yet it maintains comparable or superior accuracy compared to traditional self-attention.…
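The multiplicative nonlinearity described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation. The names `token_mix` and `polynomial_mixer`, the short 1D kernel used for token mixing, and the omission of bias terms are all assumptions made for illustration. The sketch builds a degree-d polynomial of the input by taking Hadamard (element-wise) products of d linear transforms of the tokens, so the cost grows linearly with the number of tokens and no Softmax is needed.

```python
import numpy as np

def token_mix(x, kernel):
    """Mix along the token axis with a short 1D kernel (hypothetical choice).

    x has shape (n_tokens, dim); cost is O(n_tokens * len(kernel) * dim).
    """
    n, _ = x.shape
    k = kernel.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))  # zero-pad the token axis only
    out = np.zeros_like(x)
    for i in range(k):
        out += kernel[i] * xp[i:i + n]
    return out

def polynomial_mixer(x, kernels, weights, degree=2):
    """Multiply `degree` linear transforms of x element-wise.

    Each factor mixes tokens (token_mix) and channels (matrix product).
    The Hadamard product of the factors is a polynomial of that degree in x.
    """
    z = token_mix(x, kernels[0]) @ weights[0]
    for i in range(1, degree):
        y = token_mix(x, kernels[i]) @ weights[i]
        z = z * y  # Hadamard product: the only nonlinearity
    return z

# Illustrative usage with random parameters (assumed sizes).
rng = np.random.default_rng(0)
x = rng.standard_normal((16, 8))                 # 16 tokens, 8 channels
kernels = [rng.standard_normal(3) for _ in range(2)]
weights = [rng.standard_normal((8, 8)) for _ in range(2)]
out = polynomial_mixer(x, kernels, weights, degree=2)
```

Because every factor is linear in the input and there are no bias terms, scaling the input by 2 scales a degree-2 output by 4, which is a quick way to confirm that the block is a polynomial of the intended degree.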
Taxonomy
Topics: CCD and CMOS Imaging Sensors · Image Enhancement Techniques · Advanced Memory and Neural Computing
Methods: Attention Is All You Need · Softmax
