PADRe: A Unifying Polynomial Attention Drop-in Replacement for Efficient Vision Transformer

Pierre-David Letourneau; Manish Kumar Singh; Hsin-Pai Cheng; Shizhong Han; Yunxiao Shi; Dalton Jones; Matthew Harper Langston; Hong Cai; Fatih Porikli

arXiv:2407.11306·cs.CV·July 17, 2024

TL;DR

PADRe introduces a polynomial-based attention mechanism that unifies various efficient attention methods, offering faster computation with comparable accuracy across vision tasks.

Contribution

The paper proposes a unifying polynomial attention framework that replaces traditional self-attention, improves efficiency, and subsumes several recent alternative attention mechanisms.

Findings

01 PADRe is 11x to 43x faster than standard self-attention.

02 It maintains accuracy similar to that of traditional self-attention.

03 It is effective across diverse vision tasks such as classification and detection.

Abstract

We present Polynomial Attention Drop-in Replacement (PADRe), a novel and unifying framework designed to replace the conventional self-attention mechanism in transformer models. Notably, several recent alternative attention mechanisms, including Hyena, Mamba, SimA, Conv2Former, and Castling-ViT, can be viewed as specific instances of our PADRe framework. PADRe leverages polynomial functions and draws upon established results from approximation theory, enhancing computational efficiency without compromising accuracy. PADRe's key components include multiplicative nonlinearities, which we implement using straightforward, hardware-friendly operations such as Hadamard products, incurring only linear computational and memory costs. PADRe further avoids the need for using complex functions such as Softmax, yet it maintains comparable or superior accuracy compared to traditional self-attention.…
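The abstract's central idea, multiplicative nonlinearities built from Hadamard products at linear cost, can be illustrated with a small sketch. This is not the paper's layer: the function names (`token_mix`, `hadamard`, `poly_mix_degree2`), the use of a short 1D convolution for token mixing, and the degree-2 form `y1 + y1 * y2` are all assumptions chosen to keep the example minimal. It only shows how two linear mixes of the tokens, multiplied element-wise, yield a polynomial of the input without Softmax and without an N x N attention matrix.

```python
def token_mix(x, kernel):
    """Linearly mix tokens with a short zero-padded 1D convolution.

    x is a list of N tokens, each a list of D channel values.
    Cost is O(N * len(kernel) * D), i.e. linear in the number of tokens.
    (Hypothetical stand-in for a structured linear token-mixing operator.)
    """
    n, d = len(x), len(x[0])
    half = len(kernel) // 2
    out = [[0.0] * d for _ in range(n)]
    for i in range(n):
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < n:
                for c in range(d):
                    out[i][c] += w * x[j][c]
    return out


def hadamard(a, b):
    """Element-wise (Hadamard) product of two N x D arrays."""
    return [[u * v for u, v in zip(ra, rb)] for ra, rb in zip(a, b)]


def add(a, b):
    """Element-wise sum of two N x D arrays."""
    return [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(a, b)]


def poly_mix_degree2(x, kernel1, kernel2):
    """Illustrative degree-2 polynomial mixing: y1 + (y1 * y2).

    y1 and y2 are two different linear mixes of the same input; their
    Hadamard product is the multiplicative nonlinearity. No Softmax and
    no N x N attention matrix is formed. (Assumed form, for illustration.)
    """
    y1 = token_mix(x, kernel1)
    y2 = token_mix(x, kernel2)
    return add(y1, hadamard(y1, y2))


if __name__ == "__main__":
    tokens = [[1.0], [2.0], [3.0]]
    print(poly_mix_degree2(tokens, [0.0, 1.0, 0.0], [1.0, 1.0, 1.0]))
```

Every step touches each token a constant number of times, so both compute and memory grow linearly with the token count, in contrast to the quadratic cost of forming pairwise attention scores.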

Taxonomy

Topics: CCD and CMOS Imaging Sensors · Image Enhancement Techniques · Advanced Memory and Neural Computing

Methods: Attention Is All You Need · Softmax