KDEformer: Accelerating Transformers via Kernel Density Estimation

Amir Zandieh; Insu Han; Majid Daliri; Amin Karbasi

arXiv:2302.02451 · cs.LG · June 30, 2023 · 5 cites

TL;DR

KDEformer approximates the attention mechanism in transformers in sub-quadratic time using kernel density estimation, cutting computation time while keeping accuracy close to that of exact attention.

Contribution

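The paper presents KDEformer, a method that accelerates transformer attention by reducing the computation of the softmax partition functions to a kernel density estimation (KDE) problem and using a KDE solver to speed up the product of the softmax matrix with the value matrix. The approximation comes with provable spectral norm bounds, whereas prior results provide only entry-wise error bounds.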
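
The reduction can be illustrated with a short identity. This is a sketch under assumed notation, not taken from this page: q_i and k_j denote rows of the query and key matrices, n is the sequence length, A is the unnormalized attention matrix, D is the diagonal matrix of partition functions, and any 1/sqrt(d) scaling is assumed to be absorbed into the queries and keys.

```latex
\[
\mathrm{Att}(Q, K, V) = D^{-1} A V,
\qquad
A_{ij} = \exp\big(\langle q_i, k_j \rangle\big),
\qquad
D_{ii} = \sum_{j=1}^{n} A_{ij}.
\]
% Using <q, k> = (||q||^2 + ||k||^2 - ||q - k||^2) / 2:
\[
D_{ii}
= e^{\|q_i\|_2^2 / 2}
  \sum_{j=1}^{n} e^{\|k_j\|_2^2 / 2}\,
  \exp\!\Big(-\tfrac{1}{2}\,\|q_i - k_j\|_2^2\Big).
\]
```

Each partition function is therefore a weighted sum of Gaussian kernel values between one query and all keys, which is the kind of quantity a KDE solver is designed to estimate without evaluating all n terms.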

Findings

01. Achieves over 4x speedup in BigGAN image generation.

02. Provides over 18x speedup in ImageNet classification with minimal accuracy loss.

03. Outperforms existing attention approximation methods in accuracy, memory, and runtime.

Abstract

The dot-product attention mechanism plays a crucial role in modern deep architectures (e.g., Transformer) for sequence modeling. However, naïve exact computation of this model incurs quadratic time and memory complexity in sequence length, hindering the training of long-sequence models. The critical bottlenecks are the computation of the partition functions in the denominator of the softmax function and the multiplication of the softmax matrix with the matrix of values. Our key observation is that the former can be reduced to a variant of the kernel density estimation (KDE) problem, and an efficient KDE solver can be further utilized to accelerate the latter via subsampling-based fast matrix products. Our proposed KDEformer can approximate the attention in sub-quadratic time with provable spectral norm bounds, while all prior results merely provide entry-wise error bounds.…
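
The two bottlenecks named in the abstract can be made concrete with a small NumPy sketch. It is not the authors' implementation: uniform key sampling stands in for the KDE solver, and sampling proportional to the squared norms of the value rows stands in for the paper's KDE-guided sampling distribution. The function names `exact_attention` and `sampled_attention` and the sample count `m` are hypothetical, and the usual max-subtraction for numerical stability is omitted for brevity.

```python
import numpy as np


def exact_attention(Q, K, V):
    """Exact softmax attention, quadratic in the sequence length."""
    d = Q.shape[1]
    A = np.exp(Q @ K.T / np.sqrt(d))      # unnormalized attention matrix
    Z = A.sum(axis=1)                     # partition functions (softmax denominators)
    return (A @ V) / Z[:, None]


def sampled_attention(Q, K, V, m, rng):
    """Sampling-based approximation of softmax attention (illustrative only).

    Assumption: uniform sampling replaces the KDE solver, and value-norm
    sampling replaces the paper's sampling distribution.
    """
    n, d = K.shape

    # Bottleneck 1: each partition function is a sum of n kernel values.
    # Estimate it from m uniformly sampled keys, rescaled by n / m.
    idx = rng.integers(0, n, size=m)
    Z_hat = (n / m) * np.exp(Q @ K[idx].T / np.sqrt(d)).sum(axis=1)

    # Bottleneck 2: the product A @ V is a sum of n rank-one terms.
    # Sample m of them with probability p_j and reweight by 1 / (m * p_j),
    # which keeps the estimate unbiased. Only m columns of A are computed.
    p = (V ** 2).sum(axis=1)
    p = p / p.sum()
    jdx = rng.choice(n, size=m, p=p)
    A_cols = np.exp(Q @ K[jdx].T / np.sqrt(d))
    AV_hat = (A_cols / (m * p[jdx])) @ V[jdx]

    return AV_hat / Z_hat[:, None]
```

Both estimates touch only m key/value pairs per query, so the cost scales with n times m rather than n squared. The accuracy of this naive version depends heavily on the data; the spectral norm guarantees described in the abstract rely on the paper's own sampling scheme, not on the stand-ins used here.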

Code & Models

Repositories

majid-daliri/kdeformer (PyTorch, official)

Videos

KDEformer: Accelerating Transformers via Kernel Density Estimation · SlidesLive

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Medical Image Segmentation Techniques

Methods: Attention Is All You Need · BigGAN · Non-Local Operation · Batch Normalization · Feedforward Network · 1x1 Convolution · GAN Hinge Loss · Adam · Projection Discriminator