Transformer with Fourier Integral Attentions

Tan Nguyen; Minh Pham; Tam Nguyen; Khai Nguyen; Stanley J. Osher; Nhat Ho

arXiv:2206.00206 · cs.LG · June 2, 2022 · 1 citation

Open Access

TL;DR

This paper introduces FourierFormer, a transformer that replaces the dot-product kernels in attention with generalized Fourier integral kernels, which the authors report approximate the data distribution better and improve accuracy on language modeling and image classification.

Contribution

The paper proposes FourierFormer, a transformer variant using Fourier integral kernels that automatically capture feature dependencies without tuning covariance matrices.

Findings

1. FourierFormers outperform baseline transformers in language modeling.

2. FourierFormers achieve higher accuracy in image classification.

3. FourierFormers reduce redundancy between attention heads.

Abstract

Multi-head attention empowers the recent success of transformers, the state-of-the-art models that have achieved remarkable success in sequence modeling and beyond. These attention mechanisms compute the pairwise dot products between the queries and keys, which results from the use of unnormalized Gaussian kernels with the assumption that the queries follow a mixture of Gaussian distribution. There is no guarantee that this assumption is valid in practice. In response, we first interpret attention in transformers as a nonparametric kernel regression. We then propose the FourierFormer, a new class of transformers in which the dot-product kernels are replaced by the novel generalized Fourier integral kernels. Different from the dot-product kernels, where we need to choose a good covariance matrix to capture the dependency of the features of data, the generalized Fourier integral kernels…
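The kernel-regression reading of attention mentioned in the abstract can be checked numerically. The sketch below is an illustration, not the authors' code: it compares ordinary softmax attention with a Nadaraya-Watson estimator that weights each value by an unnormalized Gaussian kernel between query and key. The function names, the bandwidth `sigma2`, and the choice to normalize every key to unit length are assumptions made for this example; with equal-norm keys the two computations agree up to floating-point error, because the query-norm and key-norm factors cancel in the normalization.

```python
import numpy as np

def softmax_attention(Q, K, V, sigma2):
    """Softmax attention with dot-product scores q.k / sigma2."""
    scores = Q @ K.T / sigma2
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability only
    w = np.exp(scores)
    w /= w.sum(axis=1, keepdims=True)
    return w @ V

def gaussian_kernel_regression(Q, K, V, sigma2):
    """Nadaraya-Watson estimator with kernel exp(-||q - k||^2 / (2 * sigma2))."""
    d2 = ((Q[:, None, :] - K[None, :, :]) ** 2).sum(axis=-1)
    d2 -= d2.min(axis=1, keepdims=True)  # per-query shift cancels after normalizing
    w = np.exp(-d2 / (2.0 * sigma2))
    w /= w.sum(axis=1, keepdims=True)
    return w @ V

rng = np.random.default_rng(0)
N, D = 6, 4                      # illustrative sizes, not from the paper
Q = rng.normal(size=(N, D))
K = rng.normal(size=(N, D))
K /= np.linalg.norm(K, axis=1, keepdims=True)  # assumption: equal-norm keys
V = rng.normal(size=(N, D))
sigma2 = np.sqrt(D)              # assumption: bandwidth matching the usual 1/sqrt(D) scaling

out_attn = softmax_attention(Q, K, V, sigma2)
out_nw = gaussian_kernel_regression(Q, K, V, sigma2)
max_gap = float(np.abs(out_attn - out_nw).max())
print(max_gap)
```

In this framing, the kernel inside `gaussian_kernel_regression` is the piece a different kernel, such as the generalized Fourier integral kernel the paper proposes, would replace; its exact form is not given in the text above and is not reproduced here.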

Taxonomy

Topics: Topic Modeling · Text and Document Classification Technologies · Domain Adaptation and Few-Shot Learning