Learnable Fourier Features for Multi-Dimensional Spatial Positional Encoding

Yang Li; Si Si; Gang Li; Cho-Jui Hsieh; Samy Bengio

arXiv:2106.02795 · cs.LG · November 10, 2021 · 30 cites

TL;DR

This paper introduces a positional encoding for attention models based on learnable Fourier features. The authors report that it outperforms existing positional encoding methods on spatial tasks in both accuracy and convergence speed.

Contribution

The paper proposes a trainable positional encoding for multi-dimensional spatial positions in attention models: a learnable Fourier feature mapping modulated by a multi-layer perceptron.

Findings

1. Outperforms existing positional encoding methods in benchmarks.
2. Improves accuracy in spatial tasks.
3. Enables faster convergence during training.

Abstract

Attentional mechanisms are order-invariant. Positional encoding is a crucial component to allow attention-based deep model architectures such as Transformer to address sequences or images where the position of information matters. In this paper, we propose a novel positional encoding method based on learnable Fourier features. Instead of hard-coding each position as a token or a vector, we represent each position, which can be multi-dimensional, as a trainable encoding based on learnable Fourier feature mapping, modulated with a multi-layer perceptron. The representation is particularly advantageous for a spatial multi-dimensional position, e.g., pixel positions on an image, where $L_{2}$ distances or more complex positional relationships need to be captured. Our experiments based on several public benchmark tasks show that our learnable Fourier feature representation for…
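The mapping the abstract describes (a Fourier feature map of a multi-dimensional position, followed by a multi-layer perceptron) can be sketched roughly as follows. This is a minimal, dependency-free illustration and not the authors' implementation: the abstract does not give the formula, so the 1/sqrt(D) normalisation, the ReLU activation, the single hidden layer, the Gaussian initialisation, and every name and size (`w_r`, `w1`, `w2`, `M`, `HALF_D`, `HIDDEN`, `OUT`) are assumptions made for the example. The weights here are random; in a real model they would be trained jointly with the network.

```python
import math
import random


def init_matrix(rows, cols, std, rng):
    """Return a rows x cols matrix with entries drawn from N(0, std^2)."""
    return [[rng.gauss(0.0, std) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]


def fourier_features(pos, w_r):
    """Map a position x to [cos(W_r x) || sin(W_r x)] / sqrt(D), with D = 2 * len(w_r).

    The normalisation is an assumption; w_r plays the role of the learnable
    frequency matrix.
    """
    proj = matvec(w_r, pos)
    scale = 1.0 / math.sqrt(2 * len(w_r))
    return [scale * math.cos(p) for p in proj] + [scale * math.sin(p) for p in proj]


def mlp(features, w1, w2):
    """One-hidden-layer perceptron (ReLU is an assumed activation, biases omitted)."""
    hidden = [max(0.0, h) for h in matvec(w1, features)]
    return matvec(w2, hidden)


def positional_encoding(pos, params):
    """Fourier features modulated by an MLP, as outlined in the abstract."""
    return mlp(fourier_features(pos, params["w_r"]), params["w1"], params["w2"])


# Hypothetical sizes: 2-D positions (e.g. normalised pixel coordinates),
# 16 Fourier features, 16 hidden units, 4 output dimensions.
rng = random.Random(0)
M, HALF_D, HIDDEN, OUT = 2, 8, 16, 4
params = {
    "w_r": init_matrix(HALF_D, M, 1.0, rng),
    "w1": init_matrix(HIDDEN, 2 * HALF_D, 0.5, rng),
    "w2": init_matrix(OUT, HIDDEN, 0.5, rng),
}
encoding = positional_encoding([0.25, 0.75], params)
```

One property of this particular feature map is worth noting: since cos(a)cos(b) + sin(a)sin(b) = cos(a - b), the inner product of two feature vectors depends only on the displacement between the two positions, not on where they sit in absolute terms. That is one way such features can expose distance-like relationships between positions to an attention layer.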

Code & Models

Repositories

willGuimont/learnable_fourier_positional_encoding · PyTorch

Videos

Learnable Fourier Features for Multi-dimensional Spatial Positional Encoding · SlidesLive

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques · Domain Adaptation and Few-Shot Learning

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Adam · Label Smoothing · Residual Connection · Dense Connections