Learnable Fourier Features for Multi-Dimensional Spatial Positional Encoding
Yang Li, Si Si, Gang Li, Cho-Jui Hsieh, Samy Bengio

TL;DR
This paper introduces a learnable Fourier feature-based positional encoding for attention models, enhancing spatial understanding and outperforming existing methods in accuracy and convergence speed.
Contribution
It proposes a novel trainable positional encoding method using learnable Fourier features for multi-dimensional spatial data in attention models.
Findings
Outperforms existing positional encoding methods in benchmarks.
Improves accuracy in spatial tasks.
Enables faster convergence during training.
Abstract
Attentional mechanisms are order-invariant. Positional encoding is a crucial component to allow attention-based deep model architectures such as Transformer to address sequences or images where the position of information matters. In this paper, we propose a novel positional encoding method based on learnable Fourier features. Instead of hard-coding each position as a token or a vector, we represent each position, which can be multi-dimensional, as a trainable encoding based on learnable Fourier feature mapping, modulated with a multi-layer perceptron. The representation is particularly advantageous for a spatial multi-dimensional position, e.g., pixel positions on an image, where distances or more complex positional relationships need to be captured. Our experiments based on several public benchmark tasks show that our learnable Fourier feature representation for…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
Taxonomy
TopicsAdvanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques · Domain Adaptation and Few-Shot Learning
MethodsMulti-Head Attention · Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Adam · Label Smoothing · Residual Connection · Dense Connections
