Learning to Encode Position for Transformer with Continuous Dynamical Model

Xuanqing Liu; Hsiang-Fu Yu; Inderjit Dhillon; Cho-Jui Hsieh

arXiv:2003.09229 · cs.LG · March 23, 2020 · 24 citations

TL;DR

This paper proposes a learnable position encoding method for Transformers based on Neural ODEs, yielding flexible position representations that can extrapolate to longer sequences and improve performance on language tasks.

Contribution

Introduces a position encoding for Transformers based on a continuous dynamical model, addressing the limitations of fixed sinusoidal encodings and learned position embeddings with an approach that is both learnable and able to extrapolate to longer inputs.
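
One way to write a continuous dynamical position model is as an initial value problem: a learned vector field $h$ with parameters $\theta_h$ drives a position state $p(t)$, and token $i$ is read off at $t_i = i\,\Delta t$. This is a hedged sketch; the symbols and the fixed spacing $\Delta t$ are our own notation and assumptions, not necessarily the paper's exact parameterization.

```latex
\begin{aligned}
\frac{\mathrm{d}p(t)}{\mathrm{d}t} &= h\bigl(t, p(t); \theta_h\bigr), \\
p(t) &= p(s) + \int_{s}^{t} h\bigl(\tau, p(\tau); \theta_h\bigr)\,\mathrm{d}\tau, \\
p_i &= p(i\,\Delta t), \qquad i = 0, 1, 2, \dots, \\
p_{i+1} &\approx p_i + \Delta t\, h\bigl(i\,\Delta t, p_i; \theta_h\bigr) \quad \text{(one forward Euler step)}.
\end{aligned}
```

Because $p(t)$ is defined for every $t \ge 0$, a sequence longer than any seen in training only requires integrating further in $t$, which is the intuition behind the extrapolation claim.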

Findings

01

Consistent performance improvements on translation tasks

02

Enhanced flexibility and length extrapolation capabilities

03

Effective modeling of position evolution as a dynamical system
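
The dynamical-system view of position can be sketched in plain Python. This is a minimal sketch under stated assumptions, not code from the FLOATER repository: `make_dynamics` and `encode_positions` are hypothetical helpers, the vector field is a tiny tanh MLP whose fixed random weights stand in for trained parameters, the initial state is zero, and a fixed-step forward Euler integrator replaces a proper ODE solver.

```python
import math
import random


def make_dynamics(dim, hidden, seed=0):
    """Hypothetical h(t, p; theta): a one-hidden-layer tanh MLP whose fixed
    random weights stand in for parameters that would normally be trained."""
    rng = random.Random(seed)
    # Input is [t, p_1, ..., p_dim], so the first layer has dim + 1 columns.
    w1 = [[rng.gauss(0.0, 0.5) for _ in range(dim + 1)] for _ in range(hidden)]
    w2 = [[rng.gauss(0.0, 0.5) for _ in range(hidden)] for _ in range(dim)]

    def h(t, p):
        x = [t] + list(p)
        z = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in w1]
        return [sum(w * zi for w, zi in zip(row, z)) for row in w2]

    return h


def encode_positions(h, n_positions, dim, dt=0.1, substeps=4):
    """Integrate dp/dt = h(t, p) from p(0) = 0 with fixed-step forward Euler
    and return p(i * dt) for positions i = 0, ..., n_positions - 1."""
    if n_positions <= 0:
        return []
    p = [0.0] * dim
    t = 0.0
    step = dt / substeps
    out = [list(p)]  # position 0 is the initial state
    for _ in range(1, n_positions):
        for _ in range(substeps):
            dp = h(t, p)
            p = [pi + step * di for pi, di in zip(p, dp)]
            t += step
        out.append(list(p))
    return out


h = make_dynamics(dim=4, hidden=8)
short = encode_positions(h, n_positions=10, dim=4)
longer = encode_positions(h, n_positions=50, dim=4)
# Extending the sequence only integrates further in t, so the first 10
# encodings of the longer run match the short run exactly.
print(longer[:10] == short)  # True
```

Since every position comes from integrating the same dynamics further in t, the encoding is defined for sequences of any length, and a prefix of a longer sequence receives exactly the encodings computed for the shorter one.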

Abstract

We introduce a new way of learning to encode position information for non-recurrent models, such as Transformer models. Unlike RNNs and LSTMs, which carry an inductive bias by loading the input tokens sequentially, non-recurrent models are less sensitive to position. The main reason is that position information among input units is not inherently encoded, i.e., the models are permutation equivariant; this is why existing models are all accompanied by a sinusoidal encoding/embedding layer at the input. However, this solution has clear limitations: the sinusoidal encoding is not flexible enough, as it is manually designed and does not contain any learnable parameters, whereas the position embedding restricts the maximum length of input sequences. It is thus desirable to design a new position layer that contains learnable parameters to adjust to different datasets and…
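
The two limitations named in the abstract can be made concrete. The sketch below is our own illustration, not code from the paper or its repository: `sinusoidal_encoding` follows the standard fixed formula and has nothing to train, while the hypothetical `PositionEmbedding` table (with an arbitrary `max_len` of 512) holds trainable vectors but cannot encode any position past its last row.

```python
import math
import random


def sinusoidal_encoding(pos, dim):
    """Fixed sinusoidal encoding (Vaswani et al., 2017): no trainable
    parameters, and defined for any non-negative integer position."""
    enc = []
    for i in range(0, dim, 2):
        angle = pos / (10000 ** (i / dim))
        enc.append(math.sin(angle))      # even dimension 2k
        if i + 1 < dim:
            enc.append(math.cos(angle))  # odd dimension 2k + 1
    return enc


class PositionEmbedding:
    """Learned position embedding: one trainable vector per position, so the
    table size fixes the longest sequence the model can encode."""

    def __init__(self, max_len, dim, seed=0):
        rng = random.Random(seed)
        self.max_len = max_len
        self.table = [[rng.gauss(0.0, 0.02) for _ in range(dim)]
                      for _ in range(max_len)]

    def __call__(self, pos):
        if not 0 <= pos < self.max_len:
            raise IndexError(f"position {pos} is outside [0, {self.max_len})")
        return self.table[pos]


print(sinusoidal_encoding(0, 4))  # [0.0, 1.0, 0.0, 1.0]
emb = PositionEmbedding(max_len=512, dim=8)
print(len(emb(511)))              # 8
try:
    emb(512)
except IndexError as err:
    print(err)                    # position 512 is outside [0, 512)
```

The first function works at any position but cannot adapt to data; the second adapts but stops at `max_len`. A learnable model that is still defined for every position is what the abstract argues for.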

Code & Models

Repositories

xuanqing94/FLOATER (PyTorch)

Videos

Learning to Encode Position for Transformer with Continuous Dynamical Model · SlidesLive

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications

Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · 1x1 Convolution · Convolution · Residual Connection · Byte Pair Encoding · Dense Connections · Label Smoothing