Position Information in Transformers: An Overview

Philipp Dufter; Martin Schmitt; Hinrich Schütze

arXiv:2102.11090 · cs.CL · September 10, 2021 · 1 citation

Open Access

TL;DR

This paper surveys methods for incorporating position information into Transformer models, explains why word order matters for modeling language, and compares the methods systematically to guide future research.

Contribution

The paper gives an overview of existing position encoding methods for Transformers under a unified notation, which makes the methods easier to compare and helps in selecting one for a given application.

Findings

01. Position encoding is crucial for Transformer performance.

02. Multiple approaches to position information exist with different trade-offs.

03. The survey guides future research directions in position encoding.

Abstract

Transformers are arguably the main workhorse in recent Natural Language Processing research. By definition, a Transformer is invariant with respect to reordering of the input. However, language is inherently sequential, and word order is essential to the semantics and syntax of an utterance. In this article, we provide an overview and theoretical comparison of existing methods to incorporate position information into Transformer models. The objectives of this survey are to (1) showcase that position information in Transformers is a vibrant and extensive research area; (2) enable the reader to compare existing methods by providing a unified notation and systematization of different approaches along important model dimensions; (3) indicate what characteristics of an application should be taken into account when selecting a position encoding; (4) provide stimuli for future research.
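The reordering claim in the abstract can be checked numerically. The sketch below is an illustration, not code from the paper: it assumes a single attention head, random projection matrices, and no position information of any kind, and all names (self_attention, random_matrix, mean_pool) are hypothetical helpers. Under those assumptions, permuting the input rows permutes the output rows in the same way (permutation equivariance), so any order-independent summary of the output, such as a mean over positions, is unchanged (invariance).

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention with no position information."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    d = len(q[0])
    scores = [
        [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k] for qi in q
    ]
    weights = [softmax(row) for row in scores]
    return matmul(weights, v)


def random_matrix(rng, rows, cols):
    """Matrix with entries drawn uniformly from [-1, 1] (illustrative values only)."""
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


def mean_pool(rows):
    """Average the row vectors, discarding their order."""
    return [sum(col) / len(rows) for col in zip(*rows)]


rng = random.Random(0)
n, d = 4, 3  # sequence length and model dimension, chosen arbitrarily
x = random_matrix(rng, n, d)  # stand-in for token embeddings
wq, wk, wv = (random_matrix(rng, d, d) for _ in range(3))

perm = [2, 0, 3, 1]  # an arbitrary reordering of the four positions
out = self_attention(x, wq, wk, wv)
out_perm = self_attention([x[i] for i in perm], wq, wk, wv)

# Equivariance: row i of the permuted output equals row perm[i] of the original.
equivariant = all(
    math.isclose(a, b, abs_tol=1e-9)
    for i, p in enumerate(perm)
    for a, b in zip(out_perm[i], out[p])
)

# Invariance: a pooled summary does not change under reordering.
invariant = all(
    math.isclose(a, b, abs_tol=1e-9)
    for a, b in zip(mean_pool(out), mean_pool(out_perm))
)

print(equivariant, invariant)
```

Adding a position-dependent vector to each row of x before calling self_attention would generally break both checks, which is the role the position encodings surveyed in the paper play.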

Taxonomy

Topics: Natural Language Processing Techniques · Speech and dialogue systems · Topic Modeling

Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Dropout · Layer Normalization · Attention Is All You Need · Dense Connections · Softmax · Adam