Exploring Transformer Extrapolation
Zhen Qin, Yiran Zhong, Hui Deng

TL;DR
This paper investigates the mathematical conditions under which transformers using Relative Positional Encodings (RPEs) can extrapolate to sequences longer than those seen in training. It introduces a new measure, the Theoretical Receptive Field (TRF), and validates the findings through language modeling experiments on a variety of corpora.
Contribution
It provides a thorough analysis of RPEs for length extrapolation, deriving conditions for success and introducing the Theoretical Receptive Field as a new measurement tool.
Findings
Length extrapolation is guaranteed when the series corresponding to the exponential of the RPE converges.
A new Theoretical Receptive Field (TRF) is derived for RPEs.
Empirical validation across multiple datasets confirms the theoretical conditions.
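The convergence condition in the first finding can be illustrated numerically. The following is a minimal sketch, not the authors' code: it assumes the RPE is an additive bias b(k) on the attention logits that depends only on the relative distance k, and it tracks the partial sums of exp(b(k)) as the sequence length grows. The function names, the slope value, and the two example biases (a linear penalty in the style of ALiBi, and a logarithmic penalty) are hypothetical choices made for illustration.

```python
import math

def partial_sums(bias, lengths):
    """Return sum_{k=0}^{n-1} exp(bias(k)) for each n in lengths."""
    return [sum(math.exp(bias(k)) for k in range(n)) for n in lengths]

def linear_bias(k, slope=0.5):
    # Hypothetical linear penalty: exp(-slope * k) is a geometric
    # series, which converges to 1 / (1 - exp(-slope)).
    return -slope * k

def log_bias(k):
    # Hypothetical logarithmic penalty: exp(-log(1 + k)) = 1 / (1 + k),
    # the harmonic series, which diverges.
    return -math.log(1 + k)

lengths = [10, 100, 1000, 10000]
linear_sums = partial_sums(linear_bias, lengths)
log_sums = partial_sums(log_bias, lengths)

for n, lin, lg in zip(lengths, linear_sums, log_sums):
    print(f"n={n:>6}  linear: {lin:.6f}  log: {lg:.6f}")
```

Under these assumptions, the partial sums for the linear bias settle at a fixed value as n grows, while those for the logarithmic bias keep increasing. A finite numerical check like this only suggests convergence or divergence; it does not prove it.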
Abstract
Length extrapolation has attracted considerable attention recently since it allows transformers to be tested on longer sequences than those used in training. Previous research has shown that this property can be attained by using carefully designed Relative Positional Encodings (RPEs). While these methods perform well on a variety of corpora, the conditions for length extrapolation have yet to be investigated. This paper attempts to determine what types of RPEs allow for length extrapolation through a thorough mathematical and empirical analysis. We discover that a transformer is certain to possess this property as long as the series that corresponds to the RPE's exponential converges. Two practices are derived from the conditions and examined in language modeling tasks on a variety of corpora. As a bonus from the conditions, we derive a new Theoretical Receptive Field (TRF) to measure…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems
