Rethinking Addressing in Language Models via Contextualized Equivariant Positional Encoding
Jiajun Zhu, Peihao Wang, Ruisi Cai, Jason D. Lee, Pan Li, Zhangyang Wang

TL;DR
This paper introduces TAPE, a new framework for positional encoding in transformers that incorporates sequence content to improve long-range reasoning and task adaptability, with provable benefits and practical efficiency.
Contribution
TAPE provides a dynamic, content-aware positional encoding method that enhances transformer performance and reasoning capabilities, surpassing traditional fixed-position techniques.
Findings
Improves long-context reasoning in language models.
Enhances arithmetic reasoning and retrieval tasks.
Achieves superior performance over existing positional encodings.
Abstract
Transformers rely on both content-based and position-based addressing mechanisms to make predictions, but existing positional encoding techniques often diminish the effectiveness of position-based addressing. Many current methods enforce rigid patterns in attention maps, limiting the ability to model long-range dependencies and adapt to diverse tasks. Additionally, most positional encodings are learned as general biases, lacking the specialization required for different instances within a dataset. To address this, we propose conTextualized equivariAnt Position Encoding (TAPE), a novel framework that enhances positional embeddings by incorporating sequence content across layers. TAPE introduces dynamic, context-aware positional encodings, overcoming the constraints of traditional fixed patterns. We show that TAPE can provably facilitate LLM…
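For contrast with TAPE, the sketch below shows one widely used fixed scheme, rotary position embedding (RoPE). It is our own illustration, not code from the paper, and the function names `rope_rotate` and `rope_score` are hypothetical. Under RoPE, the positional part of an attention logit depends only on the offset between query and key positions, never on the sequence content. The same pattern is therefore imposed on every input, which is the kind of fixed pattern the abstract contrasts with context-aware encodings.

```python
import math


def rope_rotate(vec, pos, base=10000.0):
    """Apply rotary position embedding (RoPE) to an even-length vector at position `pos`.

    Each consecutive pair (vec[2i], vec[2i+1]) is rotated by the angle
    pos * theta_i, where theta_i = base ** (-2i / d) is a fixed frequency.
    """
    d = len(vec)
    if d % 2 != 0:
        raise ValueError("RoPE requires an even dimension")
    out = []
    for i in range(d // 2):
        theta = base ** (-2.0 * i / d)  # fixed, content-independent frequency
        angle = pos * theta
        c, s = math.cos(angle), math.sin(angle)
        x, y = vec[2 * i], vec[2 * i + 1]
        out.extend([x * c - y * s, x * s + y * c])  # 2D rotation of the pair
    return out


def rope_score(q, k, m, n):
    """Unscaled attention logit between a query at position m and a key at position n."""
    qr = rope_rotate(q, m)
    kr = rope_rotate(k, n)
    return sum(a * b for a, b in zip(qr, kr))


if __name__ == "__main__":
    q = [0.3, -1.2, 0.8, 0.5]
    k = [1.0, 0.4, -0.7, 0.2]
    # Same offset (n - m = 3) at different absolute positions yields the same logit,
    # up to floating-point rounding.
    print(rope_score(q, k, 2, 5), rope_score(q, k, 40, 43))
```

Because each pair is rotated by an angle proportional to its position, the dot product picks up only the angle difference (n − m)·θ_i. This is why the two printed logits agree. A content-aware scheme such as TAPE instead lets the positional features themselves change with the input across layers.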
Taxonomy
Topics: Natural Language Processing Techniques · Speech and dialogue systems · Topic Modeling
Methods: Softmax · Attention Is All You Need
