HoPE: A Novel Positional Encoding Without Long-Term Decay for Enhanced Context Awareness and Extrapolation
Yuhan Chen, Ang Lv, Jian Luan, Bin Wang, Wei Liu

TL;DR
HoPE is a positional encoding that removes long-term decay, aiming to improve context awareness and extrapolation in large language models by focusing on high-frequency signals and optimizing position and semantic components.
Contribution
The paper proposes HoPE, a novel positional encoding that breaks the long-term decay principle, enhancing model extrapolation and context understanding in LLMs.
Findings
HoPE outperforms traditional encodings in extrapolation tasks.
Models with HoPE show improved context awareness.
HoPE removes limitations caused by long-term decay in positional encoding.
Abstract
Many positional encodings (PEs) are designed to exhibit long-term decay, based on an entrenched and long-standing inductive opinion: tokens farther away from the current position carry less relevant information. We argue that long-term decay is outdated in the era of LLMs, as LLMs are now applied to tasks demanding precise retrieval of in-context information from arbitrary positions. First, we present empirical analyses on various PEs, demonstrating that models inherently learn attention with only a local-decay pattern while forming a U-shape pattern globally, contradicting the principle of long-term decay. Furthermore, we conduct a detailed analysis of rotary position encoding (RoPE, a prevalent relative positional encoding in LLMs), and find that the U-shape attention is caused by some learned components, which are also the key factor limiting RoPE's expressiveness and…
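To make the notion of long-term decay concrete, here is a minimal sketch of how it shows up in standard RoPE. It is an illustration of the property the abstract argues against, not the HoPE method itself. It assumes the common RoPE frequency schedule theta_i = base^(-2i/d) with base 10000 and head dimension 128, and it fixes the query and key to all-ones vectors so that the attention score depends only on the relative distance. The function names are hypothetical.

```python
import math


def rope_frequencies(dim, base=10000.0):
    """Per-pair rotation frequencies theta_i = base^(-2i/dim) (assumed standard RoPE schedule)."""
    return [base ** (-2.0 * i / dim) for i in range(dim // 2)]


def rope_score(distance, dim=128, base=10000.0):
    """Dot product of an all-ones query and key under RoPE at a given relative distance.

    With q = k = all ones, the sine terms cancel and each 2-D pair
    contributes 2 * cos(distance * theta_i).
    """
    return 2.0 * sum(math.cos(distance * theta) for theta in rope_frequencies(dim, base))


def mean_score(start, stop, dim=128):
    """Average score over relative distances in [start, stop), to smooth out oscillation."""
    return sum(rope_score(r, dim) for r in range(start, stop)) / (stop - start)


if __name__ == "__main__":
    # Compare a window of nearby distances with a window of far-away distances.
    print("near (0-15):     ", mean_score(0, 16))
    print("far (1024-1039): ", mean_score(1024, 1040))
```

Under these assumptions the averaged score is noticeably lower for the distant window than for the nearby one, which is the built-in bias toward nearby tokens that the abstract calls long-term decay. Real queries and keys are learned, so actual attention patterns can differ from this idealized case.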
Taxonomy
Topics: Indoor and Outdoor Localization Technologies · Video Surveillance and Tracking Methods · IoT-based Smart Home Systems
Methods: Softmax · Attention Is All You Need
