PoPE: Legendre Orthogonal Polynomials Based Position Encoding for Large Language Models

Arpit Aggarwal

arXiv:2405.04585·cs.CL·May 9, 2024

TL;DR

This paper introduces PoPE, a position encoding method built on Legendre orthogonal polynomials that addresses limitations of sinusoidal encodings and improves transformer performance and convergence.

Contribution

The paper proposes PoPE, a position encoding technique based on Legendre polynomials, and reports better performance than baseline models along with faster convergence in transformer models.
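This summary does not spell out how PoPE builds its encoding table. As a rough, hypothetical sketch only, the Python below assumes each position is mapped linearly onto [-1, 1] and that feature j is the Legendre polynomial P_j evaluated at that point, computed with Bonnet's recursion (n + 1) P_{n+1}(x) = (2n + 1) x P_n(x) - n P_{n-1}(x). The function names `legendre_values` and `legendre_position_encoding` and the linear position-to-[-1, 1] mapping are assumptions made for illustration, not the paper's definition.

```python
def legendre_values(x: float, degree: int) -> list[float]:
    """Return [P_0(x), ..., P_degree(x)] using Bonnet's recursion:
    (n + 1) P_{n+1}(x) = (2n + 1) x P_n(x) - n P_{n-1}(x)."""
    values = [1.0]  # P_0(x) = 1
    if degree >= 1:
        values.append(x)  # P_1(x) = x
    for n in range(1, degree):
        values.append(((2 * n + 1) * x * values[n] - n * values[n - 1]) / (n + 1))
    return values


def legendre_position_encoding(seq_len: int, d_model: int) -> list[list[float]]:
    """Hypothetical encoding table: one row per position, d_model Legendre features.

    Assumption (not from the paper): position p is mapped linearly onto [-1, 1]
    and its features are P_0(x), ..., P_{d_model - 1}(x).
    """
    table = []
    for pos in range(seq_len):
        # Spread positions evenly over [-1, 1]; a single position sits at 0.
        x = -1.0 + 2.0 * pos / (seq_len - 1) if seq_len > 1 else 0.0
        table.append(legendre_values(x, d_model - 1))
    return table
```

For example, `legendre_position_encoding(5, 4)` gives 5 rows of 4 features; the first row is [1, -1, 1, -1] because P_n(-1) = (-1)^n, and the last row is all ones because P_n(1) = 1.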

Findings

01

PoPE outperforms baseline models on English-German translation.

02

PoPE significantly accelerates model convergence.

03

Theoretical analysis explains the advantages of orthogonal polynomial encoding.
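The paper's theoretical argument is not reproduced here. The sketch below only checks numerically the orthogonality property that gives these polynomials their name: on [-1, 1], the integral of P_m(x) P_n(x) is 0 for m ≠ n and 2/(2n + 1) for m = n. `legendre` and `inner_product` are illustrative helpers, and the midpoint rule with 20,000 samples is an arbitrary choice.

```python
def legendre(n: int, x: float) -> float:
    """Evaluate the Legendre polynomial P_n(x) with Bonnet's recursion."""
    if n == 0:
        return 1.0
    p_prev, p = 1.0, x  # P_0(x) and P_1(x)
    for k in range(1, n):
        # Advance from (P_{k-1}, P_k) to (P_k, P_{k+1}).
        p_prev, p = p, ((2 * k + 1) * x * p - k * p_prev) / (k + 1)
    return p


def inner_product(m: int, n: int, samples: int = 20000) -> float:
    """Approximate the integral of P_m(x) * P_n(x) over [-1, 1] with the midpoint rule."""
    h = 2.0 / samples
    total = 0.0
    for i in range(samples):
        x = -1.0 + (i + 0.5) * h  # midpoint of the i-th subinterval
        total += legendre(m, x) * legendre(n, x)
    return total * h


# Gram matrix of P_0..P_3: diagonal near 2/(2n + 1), off-diagonal near 0.
for m in range(4):
    print(m, [round(inner_product(m, n), 4) for n in range(4)])
```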

Abstract

Several improvements have been proposed over the baseline Absolute Positional Encoding (APE) used in the original transformer. In this study, we investigate how the choice of sinusoidal basis functions leads to an inadequate representation of positional encoding in higher dimensions, and what this implies for crucial aspects of the attention mechanism, for the model's capacity to learn relative positional information, and for model convergence. Through a combination of theoretical insights and empirical analyses, we show how these challenges extend beyond APEs and may adversely affect the performance of Relative Positional Encoding (RPE) methods, such as Rotary Positional Encoding (RoPE). We then introduce Orthogonal Polynomial Based Positional Encoding (PoPE) to address some of the limitations of existing methods. The…
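For reference, the two baselines the abstract names can be sketched in a few lines. `sinusoidal_ape` follows the original transformer formula PE(pos, 2i) = sin(pos / 10000^(2i/d)), PE(pos, 2i+1) = cos(pos / 10000^(2i/d)); `rope` rotates adjacent pairs of a query or key vector by pos * 10000^(-2j/d), which is one common RoPE layout (implementations differ in how they pair dimensions). Both functions, and the example vectors `q` and `k`, are illustrative assumptions, not code from the paper.

```python
import math


def sinusoidal_ape(pos: int, d_model: int, base: float = 10000.0) -> list[float]:
    """Absolute sinusoidal encoding: sin on even indices, cos on odd ones."""
    enc = []
    for i in range(0, d_model, 2):
        # i plays the role of 2i in the formula, so the exponent is i / d_model.
        angle = pos / base ** (i / d_model)
        enc.append(math.sin(angle))
        enc.append(math.cos(angle))
    return enc[:d_model]  # drop the extra cos term when d_model is odd


def rope(vec: list[float], pos: int, base: float = 10000.0) -> list[float]:
    """Rotate each adjacent pair (vec[j], vec[j + 1]) by pos * base^(-j / d)."""
    d = len(vec)
    out = list(vec)
    for j in range(0, d - 1, 2):
        theta = pos * base ** (-j / d)
        c, s = math.cos(theta), math.sin(theta)
        out[j] = vec[j] * c - vec[j + 1] * s
        out[j + 1] = vec[j] * s + vec[j + 1] * c
    return out


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


q = [0.3, -1.2, 0.5, 2.0]
k = [1.1, 0.4, -0.7, 0.9]
# The same offset m - n = 4 at different absolute positions gives the same score.
score_a = dot(rope(q, 7), rope(k, 3))
score_b = dot(rope(q, 14), rope(k, 10))
print(math.isclose(score_a, score_b, rel_tol=1e-9, abs_tol=1e-12))
```

The final check prints `True`, illustrating why RoPE counts as a relative method: the score between a query rotated to position m and a key rotated to position n depends only on m - n, because each pair of rotations composes into a single rotation by the difference of the angles.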


Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems