VORT: Adaptive Power-Law Memory for NLP Transformers

Nabil Mlaiki

arXiv:2605.08966·cs.LG·May 12, 2026

TL;DR

VORT is a learnable power-law memory architecture for NLP transformers. It approximates fractional retention kernels with a sum of exponentials (SOE), so long-range dependencies can be retained adaptively and updated with an efficient recurrence.

Contribution

The paper proposes a novel memory mechanism using fractional power-law kernels with SOE approximation, enabling adaptive, non-Markovian token retention in transformers.

Findings

01. VORT outperforms prior models on Zipf-distributed retrieval tasks.

02. The architecture effectively captures long-range dependencies in language.

03. Synthetic experiments demonstrate the advantage of power-law kernels over prior-matching methods.

Abstract

Standard Transformers impose near-exponential decay on the influence of distant tokens, conflicting with the power-law structure of long-range dependencies in natural language. We introduce the Variable-Order Retention Transformer (VORT), a memory architecture in which each ingested token is assigned a learnable fractional order $\alpha_i \in [\delta, 1]$ that governs a Grünwald–Letnikov power-law retention kernel. Because the fractional weighted sum is non-Markovian, we approximate it through a sum-of-exponentials (SOE) decomposition computed by Gauss–Laguerre quadrature on a Laplace-type integral representation of the kernel weights. Each exponential component admits a one-step Markovian recurrence at $O(S d_v)$ per step, where $S = O(\log(T/\varepsilon))$ terms suffice for $\varepsilon$-uniform accuracy on horizon $[1, T]$. Retrieval is keyed and associative via a linear-attention…
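The SOE idea in the abstract can be illustrated with a minimal sketch. It is not the paper's construction. It assumes a generic power-law kernel t^(-beta) with a hypothetical exponent `beta`, scalar values (d_v = 1), and the standard Laplace-type identity t^(-beta) = (1/Gamma(beta)) * integral over s in (0, inf) of s^(beta-1) * exp(-s*t). In place of the Gauss–Laguerre quadrature named in the abstract, it swaps in a simpler trapezoidal rule on a log-spaced grid. All function names and grid parameters are illustrative assumptions.

```python
import math

def power_law_soe(beta, u_min=-12.0, u_max=4.0, step=0.5):
    """Return (rates, coeffs) with t**-beta ~= sum(c * exp(-lam * t)).

    Assumption: trapezoidal rule in u = log(s) on the Laplace-type integral
    t**-beta = (1 / Gamma(beta)) * int_0^inf s**(beta - 1) * exp(-s * t) ds.
    This is a stand-in for the Gauss-Laguerre rule named in the abstract.
    """
    n = int(round((u_max - u_min) / step)) + 1
    nodes = [u_min + k * step for k in range(n)]
    rates = [math.exp(u) for u in nodes]
    coeffs = [step * math.exp(beta * u) / math.gamma(beta) for u in nodes]
    return rates, coeffs

def soe_memory(values, rates, coeffs):
    """Markovian read-out: one scalar state per exponential component."""
    decays = [math.exp(-lam) for lam in rates]
    states = [0.0] * len(rates)
    outputs = []
    for v in values:
        # Each state depends only on its previous value and the new token,
        # so the per-step cost is O(S) for scalar values.
        states = [d * (h + c * v) for d, h, c in zip(decays, states, coeffs)]
        outputs.append(sum(states))
    return outputs

def power_law_memory(values, beta):
    """Reference: explicit non-Markovian sum, O(T) work per step.

    The token ingested at step j gets weight (t - j + 1)**-beta at step t.
    """
    return [
        sum((t - j + 1) ** -beta * values[j] for j in range(t + 1))
        for t in range(len(values))
    ]
```

With the default grid the decomposition has 33 terms, and `soe_memory(values, *power_law_soe(1.5))` can be compared against `power_law_memory(values, 1.5)` on a short sequence. The log-spaced rates are what let the number of terms grow only logarithmically with the horizon: each extra decade of T needs a fixed number of additional slow-decaying components.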
