Kinetic theory for Transformers and the lost-in-the-middle phenomenon

Mitia Duerinckx; Borjan Geshkovski; Stefano Rossi

arXiv:2605.09213·math.AP·May 12, 2026

TL;DR

This paper models causal self-attention in Transformers as a non-exchangeable interacting particle system, proves a quantitative mean-field limit, and explains the 'lost-in-the-middle' phenomenon through a next-order analysis of correlations.

Contribution

It interprets causal self-attention dynamics as a non-exchangeable interacting particle system, adapts cumulant expansions to the model's triangular causal dependency structure, and gives a rigorous account of the 'lost-in-the-middle' effect in token retrieval.

Findings

01

Proved a quantitative mean-field limit for causal self-attention dynamics, together with a next-order characterization of correlations.

02

Solved the limiting correlation equation in closed form for iid uniformly distributed tokens.

03

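Showed that the token retrieval profile, as a function of the source position in the prompt, is U-shaped, with primacy, recency, and a unique interior minimum under an explicit smallness condition. This gives a rigorous explanation of the 'lost-in-the-middle' phenomenon.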

Abstract

We study causal self-attention dynamics -- a toy model for decoder Transformers -- which we interpret as a non-exchangeable interacting particle system. Adapting cumulant expansions to the triangular causal dependency structure of the model, and appealing to non-hierarchical methods to estimate correlations using Glauber calculus, we prove a quantitative mean-field limit result and a next-order characterization of correlations. For iid uniformly distributed tokens, the limiting correlation equation can be solved in closed form and we obtain a rigorous explanation of the empirically observed lost-in-the-middle phenomenon: the token retrieval profile, as a function of the source position in the prompt, is U-shaped, with primacy, recency, and a unique interior minimum under an explicit smallness condition.
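To make the "triangular causal dependency structure" concrete, here is a minimal pure-Python sketch of a causal self-attention particle system. The abstract does not state the exact dynamics, so everything specific here is an assumption for illustration only: tokens live on the unit sphere, the query, key, and value matrices are the identity, token i attends only to tokens j <= i with softmax weights at inverse temperature `beta`, and time is discretized by explicit Euler steps followed by renormalization. The names `causal_attention_velocity` and `simulate` are hypothetical and do not come from the paper.

```python
import math
import random


def normalize(v):
    """Rescale a vector to unit Euclidean norm."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def causal_attention_velocity(tokens, beta):
    """Velocity of each token under the assumed causal attention rule.

    Token i only sees tokens j <= i (the triangular structure). Its velocity
    is the softmax-weighted average of those tokens, projected onto the
    tangent space of the unit sphere at x_i.
    """
    velocities = []
    for i, x_i in enumerate(tokens):
        scores = [
            beta * sum(a * b for a, b in zip(x_i, tokens[j]))
            for j in range(i + 1)
        ]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        mean = [
            sum(w * tokens[j][k] for j, w in enumerate(weights)) / total
            for k in range(len(x_i))
        ]
        radial = sum(m * c for m, c in zip(mean, x_i))
        velocities.append([m - radial * c for m, c in zip(mean, x_i)])
    return velocities


def simulate(tokens, beta=1.0, dt=0.05, steps=100):
    """Explicit Euler integration with renormalization (assumed scheme)."""
    tokens = [normalize(x) for x in tokens]
    for _ in range(steps):
        vels = causal_attention_velocity(tokens, beta)
        tokens = [
            normalize([c + dt * v for c, v in zip(x, vel)])
            for x, vel in zip(tokens, vels)
        ]
    return tokens


if __name__ == "__main__":
    random.seed(0)
    prompt = [[random.gauss(0.0, 1.0) for _ in range(3)] for _ in range(5)]
    for position, token in enumerate(simulate(prompt, beta=2.0, steps=20)):
        print(position, [round(c, 3) for c in token])
```

Under these assumptions the first token never moves, since it attends only to itself, and the trajectory of token i is unaffected by any later token. Particles are therefore distinguished by their position in the prompt, which is the sense in which such a system is non-exchangeable.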
