Lost in the Middle at Birth: An Exact Theory of Transformer Position Bias
Borun D Chowdhury

TL;DR
This paper shows that the U-shaped position bias in transformer models is already present at initialization: it arises from the geometry of the architecture, independent of training or positional encoding method.
Contribution
It provides an exact theoretical model explaining the origin of the middle-context retrieval bias as an inherent architectural property of causal decoders with residual connections.
Findings
U-shape bias appears at initialization, before training.
The bias persists regardless of positional encoding methods like RoPE.
Empirical validation on untrained models confirms the theoretical predictions.
Abstract
The "Lost in the Middle" phenomenon, a U-shaped performance curve where LLMs retrieve well from the beginning and end of a context but fail in the middle, is widely attributed to learned Softmax artifacts or the distance-decay of positional encodings like RoPE. This paper makes a single, precise claim: the U-shape is already present at initialization, before any training or positional encoding takes effect. It is an inherent geometric property of the causal decoder with residual connections. We model multi-layer causal attention as iterated powers of the Cesàro matrix and derive the exact closed-form influence density in the continuous limit. Causal masking forces a logarithmic divergence of gradient influence at the start of the prompt (the Primacy Tail), while residual connections create an isolated anchor at the final token (the Recency Delta).…
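The mechanism described in the abstract can be illustrated with a small numerical sketch. It is not the paper's derivation. It assumes that attention at initialization is uniform over the causal window (so one attention layer is the Cesàro averaging matrix) and that a residual layer is an equal-weight mix of the identity and that matrix. The function names and the `residual_weight` parameter are hypothetical and chosen for illustration only.

```python
import numpy as np

def cesaro_matrix(n):
    """Causal uniform averaging: row i puts weight 1/(i+1) on tokens 0..i."""
    return np.tril(np.ones((n, n))) / np.arange(1, n + 1)[:, None]

def influence_on_last_token(n, num_layers, residual_weight=0.5):
    """Influence of each input position on the final token after num_layers.

    Assumption (not from the paper): each layer is the linear map
    residual_weight * I + (1 - residual_weight) * C, with C the Cesaro matrix.
    The last row of the layer product is then the sensitivity of the final
    token's output to each input position.
    """
    C = cesaro_matrix(n)
    layer = residual_weight * np.eye(n) + (1.0 - residual_weight) * C
    return np.linalg.matrix_power(layer, num_layers)[-1]

# Example: 64 tokens, 4 layers.
influence = influence_on_last_token(n=64, num_layers=4)
first, middle, last = influence[0], influence[32], influence[-1]
print(f"first={first:.4f}  middle={middle:.4f}  last={last:.4f}")
```

Under these assumptions the first position and the final position both carry more influence than a mid-context position. The weight on the first position comes from repeated causal averaging, and the weight on the last position comes from the identity (residual) term, which matches the qualitative picture of a Primacy Tail and a Recency Delta.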
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning · Advanced Graph Neural Networks
