A Geometric Perspective on Next-Token Prediction in Large Language Models: Three Emerging Phases

Gianfranco Lombardo; Giuseppe Trimigno; Stefano Cagnoni

arXiv:2605.09011·cs.LG·May 12, 2026

TL;DR

This paper tracks where predictive information resides in the residual stream of large language models and how that location shifts from layer to layer, and it finds that the trajectory passes through three distinct phases.

Contribution

It introduces a novel geometric diagnostic framework to analyze the evolution of predictive information in LLMs, identifying three emergent phases across different models.

Findings

01. Predictive information resides in a dominant subspace that evolves through three phases.

02. Three geometric phases are identified: Seeding Multiplexing, Hoisting Overriding, and Focal Convergence.

03. Deeper models mainly expand candidate disambiguation capacity rather than increasing overall information.

Abstract

We investigate the geometry of predictive information across the layers of large language models (LLMs). We repurpose representation lenses (learned affine maps trained to predict the next token from intermediate residual streams) as geometric diagnostic tools. Rather than asking what the model predicts at each layer, we ask where predictive information resides and how it evolves across depth. We define at each layer a predictive readout subspace as the dominant k-dimensional singular subspace of such a map on the d-dimensional residual stream (where k is a resolution parameter), and track its trajectory on the Grassmann manifold as a similarity profile across layers. The profile is well described by unimodal distributions exhibiting a rise, near-plateau, and descent; varying k from 1% to 50% of d traces a Pareto frontier between visibility and energy retention, yet the same structure…
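As a rough illustration of the construction the abstract describes, the sketch below extracts a top-k singular subspace from the linear part of each lens and compares subspaces across layers. Several choices here are assumptions and not taken from the paper: the lens matrices are random stand-ins of shape (vocab, d), the similarity is the mean squared cosine of the principal angles (one common Grassmannian overlap measure), the profile is computed against the final layer, and "energy" is taken to mean the fraction of squared singular values kept by the top k directions. The function names are hypothetical.

```python
import numpy as np

def readout_subspace(W, k):
    """Return an orthonormal (d x k) basis for the top-k right singular
    subspace of W (vocab x d), plus the fraction of squared singular
    values it retains (assumed meaning of 'energy retention')."""
    _, s, Vt = np.linalg.svd(W, full_matrices=False)
    basis = Vt[:k].T                      # columns span a subspace of the residual stream
    energy = float(np.sum(s[:k] ** 2) / np.sum(s ** 2))
    return basis, energy

def subspace_similarity(A, B):
    """Mean squared cosine of the principal angles between span(A) and
    span(B); equals 1 for identical subspaces and 0 for orthogonal ones."""
    k = A.shape[1]
    return float(np.linalg.norm(A.T @ B, "fro") ** 2 / k)

# Random stand-ins for the linear part of each layer's lens (assumption:
# real lenses would be trained on intermediate residual streams).
rng = np.random.default_rng(0)
d, vocab, n_layers, k = 32, 100, 6, 4
lenses = [rng.standard_normal((vocab, d)) for _ in range(n_layers)]

results = [readout_subspace(W, k) for W in lenses]
bases = [basis for basis, _ in results]
energies = [energy for _, energy in results]

# Similarity profile across depth, here measured against the last layer.
profile = [subspace_similarity(B, bases[-1]) for B in bases]
```

With trained lenses in place of the random matrices, `profile` would be the curve whose rise, near-plateau, and descent the abstract refers to, and repeating the computation for several values of `k` would show how `energies` trades off against the shape of that curve.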
