Geometry of Semantics in Next-Token Prediction: How Optimization Implicitly Organizes Linguistic Representations

Yize Zhao; Christos Thrampoulidis

arXiv:2505.08348·cs.CL·October 9, 2025

Open Access

TL;DR

This paper shows that next-token prediction implicitly leads language models to factor a matrix of context-to-next-token co-occurrence patterns, which yields an emergent semantic hierarchy and interpretable categories.

Contribution

It introduces a tractable mathematical framework showing that NTP optimization guides models to factor a centered support matrix of context-to-next-token co-occurrence patterns via SVD, so that semantic structure emerges without being explicitly encoded.

Findings

01. Semantic concepts emerge early during training.

02. Models recover diverse semantic categories like entities and topics.

03. Singular value hierarchy reflects semantic granularity.

Abstract

We investigate how next-token prediction (NTP) optimization leads language models to extract and organize semantic structure from text. Our analysis, based on a tractable mathematical model and controlled synthetic data, reveals that NTP implicitly guides models to factor a centered support matrix encoding context-to-next-token co-occurrence patterns via singular value decomposition (SVD). While models never explicitly construct this matrix, learned word and context embeddings converge to its SVD factors, with singular vectors encoding latent semantic concepts through their sign patterns. We demonstrate that concepts corresponding to larger singular values are learned earlier during training, yielding a natural semantic hierarchy where broad categories emerge before fine-grained ones. This insight motivates orthant-based clustering, a method that combines concept signs to identify…
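
The pipeline described in the abstract (center a support matrix, take its SVD, read concepts off the sign patterns of singular vectors, and group words by orthant) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `centered_support` and `orthant_clusters`, the toy matrix, the matrix orientation (rows are tokens, columns are contexts), and the choice to center each context column over the vocabulary are all assumptions made for illustration.

```python
import numpy as np
from collections import defaultdict


def centered_support(S):
    """Center each context column over the vocabulary (assumed convention)."""
    S = np.asarray(S, dtype=float)
    return S - S.mean(axis=0, keepdims=True)


def orthant_clusters(S, k):
    """Group tokens by the sign pattern of their top-k left singular vectors.

    S is a (vocab, contexts) binary matrix: S[z, j] = 1 if token z is
    observed as a next token after context j. Hypothetical helper.
    """
    S_tilde = centered_support(S)
    # Thin SVD; singular values are returned in descending order.
    U, sigma, Vt = np.linalg.svd(S_tilde, full_matrices=False)
    signs = np.sign(U[:, :k])
    clusters = defaultdict(list)
    for token, pattern in enumerate(signs):
        # Tokens sharing a sign pattern lie in the same orthant.
        clusters[tuple(int(s) for s in pattern)].append(token)
    return sigma, dict(clusters)


# Toy data (made up): tokens 0 and 1 follow contexts 0 and 1,
# tokens 2 and 3 follow contexts 2 and 3.
S_toy = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
])

sigma, clusters = orthant_clusters(S_toy, k=1)
groups = sorted(clusters.values())
print(groups)
```

On this toy matrix the centered matrix has rank one, so a single concept (k=1) separates tokens {0, 1} from tokens {2, 3}. The overall sign of a singular vector is arbitrary, so only the grouping is meaningful, not which group gets the positive sign. Using a larger k splits the vocabulary into finer orthants, which mirrors the broad-to-fine hierarchy the abstract describes.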

Taxonomy

Topics: Natural Language Processing Techniques · Semantic Web and Ontologies

Methods: Spectral Clustering