Latent Semantic Manifolds in Large Language Models

Mohamed A. Mabrok

arXiv:2603.22301 · cs.LG · March 25, 2026

Open Access

TL;DR

This paper introduces a geometric framework for how large language models encode semantics in continuous vector spaces while emitting discrete tokens, and it derives limits that any finite vocabulary places on what those internal representations can express.

Contribution

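It develops a mathematical model that interprets LLM hidden states as points on a Riemannian manifold equipped with the Fisher information metric, and proves two theorems: a rate-distortion lower bound on semantic distortion for any finite vocabulary, and a linear volume scaling law for the expressibility gap.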

Findings

1. Universal hourglass intrinsic dimension profiles across models
2. Linear scaling law for the semantic expressibility gap
3. Persistent boundary-proximal representations invariant to scale

Abstract

Large Language Models (LLMs) perform internal computations in continuous vector spaces yet produce discrete tokens -- a fundamental mismatch whose geometric consequences remain poorly understood. We develop a mathematical framework that interprets LLM hidden states as points on a latent semantic manifold: a Riemannian submanifold equipped with the Fisher information metric, where tokens correspond to Voronoi regions partitioning the manifold. We define the expressibility gap, a geometric measure of the semantic distortion from vocabulary discretization, and prove two theorems: a rate-distortion lower bound on distortion for any finite vocabulary, and a linear volume scaling law for the expressibility gap via the coarea formula. We validate these predictions across six transformer architectures (124M-1.5B parameters), confirming universal hourglass intrinsic dimension profiles, smooth…
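As a rough illustration of the objects the abstract names, the sketch below computes a Fisher information metric on hidden-state space by pulling back the categorical Fisher information through a softmax output layer. It also assigns a hidden state to a token region by taking the argmax of the logits. This is a minimal sketch under stated assumptions, not the paper's construction: the unembedding matrix `W`, the hidden state `h`, and the helper names `fisher_metric`, `token_region`, and `local_length` are all hypothetical, and the paper's definition of its Voronoi regions may differ from the argmax rule used here.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D logit vector."""
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

def fisher_metric(h, W):
    """Fisher information of p(token | h) with respect to h.

    Assumes logits = W @ h (hypothetical unembedding W of shape (V, d)).
    For a categorical distribution the Fisher information in logit
    coordinates is diag(p) - p p^T; the chain rule pulls it back to h.
    """
    p = softmax(W @ h)
    fisher_logits = np.diag(p) - np.outer(p, p)
    return W.T @ fisher_logits @ W

def token_region(h, W):
    """Index of the token whose logit is largest at h (assumed region rule)."""
    return int(np.argmax(W @ h))

def local_length(G, dh):
    """Length of a small displacement dh under the metric G."""
    return float(np.sqrt(dh @ G @ dh))

# Toy example: vocabulary of 5 tokens, hidden dimension 3 (illustrative sizes).
rng = np.random.default_rng(0)
W = rng.normal(size=(5, 3))
h = rng.normal(size=3)
G = fisher_metric(h, W)
```

Under these assumptions `G` equals the covariance of the rows of `W` under the predicted token distribution, so it is symmetric and positive semidefinite. Directions in which a small move of `h` barely changes the output distribution have near-zero length under this metric.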

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Artificial Intelligence in Healthcare and Education