Directed Metric Structures arising in Large Language Models
Stéphane Gaubert, Yiannis Vlassopoulos

TL;DR
This paper uncovers a mathematical metric structure underlying large language models' probability distributions, revealing a tropical geometric framework that encodes text relationships and extensions.
Contribution
It introduces a novel metric polyhedron framework for analyzing text in language models, connecting probability, geometry, and category theory without explicit reliance on the latter.
Findings
Text extensions form isometric polyhedra
Text vectors can be approximated as Boltzmann weighted combinations
The metric structure relates to the Isbell completion and lattice closure
Abstract
Large Language Models are transformer neural networks which are trained to produce a probability distribution on the possible next words to given texts in a corpus, in such a way that the most likely word predicted is the actual word in the training text. In this paper we find what is the mathematical structure defined by such conditional probability distributions of text extensions. Changing the view point from probabilities to -log probabilities we observe that the subtext order is completely encoded in a metric structure defined on the space of texts $\mathcal{L}$, by -log probabilities. We then construct a metric polyhedron $P(\mathcal{L})$ and an isometric embedding (called Yoneda embedding) of $\mathcal{L}$ into $P(\mathcal{L})$ such that texts map to generators of certain special extremal rays. We explain that $P(\mathcal{L})$ is a (tropical) linear span of these…
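To make the "-log probabilities" viewpoint concrete, here is a minimal Python sketch of one natural reading of the abstract: the directed distance from a text x to a text y is -log P(y | x) when y extends x, and infinite otherwise. The toy next-token model (`next_prob`), the three-letter vocabulary, and the helper names are all hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import math
from itertools import product

VOCAB = "abc"  # hypothetical toy vocabulary


def next_prob(context: str, token: str) -> float:
    """Toy next-token model (an assumption, not the paper's):
    uniform on an empty context, otherwise repeat the last token
    with probability 0.5 and pick each other token with 0.25."""
    if not context:
        return 1.0 / len(VOCAB)
    return 0.5 if token == context[-1] else 0.25


def ext_prob(x: str, y: str) -> float:
    """Probability that text x is extended to text y (0 if y does not extend x)."""
    if not y.startswith(x):
        return 0.0
    p = 1.0
    for i in range(len(x), len(y)):
        p *= next_prob(y[:i], y[i])  # chain rule over the added tokens
    return p


def d(x: str, y: str) -> float:
    """Directed distance: -log P(y | x), or +inf when y is not an extension of x."""
    p = ext_prob(x, y)
    return math.inf if p == 0.0 else -math.log(p)


def triangle_holds(texts, tol: float = 1e-9) -> bool:
    """Check d(x, z) <= d(x, y) + d(y, z) for all triples of texts."""
    return all(
        d(x, z) <= d(x, y) + d(y, z) + tol
        for x, y, z in product(texts, repeat=3)
    )


# All texts of length 0 to 3 over the toy vocabulary.
TEXTS = ["".join(t) for n in range(4) for t in product(VOCAB, repeat=n)]

if __name__ == "__main__":
    print(d("a", "ab"), d("ab", "a"))  # finite forward, infinite backward
    print(triangle_holds(TEXTS))
```

In this sketch the distance is asymmetric: it is finite exactly when the second text extends the first, so the prefix order can be read off from which distances are finite. Along a chain of successive extensions the triangle inequality holds with equality, which is just the chain rule for conditional probabilities written additively.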
Taxonomy
Topics: Opinion Dynamics and Social Influence
