Toward a Thermodynamics of Meaning
Jonathan Scott Enderle

TL;DR
This paper proposes a thermodynamic framework for understanding how language models learn and represent meaning. It treats text as a system in equilibrium with the world it describes and argues that language models acquire structural facts about the world from text alone, which helps explain why cooccurrence prediction succeeds.
Contribution
It introduces a thermodynamic model of the relationship between text and the world, clarifying what language models can learn from text and where the limits of that learning lie.
Findings
Language models learn structural facts about the world.
Text can be modeled as a thermodynamic system in equilibrium with a much larger reservoir: the world it describes.
Cooccurrence prediction effectively captures aspects of meaning.
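One way to make the equilibrium picture in the findings above concrete is the standard textbook correspondence between the Boltzmann distribution and the softmax that language models use to predict tokens. The sketch below is not the paper's own model, which this summary does not specify. It assumes that a token's energy is the negative of its score (logit) and sets Boltzmann's constant to 1. The words and scores are hypothetical.

```python
import math


def boltzmann(energies, temperature=1.0):
    """Return p_i = exp(-E_i / T) / Z for a list of energies (k_B assumed to be 1)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Shift by the minimum energy for numerical stability; the shift cancels in Z.
    e_min = min(energies)
    weights = [math.exp(-(e - e_min) / temperature) for e in energies]
    z = sum(weights)  # partition function (up to the constant shift)
    return [w / z for w in weights]


def softmax(logits, temperature=1.0):
    """Softmax over logits, written as a Boltzmann distribution with E_i = -logit_i."""
    return boltzmann([-x for x in logits], temperature)


# Hypothetical scores for three candidate words in some context (illustration only).
logits = {"bank": 2.0, "shore": 1.0, "piano": -1.0}
probs = dict(zip(logits, softmax(list(logits.values()))))

for word, p in probs.items():
    print(f"{word}: {p:.3f}")
```

Under these assumptions, higher-scoring (lower-energy) words receive more probability, and raising the temperature flattens the distribution toward uniform, as it would for a small system coupled to a hotter reservoir.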
Abstract
As language models such as GPT-3 become increasingly successful at generating realistic text, questions about what purely text-based modeling can learn about the world have become more urgent. Is text purely syntactic, as skeptics argue? Or does it in fact contain some semantic information that a sufficiently sophisticated language model could use to learn about the world without any additional inputs? This paper describes a new model that suggests some qualified answers to those questions. By theorizing the relationship between text and the world it describes as an equilibrium relationship between a thermodynamic system and a much larger reservoir, this paper argues that even very simple language models do learn structural facts about the world, while also proposing relatively precise limits on the nature and extent of those facts. This perspective promises not only to answer questions…
Taxonomy
Topics: Topic Modeling · Language and cultural evolution · Explainable Artificial Intelligence (XAI)
Methods: Linear Layer · Cosine Annealing · Dense Connections · Dropout · Layer Normalization · Linear Warmup With Cosine Annealing · Attention Dropout · Byte Pair Encoding · Weight Decay · Multi-Head Attention
