A Measure-Theoretic Characterization of Tight Language Models
Li Du, Lucas Torroba Hennigen, Tiago Pimentel, Clara Meister, Jason Eisner, Ryan Cotterell

TL;DR
This paper gives a measure-theoretic treatment of language modeling and proves that many popular language model families are tight, meaning they do not leak probability mass onto the set of infinite sequences.
Contribution
It provides a measure-theoretic framework for understanding tightness in language models and generalizes previous characterizations of this property.
Findings
Many popular language model families are tight: they place no probability mass on the set of infinite sequences.
The paper generalizes characterizations of tightness proposed in previous works.
The paper provides a measure-theoretic foundation for language modeling.
Abstract
Language modeling, a central task in natural language processing, involves estimating a probability distribution over strings. In most cases, the estimated distribution sums to 1 over all finite strings. However, in some pathological cases, probability mass can "leak" onto the set of infinite sequences. In order to characterize the notion of leakage more precisely, this paper offers a measure-theoretic treatment of language modeling. We prove that many popular language model families are in fact tight, meaning that they will not leak in this sense. We also generalize characterizations of tightness proposed in previous works.
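To make the notion of leakage concrete, here is a minimal Python sketch of a toy autoregressive model. It is an illustration only and is not taken from the paper. It assumes that the probability of emitting the end-of-sequence symbol depends only on the position t; the names `finite_string_mass` and `eos_prob` are hypothetical. Under that assumption, the mass on strings that terminate within n steps is 1 minus the product of (1 - eos_prob(t)) for t = 1..n, so the model is tight exactly when that product goes to 0 as n grows.

```python
def finite_string_mass(eos_prob, steps):
    """Mass on strings that terminate within `steps` steps.

    `eos_prob(t)` is the (assumed position-only) probability of emitting
    the end-of-sequence symbol at step t, given no termination so far.
    """
    survive = 1.0  # probability that generation is still running
    for t in range(1, steps + 1):
        survive *= 1.0 - eos_prob(t)
    return 1.0 - survive


# Constant EOS probability: the survival probability decays geometrically,
# so the finite-string mass approaches 1 (a tight toy model).
tight_mass = finite_string_mass(lambda t: 0.1, 10_000)

# EOS probability shrinking like 1/(t+1)^2: the survival product converges
# to 1/2, so about half of the mass sits on infinite sequences (non-tight).
leaky_mass = finite_string_mass(lambda t: 1.0 / (t + 1) ** 2, 10_000)

print(f"tight toy model: {tight_mass:.4f}")
print(f"leaky toy model: {leaky_mass:.4f}")
```

The second case leaks because the end-of-sequence probabilities shrink fast enough that their sum is finite, which keeps the survival product bounded away from zero.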
