A Measure-Theoretic Characterization of Tight Language Models

Li Du; Lucas Torroba Hennigen; Tiago Pimentel; Clara Meister; Jason Eisner; Ryan Cotterell

arXiv:2212.10502·cs.CL·August 23, 2023

TL;DR

This paper gives a measure-theoretic treatment of language modeling and proves that many popular language model families are tight, meaning they do not leak probability mass onto the set of infinite sequences.

Contribution

It provides a measure-theoretic framework for understanding tightness in language models and generalizes previous characterizations of this property.

Findings

01

Many popular language model families are tight: they do not leak probability mass onto infinite sequences.

02

The paper generalizes existing characterizations of tightness.

03

The paper provides a rigorous measure-theoretic foundation for language modeling.

Abstract

Language modeling, a central task in natural language processing, involves estimating a probability distribution over strings. In most cases, the estimated distribution sums to 1 over all finite strings. However, in some pathological cases, probability mass can "leak" onto the set of infinite sequences. In order to characterize the notion of leakage more precisely, this paper offers a measure-theoretic treatment of language modeling. We prove that many popular language model families are in fact tight, meaning that they will not leak in this sense. We also generalize characterizations of tightness proposed in previous works.
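To make the notion of leakage concrete, here is a minimal sketch of a toy autoregressive model. It is an illustration under stated assumptions and is not taken from the paper: the model is assumed to emit an end-of-sequence (EOS) symbol at step t with a probability that depends only on t, and the names `finite_string_mass`, `constant_eos`, and `decaying_eos` are hypothetical. Under that assumption, the total mass on finite strings is the probability that EOS is eventually emitted, which can be accumulated step by step.

```python
from fractions import Fraction


def finite_string_mass(eos_prob, max_len):
    """Probability mass on strings that terminate within max_len steps.

    eos_prob(t) is the assumed probability of emitting EOS at step t
    (t = 1, 2, ...), given that the model has not terminated before step t.
    """
    survive = Fraction(1)  # probability that no EOS has been emitted so far
    mass = Fraction(0)     # accumulated mass on terminated (finite) strings
    for t in range(1, max_len + 1):
        p = eos_prob(t)
        mass += survive * p   # terminate exactly at step t
        survive *= 1 - p      # continue past step t
    return mass


def constant_eos(t):
    # Hypothetical tight model: EOS probability stays at 1/10 forever.
    return Fraction(1, 10)


def decaying_eos(t):
    # Hypothetical non-tight model: EOS probability shrinks like 1/(t+1)^2.
    return Fraction(1, (t + 1) ** 2)


tight_mass = finite_string_mass(constant_eos, 200)
leaky_mass = finite_string_mass(decaying_eos, 200)

print(float(tight_mass))  # approaches 1 as max_len grows
print(float(leaky_mass))  # stays below 1/2 for every max_len
```

With the constant EOS probability, the survival probability is 0.9 raised to the number of steps, which goes to 0, so the mass on finite strings converges to 1 and the toy model is tight. With the decaying EOS probability, the survival probability is the product of (1 - 1/n^2) for n from 2 upward, which converges to 1/2, so half of the probability mass ends up on infinite sequences.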
