Mixing Times of Glauber Dynamics on Masked Language Models
Suvadip Sana, Sami Wolf, Neer Mehta, Alina Shah, Aitzaz Shaikh, Janna Goodman, Lionel Levine

TL;DR
This paper models the iterative token resampling in masked language models as a Glauber dynamics Markov chain, analyzing its mixing times and stationary behavior to understand the global distributional effects of local conditionals.
Contribution
It introduces a rectangle test to certify incompatibility of MLM conditionals, provides theoretical bounds on mixing times, and empirically demonstrates phase transitions and semantic basin structures.
Findings
MLM conditionals are often incompatible, verified by the rectangle test.
Under bounded cross-token influence, high-temperature regimes mix rapidly, with mixing time O(n log n), where n is the sequence length.
Low-temperature regimes exhibit metastability with slow escape from semantic basins.
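The summary does not spell out the rectangle test, so the following is only a sketch of one standard compatibility check for two positions, and it may differ from the paper's exact formulation. The assumption: if strictly positive conditionals p(x1 | x2) and p(x2 | x1) come from a common joint, the ratio p(x1=a | x2=b) / p(x2=b | x1=a) factors as u(a)/v(b), so its log must sum to zero around every rectangle (a, b), (a, b'), (a', b'), (a', b). The names `rectangle_defect` and `conditionals_from_joint` are hypothetical helpers, not taken from the paper.

```python
import math
from itertools import combinations

def rectangle_defect(A, B):
    """Largest violation of the rectangle (cycle) condition.

    A[a][b] = p(x1=a | x2=b), B[a][b] = p(x2=b | x1=a); all entries must be > 0.
    A nonzero return value certifies that no joint distribution has both
    tables as its conditionals (assumed form of the test, see lead-in).
    """
    n1, n2 = len(A), len(A[0])
    # Log of the conditional ratio; equals log u(a) - log v(b) when compatible.
    L = [[math.log(A[a][b]) - math.log(B[a][b]) for b in range(n2)]
         for a in range(n1)]
    worst = 0.0
    for a, a2 in combinations(range(n1), 2):
        for b, b2 in combinations(range(n2), 2):
            d = L[a][b] + L[a2][b2] - L[a][b2] - L[a2][b]
            worst = max(worst, abs(d))
    return worst

def conditionals_from_joint(P):
    """Build both conditional tables from a positive joint table P[a][b]."""
    n1, n2 = len(P), len(P[0])
    col = [sum(P[a][b] for a in range(n1)) for b in range(n2)]
    row = [sum(P[a]) for a in range(n1)]
    A = [[P[a][b] / col[b] for b in range(n2)] for a in range(n1)]
    B = [[P[a][b] / row[a] for b in range(n2)] for a in range(n1)]
    return A, B
```

Conditionals derived from a genuine joint give a defect of zero up to rounding, while a pair such as "x1 ignores x2" together with "x2 strongly depends on x1" gives a clearly nonzero defect. For an actual MLM, the two tables would be filled by masking one position at a time and reading off the model's predicted probabilities.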
Abstract
Masked language models (MLMs) define local conditional distributions over tokens but do not, in general, correspond to any consistent joint distribution over sequences. This raises a fundamental question: what global distributional behavior is induced when such conditionals are used iteratively for generation? We address this question by modeling iterative masked-token resampling as a Glauber dynamics Markov chain on the discrete space of token sequences. We first show that MLM conditionals are intrinsically incompatible: we introduce a rectangle test that certifies this incompatibility and empirically verify its prevalence across modern MLMs. We then provide a theoretical analysis of the induced Markov chain. Under bounded cross-token influence, we establish a high-temperature contraction result implying O(n log n) mixing time, where n is the sequence length. In contrast, we prove…
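The resampling chain described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: `toy_conditional` is a hypothetical stand-in for an MLM's masked-token distribution (it simply favors tokens that match their neighbors), and random-site updates with temperature-scaled logits are assumptions about how the chain is run.

```python
import math
import random

def softmax(logits, temperature):
    """Temperature-scaled softmax; lower temperature sharpens the distribution."""
    m = max(logits)
    exps = [math.exp((l - m) / temperature) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

def toy_conditional(seq, i, vocab_size, temperature):
    """Hypothetical stand-in for p(x_i | x_{-i}); a real MLM would go here.

    Each candidate token scores one point per adjacent position it matches.
    """
    logits = []
    for tok in range(vocab_size):
        score = 0.0
        if i > 0 and seq[i - 1] == tok:
            score += 1.0
        if i < len(seq) - 1 and seq[i + 1] == tok:
            score += 1.0
        logits.append(score)
    return softmax(logits, temperature)

def glauber_step(seq, vocab_size, temperature, rng):
    """One Glauber update: pick a uniform random position, resample its token."""
    i = rng.randrange(len(seq))
    probs = toy_conditional(seq, i, vocab_size, temperature)
    new_tok = rng.choices(range(vocab_size), weights=probs, k=1)[0]
    out = list(seq)
    out[i] = new_tok
    return out

def run_chain(seq, steps, vocab_size, temperature, seed=0):
    """Run the chain for a fixed number of single-site updates."""
    rng = random.Random(seed)
    for _ in range(steps):
        seq = glauber_step(seq, vocab_size, temperature, rng)
    return seq
```

Calling `run_chain([0] * 8, 200, 3, temperature=5.0)` and then repeating with `temperature=0.1` gives a rough feel for the two regimes: at high temperature tokens are resampled almost uniformly, while at low temperature the chain tends to stay in configurations where neighbors agree. Unlike MLM conditionals, this toy conditional is Potts-like and comes from a consistent joint, so it illustrates only the dynamics, not the incompatibility.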
