Promises and Pitfalls of Generative Masked Language Modeling: Theoretical Framework and Practical Guidelines
Yuchen Li, Alexandre Kirchmeyer, Aashay Mehta, Yilong Qin, Boris Dadachev, Kishore Papineni, Sanjiv Kumar, Andrej Risteski

TL;DR
This paper develops a mathematical framework for Generative Masked Language Models (GMLMs), analyzes their capabilities and limitations, and demonstrates practical improvements in machine translation speed with minimal quality loss.
Contribution
It provides the first comprehensive theoretical analysis of GMLMs, proposes an iterative decoding method, and offers practical guidelines for model design and optimization.
Findings
Achieved 2-3x speedup in machine translation with minimal quality loss.
Developed a mathematical framework analyzing sample complexity and inference trade-offs.
Provided empirical recommendations and insights into error modes of GMLMs.
Abstract
Autoregressive language models are the currently dominant paradigm for text generation, but they have some fundamental limitations that cannot be remedied by scale, for example inherently sequential and unidirectional generation. While alternate classes of models have been explored, we have limited mathematical understanding of their fundamental power and limitations. In this paper we focus on Generative Masked Language Models (GMLMs), a non-autoregressive paradigm in which we train a model to fit conditional probabilities of the data distribution via masking, which are subsequently used as inputs to a Markov Chain to draw samples from the model. These models empirically strike a promising speed-quality trade-off, as each step can typically be parallelized by decoding the entire sequence in parallel. We develop a mathematical framework for analyzing and improving such models which sheds…
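To make the "conditionals as inputs to a Markov Chain" idea concrete, here is a minimal sketch in Python. It is not the paper's implementation. `toy_conditional` is a hypothetical hand-written stand-in for a trained masked model, `VOCAB` and `gibbs_sample` are illustrative names, and the sampler resamples one position at a time (single-site Gibbs) for clarity, whereas the abstract describes steps that decode many positions in parallel.

```python
import random

# Hypothetical toy vocabulary; a real model would use a subword vocabulary.
VOCAB = ["a", "b", "c"]

def toy_conditional(seq, pos):
    """Hypothetical stand-in for a trained masked model.

    Returns p(x_pos | all other positions) as probabilities over VOCAB.
    The token currently at `pos` is ignored, as if it were masked.
    This toy rule simply favors agreeing with the neighboring tokens.
    """
    weights = []
    for tok in VOCAB:
        w = 1.0
        if pos > 0 and seq[pos - 1] == tok:
            w += 2.0
        if pos + 1 < len(seq) and seq[pos + 1] == tok:
            w += 2.0
        weights.append(w)
    total = sum(weights)
    return [w / total for w in weights]

def gibbs_sample(length, num_sweeps, conditional, rng):
    """Run a single-site Gibbs chain driven by the given conditionals."""
    # Start the chain from a uniformly random sequence.
    seq = [rng.choice(VOCAB) for _ in range(length)]
    for _ in range(num_sweeps):
        for pos in range(length):
            # Mask position `pos` and resample it from its conditional.
            probs = conditional(seq, pos)
            seq[pos] = rng.choices(VOCAB, weights=probs, k=1)[0]
    return seq

sample = gibbs_sample(length=8, num_sweeps=20,
                      conditional=toy_conditional, rng=random.Random(0))
print("".join(sample))
```

Swapping `toy_conditional` for a learned model's masked-token distribution, and resampling several positions per step instead of one, would move this toy closer to the parallel decoding the abstract refers to.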
Taxonomy
Topics: Speech and dialogue systems · Natural Language Processing Techniques
Methods: Gated Linear Unit · Attention Is All You Need · Byte Pair Encoding · Softmax · Dense Connections · Inverse Square Root Schedule · Dropout · Linear Layer · Attention Dropout
