Probabilistically Masked Language Model Capable of Autoregressive Generation in Arbitrary Word Order

Yi Liao; Xin Jiang; Qun Liu

arXiv:2004.11579·cs.CL·April 27, 2020·1 cites

TL;DR

This paper introduces a probabilistic masking scheme for masked language models. The resulting model can generate text in arbitrary word order and outperforms BERT on understanding tasks, bridging masked and autoregressive language models.

Contribution

The paper proposes a novel probabilistic masking approach that makes masked language models capable of autoregressive, arbitrary order text generation, unifying two major language modeling paradigms.

Findings

01 u-PMLM supports high-quality text generation in arbitrary order

02 u-PMLM outperforms BERT on downstream NLU tasks

03 u-PMLM is proved equivalent to an autoregressive permutated language model
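
As a rough illustration of what generation in arbitrary order means, the sketch below starts from a fully masked sequence and fills positions one at a time in a caller-chosen order, conditioning each prediction on whatever has been filled so far. The names generate_in_order and toy_predict are hypothetical, and toy_predict is only a stand-in for a trained model; nothing here is taken from the authors' code.

```python
MASK = "[MASK]"

def generate_in_order(length, order, predict):
    """Fill a fully masked sequence one position at a time, following `order`.

    `predict(seq, pos)` is a hypothetical callable that returns a token for
    position `pos` given the partially filled sequence `seq`.
    """
    if sorted(order) != list(range(length)):
        raise ValueError("order must be a permutation of range(length)")
    seq = [MASK] * length
    for pos in order:
        # Each step sees every token generated so far, wherever it sits.
        seq[pos] = predict(seq, pos)
    return seq

def toy_predict(seq, pos):
    # Stand-in for a trained model (assumption): picks a token by position only.
    vocab = ["a", "b", "c"]
    return vocab[pos % len(vocab)]

if __name__ == "__main__":
    # Generate the third position first, then the first, the fourth, the second.
    print(generate_in_order(4, [2, 0, 3, 1], toy_predict))
```

A left-to-right model corresponds to the special case where order is 0, 1, 2, and so on; any other permutation gives a different generation order over the same positions.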

Abstract

Masked language model and autoregressive language model are two types of language models. While pretrained masked language models such as BERT overwhelm the line of natural language understanding (NLU) tasks, autoregressive language models such as GPT are especially capable in natural language generation (NLG). In this paper, we propose a probabilistic masking scheme for the masked language model, which we call probabilistically masked language model (PMLM). We implement a specific PMLM with a uniform prior distribution on the masking ratio named u-PMLM. We prove that u-PMLM is equivalent to an autoregressive permutated language model. One main advantage of the model is that it supports text generation in arbitrary order with surprisingly good quality, which could potentially enable new applications over traditional unidirectional generation. Besides, the pretrained u-PMLM also…
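
The masking scheme described in the abstract can be sketched roughly as follows: draw a masking ratio from a uniform prior, then mask tokens at that ratio. This is a minimal sketch, not the authors' implementation. The function name probabilistic_mask, the "[MASK]" placeholder, and the choice to mask each token independently with probability equal to the sampled ratio are assumptions made for illustration.

```python
import random

MASK = "[MASK]"  # placeholder mask symbol (assumption)

def probabilistic_mask(tokens, rng=None):
    """Mask a token list using a ratio drawn from Uniform(0, 1).

    Returns the masked sequence, a dict mapping masked positions to their
    original tokens (the prediction targets), and the sampled ratio.
    """
    rng = rng or random.Random()
    ratio = rng.random()  # masking ratio sampled from the uniform prior
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        # Assumption: each token is masked independently with probability `ratio`.
        if rng.random() < ratio:
            masked.append(MASK)
            targets[i] = tok
        else:
            masked.append(tok)
    return masked, targets, ratio

if __name__ == "__main__":
    sentence = "the quick brown fox jumps over the lazy dog".split()
    masked, targets, ratio = probabilistic_mask(sentence, random.Random(0))
    print(round(ratio, 3), masked, targets)
```

Because the ratio is resampled for every sequence, training examples range from almost fully visible to almost fully masked, in contrast to a scheme that always masks a fixed fraction of the tokens.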

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis

Methods: Linear Layer · Probabilistically Masked Language Model · Residual Connection · Attention Dropout · Linear Warmup With Linear Decay · Dense Connections · Weight Decay · WordPiece · Softmax · Dropout