UniLMv2: Pseudo-Masked Language Models for Unified Language Model Pre-Training
Hangbo Bao, Li Dong, Furu Wei, Wenhui Wang, Nan Yang, Xiaodong Liu, Yu Wang, Songhao Piao, Jianfeng Gao, Ming Zhou, Hsiao-Wuen Hon

TL;DR
UniLMv2 introduces pseudo-masked language modeling, a pre-training method that trains a single model on both autoencoding and partially autoregressive objectives, and reports state-of-the-art results on natural language understanding and generation benchmarks.
Contribution
The paper presents a pre-training procedure, the pseudo-masked language model (PMLM), that combines autoencoding and partially autoregressive training within a single unified model.
Findings
Achieves new state-of-the-art results on multiple NLP benchmarks.
Effectively unifies bidirectional encoding and sequence-to-sequence decoding.
Reduces redundant computation through shared context encodings.
Abstract
We propose to pre-train a unified language model for both autoencoding and partially autoregressive language modeling tasks using a novel training procedure, referred to as a pseudo-masked language model (PMLM). Given an input text with masked tokens, we rely on conventional masks to learn inter-relations between corrupted tokens and context via autoencoding, and pseudo masks to learn intra-relations between masked spans via partially autoregressive modeling. With well-designed position embeddings and self-attention masks, the context encodings are reused to avoid redundant computation. Moreover, conventional masks used for autoencoding provide global masking information, so that all the position embeddings are accessible in partially autoregressive language modeling. In addition, the two tasks pre-train a unified language model as a bidirectional encoder and a sequence-to-sequence…
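The interplay of conventional masks, pseudo masks, shared position embeddings, and self-attention masks described in the abstract can be made concrete with a small sketch. The code below is an illustration under stated assumptions, not the authors' implementation: the input layout, the role names, the helper functions build_inputs, can_attend, and attention_mask, and the exact visibility rules are all hypothetical choices made here to match the abstract's description. It assumes that context tokens and conventional masks [M] attend only to each other (so their encodings do not depend on the appended tokens and can be reused), and that each span's pseudo masks [P] see the context, the original tokens of earlier spans, and [M] at positions of spans not yet predicted.

```python
# Illustrative sketch only. The layout and visibility rules are simplifying
# assumptions based on the abstract, not the paper's exact specification.
M, P = "[M]", "[P]"

def build_inputs(tokens, spans):
    """Return rows of (symbol, position_id, role, span_index).

    tokens: list of str.
    spans:  lists of 0-based positions, in the assumed prediction order.
    """
    span_of = {p: k for k, span in enumerate(spans) for p in span}
    rows = []
    # Autoencoding part: context tokens and conventional masks.
    for i, tok in enumerate(tokens):
        if i in span_of:
            rows.append((M, i, "mask", span_of[i]))
        else:
            rows.append((tok, i, "context", -1))
    # Partially autoregressive part: pseudo masks and original tokens are
    # appended and reuse the position ids of the positions they stand for.
    for k, span in enumerate(spans):
        rows += [(P, p, "pseudo", k) for p in span]
        rows += [(tokens[p], p, "original", k) for p in span]
    return rows

def can_attend(query, key):
    """Assumed visibility rule between two rows."""
    _, _, q_role, q_span = query
    _, _, k_role, k_span = key
    if k_role == "context":
        return True                      # everything sees the context
    if q_role in ("context", "mask"):
        return k_role == "mask"          # never sees appended tokens
    if k_role == "mask":
        return k_span > q_span           # spans not yet predicted
    if k_role == "original":
        # earlier spans, plus the own span for original-token queries
        return k_span < q_span or (q_role == "original" and k_span == q_span)
    return q_role == "pseudo" and k_span == q_span  # key is a pseudo mask

def attention_mask(rows):
    """Boolean matrix: entry [q][k] is True if row q may attend to row k."""
    return [[can_attend(q, k) for k in rows] for q in rows]

# Six tokens; positions 3 and 4 are predicted first, then position 1.
rows = build_inputs(["x1", "x2", "x3", "x4", "x5", "x6"], [[3, 4], [1]])
mask = attention_mask(rows)
```

Under these assumptions, every query sees exactly one representation per position, and no context or [M] row ever attends to an appended row, which is what allows the context encodings to be computed once and shared across prediction steps.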
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis
