MPNet: Masked and Permuted Pre-training for Language Understanding

Kaitao Song; Xu Tan; Tao Qin; Jianfeng Lu; Tie-Yan Liu

arXiv:2004.09297 · cs.CL · November 3, 2020 · 505 cites

TL;DR

MPNet is a pre-training approach that combines the strengths of BERT and XLNet: it models dependencies among predicted tokens, as XLNet does, while giving the model position information for the full sentence, as BERT does, to improve performance on language understanding tasks.

Contribution

The paper proposes MPNet, a pre-training method that combines permuted language modeling with auxiliary position information, so the model captures dependencies among predicted tokens while still seeing the positions of the full sentence.

Findings

01

MPNet outperforms BERT and XLNet on multiple NLP benchmarks.

02

MPNet achieves state-of-the-art results on GLUE and SQuAD.

03

Fine-tuning pre-trained MPNet models improves results on downstream tasks such as GLUE and SQuAD.

Abstract

BERT adopts masked language modeling (MLM) for pre-training and is one of the most successful pre-training models. Since BERT neglects the dependency among predicted tokens, XLNet introduces permuted language modeling (PLM) for pre-training to address this problem. However, XLNet does not leverage the full position information of a sentence and thus suffers from position discrepancy between pre-training and fine-tuning. In this paper, we propose MPNet, a novel pre-training method that inherits the advantages of BERT and XLNet and avoids their limitations. MPNet leverages the dependency among predicted tokens through permuted language modeling (vs. MLM in BERT), and takes auxiliary position information as input so that the model sees a full sentence, thus reducing the position discrepancy (vs. PLM in XLNet). We pre-train MPNet on a large-scale dataset (over 160GB text corpora) and…
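
The mechanism the abstract describes can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. It assumes the input layout described in the MPNet paper: a permuted sequence is split into a non-predicted part and a predicted part, and mask placeholders carrying the positions of the predicted tokens are inserted so that every position of the sentence is visible to the model. The function name `build_mpnet_input`, the `[MASK]` string, and the example sentence are hypothetical, and two-stream attention and the attention masks are omitted.

```python
import random

MASK = "[MASK]"  # hypothetical placeholder string, not tied to any tokenizer


def build_mpnet_input(tokens, num_predicted, seed=0):
    """Sketch of an MPNet-style input for one sentence (assumed layout).

    Returns (input_tokens, input_positions, targets), where targets is a
    list of (original_position, token) pairs in prediction order.
    """
    n = len(tokens)
    if not 0 < num_predicted <= n:
        raise ValueError("num_predicted must be between 1 and len(tokens)")

    rng = random.Random(seed)
    order = list(range(n))
    rng.shuffle(order)  # a random permutation of the original positions

    c = n - num_predicted
    non_pred, pred = order[:c], order[c:]

    # Content: visible tokens, then one mask per predicted token,
    # then the predicted tokens themselves (used for PLM-style context).
    input_tokens = (
        [tokens[i] for i in non_pred]
        + [MASK] * len(pred)
        + [tokens[i] for i in pred]
    )
    # Positions: the masks reuse the positions of the predicted tokens,
    # so all n positions appear in the input (the "full sentence" view).
    input_positions = non_pred + pred + pred

    targets = [(i, tokens[i]) for i in pred]
    return input_tokens, input_positions, targets


sentence = ["the", "task", "is", "sentence", "classification", "."]
toks, pos, tgt = build_mpnet_input(sentence, num_predicted=2, seed=1)
for token, position in zip(toks, pos):
    print(position, token)
print("targets:", tgt)
```

In this sketch the mask placeholders play the role of the auxiliary position information: when a token is predicted, the positions of all tokens still to be predicted are already present in the input, which plain PLM does not provide.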

Code & Models

Videos

MPNet: Masked and Permuted Pre-training for Language Understanding · SlidesLive

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis

Methods: Linear Layer · MPNet · Residual Connection · Attention Dropout · Linear Warmup With Linear Decay · Byte Pair Encoding · Weight Decay · SentencePiece · Dense Connections