MPNet: Masked and Permuted Pre-training for Language Understanding
Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu

TL;DR
MPNet introduces a pre-training approach that combines the strengths of BERT and XLNet: it models dependencies among predicted tokens while giving the model full position information, improving performance on language understanding tasks.
Contribution
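It proposes MPNet, a novel pre-training method that integrates permuted language modeling with auxiliary position information to overcome the limitations of prior models: BERT ignores dependencies among predicted tokens, and XLNet suffers from a position discrepancy between pre-training and fine-tuning.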
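For reference, the three objectives can be contrasted as below. This is a sketch in the notation commonly used for these models, not a verbatim copy of the paper's equations: $\mathcal{K}$ is the set of masked positions, $z$ is a permutation of $1,\dots,n$ drawn from $\mathcal{Z}_n$, $c$ is the number of non-predicted tokens, and $M_{z_{>c}}$ denotes the mask tokens placed at the positions of the predicted tokens (the auxiliary position information).

```latex
% MLM (BERT): masked tokens are predicted independently of each other
\mathcal{L}_{\text{MLM}} = \sum_{k \in \mathcal{K}} \log P\left(x_k \mid x_{\setminus \mathcal{K}}; \theta\right)

% PLM (XLNet): predicted tokens are conditioned on earlier tokens in the permutation
\mathcal{L}_{\text{PLM}} = \mathbb{E}_{z \in \mathcal{Z}_n} \sum_{t=c+1}^{n} \log P\left(x_{z_t} \mid x_{z_{<t}}; \theta\right)

% MPNet: PLM plus mask tokens carrying the positions of all predicted tokens
\mathcal{L}_{\text{MPNet}} = \mathbb{E}_{z \in \mathcal{Z}_n} \sum_{t=c+1}^{n} \log P\left(x_{z_t} \mid x_{z_{<t}}, M_{z_{>c}}; \theta\right)
```

The extra conditioning on $M_{z_{>c}}$ is what lets every prediction see the positions of the whole sentence, while the autoregressive factorization keeps the dependency among predicted tokens.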
Findings
MPNet outperforms BERT and XLNet on multiple NLP benchmarks.
MPNet achieves state-of-the-art results on GLUE and SQuAD.
Pre-trained MPNet models show significant improvements in downstream tasks.
Abstract
BERT adopts masked language modeling (MLM) for pre-training and is one of the most successful pre-training models. Since BERT neglects dependency among predicted tokens, XLNet introduces permuted language modeling (PLM) for pre-training to address this problem. However, XLNet does not leverage the full position information of a sentence and thus suffers from position discrepancy between pre-training and fine-tuning. In this paper, we propose MPNet, a novel pre-training method that inherits the advantages of BERT and XLNet and avoids their limitations. MPNet leverages the dependency among predicted tokens through permuted language modeling (vs. MLM in BERT), and takes auxiliary position information as input to make the model see a full sentence, thus reducing the position discrepancy (vs. PLM in XLNet). We pre-train MPNet on a large-scale dataset (over 160GB text corpora) and…
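The abstract's key idea, feeding auxiliary position information so the model "sees" a full sentence, can be illustrated by how the input sequence is laid out. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a 15% prediction ratio and a hypothetical `"[MASK]"` token string, and the function name `mpnet_input` is made up for this example. It only arranges tokens and position ids; the Transformer and its two-stream attention masks are omitted.

```python
import random

MASK = "[MASK]"  # hypothetical mask-token string (assumption)


def mpnet_input(tokens, pred_ratio=0.15, seed=None):
    """Arrange an MPNet-style input (illustrative sketch only).

    Layout: [non-predicted tokens] + [one MASK per predicted position] + [predicted tokens].
    Returns (input_tokens, input_positions, targets).
    """
    n = len(tokens)
    if n == 0:
        raise ValueError("need at least one token")
    rng = random.Random(seed)
    z = list(range(n))  # a random permutation z of the positions 0..n-1
    rng.shuffle(z)
    num_pred = min(n, max(1, round(n * pred_ratio)))  # predict the last num_pred of z
    c = n - num_pred  # number of non-predicted tokens
    non_pred, pred = z[:c], z[c:]

    # Part 1: non-predicted tokens, each keeping its original position.
    input_tokens = [tokens[p] for p in non_pred]
    input_positions = list(non_pred)
    # Part 2: mask tokens carrying the positions of the predicted tokens
    # (auxiliary position information), so parts 1 + 2 cover all n positions.
    input_tokens += [MASK] * len(pred)
    input_positions += list(pred)
    # Part 3: the predicted tokens themselves, in permutation order, which PLM
    # uses so later predictions can depend on earlier predicted tokens.
    input_tokens += [tokens[p] for p in pred]
    input_positions += list(pred)

    targets = [tokens[p] for p in pred]
    return input_tokens, input_positions, targets


if __name__ == "__main__":
    toks = "the cat sat on the mat today again".split()
    inp, pos, tgt = mpnet_input(toks, seed=0)
    n = len(toks)
    # The positions of parts 1 + 2 form a permutation of the full sentence's positions.
    print(sorted(pos[:n]) == list(range(n)))  # True
```

The check at the end prints `True` for any seed, because parts 1 and 2 together carry exactly the permutation `z`. This is the contrast with XLNet's PLM, where a predicted token sees only the positions of the tokens before it in the permutation.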
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis
Methods: Linear Layer · MPNet · Residual Connection · Attention Dropout · Linear Warmup With Linear Decay · Byte Pair Encoding · Weight Decay · SentencePiece · Dense Connections
