PERT: Pre-training BERT with Permuted Language Model
Yiming Cui, Ziqing Yang, Ting Liu

TL;DR
PERT pre-trains a BERT-style model by permuting a proportion of the input tokens and predicting the position of each original token, with improvements over comparable baselines on some Chinese and English natural language understanding tasks.
Contribution
The paper proposes PERT, a new pre-training method using permuted language modeling, which diversifies training tasks beyond traditional masked language models.
Findings
PERT improves over comparable baselines on some Chinese and English NLU benchmarks, but not on all tasks.
Permuted Language Model offers a new training paradigm.
Diverse pre-training tasks can enhance PLM capabilities.
Abstract
Pre-trained Language Models (PLMs) have been widely used in various natural language processing (NLP) tasks, owing to their powerful text representations trained on large-scale corpora. In this paper, we propose a new PLM called PERT for natural language understanding (NLU). PERT is an auto-encoding model (like BERT) trained with Permuted Language Model (PerLM). The formulation of the proposed PerLM is straightforward. We permute a proportion of the input text, and the training objective is to predict the position of the original token. Moreover, we also apply whole word masking and N-gram masking to improve the performance of PERT. We carried out extensive experiments on both Chinese and English NLU benchmarks. The experimental results show that PERT can bring improvements over various comparable baselines on some of the tasks, but not on others. These results indicate that…
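The PerLM idea described in the abstract (permute a proportion of the input, then predict the position of the original token) can be illustrated with a small sketch. This is not the authors' implementation: the function name `permute_for_perlm`, the `ratio` default, the `IGNORE` label value, and the exact label convention (for each selected position, the index where its original token now sits) are all assumptions made for illustration. Whole word masking and N-gram masking are not modeled here.

```python
import random

# Hypothetical label for positions that are not part of the training objective.
IGNORE = -100


def permute_for_perlm(tokens, ratio=0.15, seed=None):
    """Shuffle a proportion of token positions and build position labels.

    Returns (permuted, labels). For every selected position i, labels[i] is
    the index in `permuted` that now holds the token originally at i
    (an assumed label convention). All other labels are IGNORE.
    """
    rng = random.Random(seed)
    n = len(tokens)

    # Select at least two positions so that a shuffle is possible at all.
    k = min(n, max(2, round(n * ratio))) if n >= 2 else 0
    chosen = sorted(rng.sample(range(n), k))

    # Shuffle the selected positions among themselves; a token may
    # occasionally land back on its own position.
    targets = chosen[:]
    rng.shuffle(targets)

    permuted = list(tokens)
    labels = [IGNORE] * n
    for src, dst in zip(chosen, targets):
        permuted[dst] = tokens[src]  # the token from src moves to dst
        labels[src] = dst            # at src, the answer is "it is now at dst"
    return permuted, labels


if __name__ == "__main__":
    words = ["we", "permute", "a", "proportion", "of", "the", "input", "text"]
    shuffled, targets = permute_for_perlm(words, ratio=0.4, seed=1)
    print(shuffled)
    print(targets)
```

In this sketch the input length is unchanged and no special mask token is introduced, which matches the abstract's description of permuting the input text rather than replacing tokens. A model trained this way would output, for each selected position, a distribution over input positions instead of over the vocabulary.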
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification
