Frustratingly Simple Pretraining Alternatives to Masked Language Modeling
Atsuki Yamaguchi, George Chrysostomou, Katerina Margatina, Nikolaos Aletras

TL;DR
This paper investigates five simple token-level pretraining objectives as alternatives to masked language modeling (MLM), showing that they achieve comparable or better performance than MLM on GLUE and SQuAD and remain competitive when used to pretrain smaller models with fewer parameters.
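To make "token-level classification" concrete, here is a minimal sketch of one such task, assuming a shuffled-token detection setup chosen for illustration: a random subset of positions is permuted among themselves, and the model must label every position as changed (1) or original (0). The helper `make_shuffle_example`, the selection rate, and the whitespace tokenization are all hypothetical, not the paper's implementation.

```python
import random
from typing import List, Tuple


def make_shuffle_example(
    tokens: List[str], rate: float = 0.15, seed: int = 0
) -> Tuple[List[str], List[int]]:
    """Hypothetical helper: permute a random subset of positions among themselves.

    Returns the corrupted sequence and one binary label per position:
    1 if the token at that position differs from the original, else 0.
    """
    rng = random.Random(seed)
    n = len(tokens)
    # Permuting needs at least two positions; never sample more than n.
    k = min(n, max(2, round(n * rate)))
    positions = sorted(rng.sample(range(n), k))
    moved = [tokens[i] for i in positions]
    rng.shuffle(moved)
    corrupted = list(tokens)
    for pos, tok in zip(positions, moved):
        corrupted[pos] = tok
    # Label by actual change: a shuffle can leave a token in place, and
    # swapping two identical tokens (e.g. two "the") changes nothing.
    labels = [int(c != o) for c, o in zip(corrupted, tokens)]
    return corrupted, labels


tokens = "the cat sat on the mat and the dog slept".split()
corrupted, labels = make_shuffle_example(tokens, rate=0.3)
print(corrupted)
print(labels)  # per-position targets for a binary token classifier
```

Unlike MLM, every position carries a label here, and the classifier needs only two outputs rather than one per vocabulary entry. Labeling by actual change rather than by "was selected" is a design choice of this sketch, so the target never asks the model to detect an invisible edit.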
Contribution
The authors introduce and empirically evaluate new, simple pretraining objectives that outperform MLM in some settings while reducing complexity and computational requirements.
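One place the computational difference shows up is the prediction head: MLM scores each predicted position against the whole vocabulary, whereas a token-level classification objective needs only a handful of outputs. The sketch below counts multiply-accumulate operations (MACs) in the final projection for a single predicted position, assuming BERT-BASE's hidden size of 768 and the 30,522-entry WordPiece vocabulary of `bert-base-uncased`; the 4-class head and the helper `head_macs` are hypothetical.

```python
def head_macs(hidden_size: int, num_outputs: int) -> int:
    """Multiply-accumulates of a dense projection hidden_size -> num_outputs,
    for a single predicted position (bias and softmax ignored)."""
    return hidden_size * num_outputs


HIDDEN = 768       # BERT-BASE hidden size
VOCAB = 30_522     # WordPiece vocabulary size of bert-base-uncased
NUM_CLASSES = 4    # hypothetical label count for a small token-level task

mlm = head_macs(HIDDEN, VOCAB)
small = head_macs(HIDDEN, NUM_CLASSES)
print(f"full-vocabulary head: {mlm:,} MACs")    # → 23,440,896 MACs
print(f"4-class head:         {small:,} MACs")  # → 3,072 MACs
print(f"ratio: {mlm / small:,.1f}x")             # → 7,630.5x
```

This covers only the output projection. The encoder itself is unchanged, and objectives that label every position make predictions at more positions than MLM's masked subset, so end-to-end savings are much smaller than this ratio suggests.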
Findings
Proposed objectives achieve similar or better results than MLM on GLUE and SQuAD.
Smaller models pretrained with these objectives retain high performance: with the best objective, the GLUE score drops by only about 1%.
Simpler pretraining objectives can replace MLM without sacrificing accuracy.
Abstract
Masked language modeling (MLM), a self-supervised pretraining objective, is widely used in natural language processing for learning text representations. MLM trains a model to predict a random sample of input tokens that have been replaced by a [MASK] placeholder, as a multi-class classification over the entire vocabulary. When pretraining, it is common to use other auxiliary token- or sequence-level objectives alongside MLM to improve downstream performance (e.g., next sentence prediction). However, no previous work has examined whether other simpler objectives, linguistically intuitive or not, can be used standalone as the main pretraining objective. In this paper, we explore five simple pretraining objectives based on token-level classification tasks as replacements for MLM. Empirical results on GLUE and SQuAD show that our proposed methods achieve comparable or better…
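The MLM setup described in the abstract can be sketched as follows: pick a random subset of positions, replace them with `[MASK]`, and keep the original tokens as the targets the model must recover. This minimal sketch assumes whitespace tokens and BERT's 15% masking rate, and it omits BERT's 80/10/10 replacement rule (mask / random token / unchanged); the helper `mask_tokens` is hypothetical.

```python
import random
from typing import Dict, List, Tuple

MASK = "[MASK]"


def mask_tokens(
    tokens: List[str], rate: float = 0.15, seed: int = 0
) -> Tuple[List[str], Dict[int, str]]:
    """Hypothetical helper: replace a random sample of positions with [MASK].

    Returns the corrupted sequence and a map from each masked position to
    its original token, i.e. the MLM prediction target at that position.
    """
    rng = random.Random(seed)
    # Mask at least one token (if any exist) and never more than len(tokens).
    k = min(len(tokens), max(1, round(len(tokens) * rate)))
    corrupted = list(tokens)
    targets: Dict[int, str] = {}
    for pos in rng.sample(range(len(tokens)), k):
        targets[pos] = tokens[pos]
        corrupted[pos] = MASK
    return corrupted, targets


tokens = "masked language modeling predicts hidden tokens from their context".split()
corrupted, targets = mask_tokens(tokens)
print(corrupted)
print(targets)  # each value must be predicted out of the full vocabulary
```

During training, the model's output at each masked position is a softmax over every vocabulary entry; this full-vocabulary prediction is what the token-level classification objectives are proposed to replace.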
