Efficient Pre-training of Masked Language Model via Concept-based Curriculum Masking
Mingyu Lee, Jun-Hyung Park, Junho Kim, Kang-Min Kim, and SangKeun Lee

TL;DR
This paper introduces concept-based curriculum masking (CCM), which makes masked language model pre-training more efficient by progressively masking words related to those already masked. The resulting model performs comparably to BERT at half the training cost.
Contribution
The paper presents a curriculum masking approach that combines a linguistic difficulty criterion with knowledge-graph retrieval to improve the efficiency of masked language model (MLM) pre-training.
Findings
CCM significantly reduces training costs.
A model trained with CCM performs comparably to the original BERT on the GLUE benchmark.
It reaches that performance at half the training cost.
Abstract
Masked language modeling (MLM) has been widely used for pre-training effective bidirectional representations, but incurs substantial training costs. In this paper, we propose a novel concept-based curriculum masking (CCM) method to efficiently pre-train a language model. CCM has two key differences from existing curriculum learning approaches to effectively reflect the nature of MLM. First, we introduce a carefully designed linguistic difficulty criterion that evaluates the MLM difficulty of each token. Second, we construct a curriculum that gradually masks words related to the previously masked words by retrieving them from a knowledge graph. Experimental results show that CCM significantly improves pre-training efficiency. Specifically, the model trained with CCM shows comparable performance with the original BERT on the General Language Understanding Evaluation benchmark at half of the…
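The curriculum idea in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' implementation: the toy graph `TOY_GRAPH`, the helper names `build_curriculum` and `mask_tokens`, the choice of seed concept, and the rule of masking every matching token. The sketch starts from a seed set of concepts assumed to be easy under some difficulty criterion (how difficulty is scored is not specified here), then at each stage adds the graph neighbors of the concepts already in the set.

```python
from typing import Dict, List, Set

# Hypothetical stand-in for a knowledge graph: concept -> related concepts.
TOY_GRAPH: Dict[str, Set[str]] = {
    "dog": {"animal", "bark"},
    "animal": {"zoo"},
    "bark": set(),
    "zoo": set(),
}


def build_curriculum(
    seed: Set[str], graph: Dict[str, Set[str]], num_stages: int
) -> List[Set[str]]:
    """Return one set of maskable concepts per stage.

    Stage 0 is the seed set (assumed to be the "easy" concepts).
    Each later stage adds the graph neighbors of the previous stage.
    """
    stages = [set(seed)]
    for _ in range(num_stages - 1):
        current = stages[-1]
        expanded = set(current)
        for concept in current:
            expanded |= graph.get(concept, set())
        stages.append(expanded)
    return stages


def mask_tokens(
    tokens: List[str], concepts: Set[str], mask_token: str = "[MASK]"
) -> List[str]:
    """Replace every token that belongs to the current concept set.

    Simplification: a real MLM setup would mask only a fraction of tokens.
    """
    return [mask_token if tok in concepts else tok for tok in tokens]


stages = build_curriculum({"dog"}, TOY_GRAPH, num_stages=3)
sentence = "the dog began to bark at the zoo".split()
masked_per_stage = [mask_tokens(sentence, concepts) for concepts in stages]

for stage_index, masked in enumerate(masked_per_stage):
    print(stage_index, " ".join(masked))
```

In this toy run the set of maskable concepts grows from one word to three over the stages, so later stages hide more of the sentence. That is the sense in which masking is "gradual" in this sketch.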
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Natural Language Processing Techniques
Methods: Attention Is All You Need · Linear Layer · Multi-Head Attention · Residual Connection · Weight Decay · Softmax · Adam · WordPiece · Layer Normalization
