Efficient Pre-training of Masked Language Model via Concept-based Curriculum Masking

Mingyu Lee, Jun-Hyung Park, Junho Kim, Kang-Min Kim, and SangKeun Lee

arXiv:2212.07617 · cs.CL · December 16, 2022

TL;DR

This paper introduces a concept-based curriculum masking method that enhances the efficiency of masked language model pre-training by gradually masking related words, achieving comparable performance to BERT with half the training cost.

Contribution

The paper presents a novel curriculum masking approach that combines a linguistic difficulty criterion with a knowledge graph to improve the efficiency of MLM pre-training.
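The curriculum idea (start from easy concepts, then grow the set through related concepts in a knowledge graph) can be illustrated with a minimal Python sketch. Everything here is an assumption for illustration: `build_curriculum` and the toy graph are hypothetical, and "easy" is approximated by node degree (more neighbors means easier), which is a placeholder and not the paper's linguistic difficulty criterion.

```python
from typing import Dict, List, Set


def build_curriculum(
    graph: Dict[str, Set[str]],
    num_initial: int,
    num_stages: int,
) -> List[Set[str]]:
    """Return cumulative concept sets, one per curriculum stage.

    Hypothetical difficulty proxy: concepts with more neighbors in the
    graph are treated as easier. This is a placeholder, not the paper's
    linguistic difficulty criterion.
    """
    # Rank concepts from "easiest" (highest degree) to hardest;
    # ties are broken alphabetically so the result is deterministic.
    ranked = sorted(graph, key=lambda c: (-len(graph[c]), c))
    current: Set[str] = set(ranked[:num_initial])
    stages = [set(current)]
    for _ in range(num_stages - 1):
        # Expand by one hop: add every neighbor of an already-included concept.
        frontier = {n for c in current for n in graph.get(c, set())}
        current = current | frontier
        stages.append(set(current))
    return stages


# Hypothetical toy knowledge graph (undirected adjacency sets).
graph = {
    "dog": {"animal", "bark", "pet"},
    "animal": {"dog", "cat"},
    "cat": {"animal", "pet"},
    "pet": {"dog", "cat"},
    "bark": {"dog", "tree"},
    "tree": {"bark"},
}

stages = build_curriculum(graph, num_initial=1, num_stages=3)
for i, concepts in enumerate(stages):
    print(i, sorted(concepts))
```

Here stage 0 contains only `dog` (the highest-degree node), stage 1 adds its direct neighbors, and stage 2 reaches the rest of the toy graph. Because each stage is a superset of the previous one, concepts introduced early stay maskable later.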

Findings

01. CCM significantly reduces pre-training cost.

02. A model trained with CCM matches BERT's performance on GLUE.

03. Pre-training becomes more efficient, making better use of compute resources.

Abstract

Masked language modeling (MLM) has been widely used for pre-training effective bidirectional representations, but incurs substantial training costs. In this paper, we propose a novel concept-based curriculum masking (CCM) method to efficiently pre-train a language model. CCM has two key differences from existing curriculum learning approaches to effectively reflect the nature of MLM. First, we introduce a carefully designed linguistic difficulty criterion that evaluates the MLM difficulty of each token. Second, we construct a curriculum that gradually masks words related to the previously masked words by querying a knowledge graph. Experimental results show that CCM significantly improves pre-training efficiency. Specifically, the model trained with CCM shows comparable performance with the original BERT on the General Language Understanding Evaluation benchmark at half of the…
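The masking step of such a curriculum can be sketched as follows: at a given stage, only tokens whose concept belongs to the current stage's concept set are eligible for masking, and all subword tokens of a concept occurrence are masked together. This is a simplified, hypothetical illustration: `curriculum_mask`, `concept_of`, and the toy sentence are assumptions, the token-to-concept mapping is taken as given, and standard BERT details such as the 80/10/10 replacement scheme are omitted.

```python
import random
from typing import List, Optional, Set

MASK = "[MASK]"


def curriculum_mask(
    tokens: List[str],
    concept_of: List[Optional[str]],
    allowed: Set[str],
    mask_prob: float,
    rng: random.Random,
) -> List[str]:
    """Mask only tokens whose concept is in the current curriculum stage.

    concept_of[i] is the (hypothetical) concept that token i belongs to,
    or None if it is not part of any concept. Contiguous tokens sharing a
    concept form one span, and a selected span is masked as a whole.
    """
    masked = list(tokens)
    i = 0
    while i < len(tokens):
        concept = concept_of[i]
        # Find the end of the contiguous span belonging to this concept.
        j = i + 1
        while j < len(tokens) and concept is not None and concept_of[j] == concept:
            j += 1
        # Only concepts admitted by the curriculum are candidates for masking.
        if concept in allowed and rng.random() < mask_prob:
            for k in range(i, j):
                masked[k] = MASK
        i = j
    return masked


# Hypothetical WordPiece-style sentence with a precomputed concept mapping.
tokens = ["my", "pet", "dog", "##gy", "barked"]
concept_of = [None, "pet", "dog", "dog", "bark"]

print(curriculum_mask(tokens, concept_of, {"dog"}, 1.0, random.Random(0)))
```

With `mask_prob=1.0` and only `dog` admitted, both subword pieces of `doggy` are masked while `pet` and `barked` stay visible; admitting more concepts in later stages widens what can be masked.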

Code & Models

Repositories

koreamglee/concept-based-curriculum-masking
PyTorch · Official

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Natural Language Processing Techniques

Methods: Attention Is All You Need · Linear Layer · Multi-Head Attention · Residual Connection · Weight Decay · Softmax · Adam · WordPiece · Layer Normalization