Analysing the Effect of Masking Length Distribution of MLM: An Evaluation Framework and Case Study on Chinese MRC Datasets
Changchang Zeng, Shaobo Li

TL;DR
This paper investigates how the distribution of masked token lengths in MLM affects performance in Chinese MRC tasks, proposing new datasets and models to analyze this relationship.
Contribution
It introduces four new MRC tasks with varied answer lengths, creates corresponding datasets, and pre-trains MLMs to study the impact of masking length distribution on MRC performance.
Findings
Masking length distribution significantly influences MRC performance.
Pre-trained models with tailored masking strategies outperform generic models.
The hypothesis that answer length correlates with optimal mask length is validated.
Abstract
Machine reading comprehension (MRC) is a challenging natural language processing (NLP) task. Recently, the emergence of pre-trained models (PTMs) has brought this research field into a new era, in which the training objective plays a key role. The masked language model (MLM) is a self-supervised training objective that is widely used in various PTMs. As training objectives have developed, many variants of MLM have been proposed, such as whole word masking, entity masking, phrase masking, and span masking. These variants differ in the length of the masked token span. Similarly, the length of the answer differs across machine reading comprehension tasks: the answer is often a word, a phrase, or a sentence. Thus, for MRC tasks with different answer lengths, whether the masking length of the MLM is related to performance is a question worth studying. If this hypothesis is…
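To make the idea of a "masking length distribution" concrete, the following is a minimal Python sketch, not the authors' implementation. It masks non-overlapping spans whose lengths are drawn from a user-supplied distribution. The function names, the 15% masking ratio, the `[MASK]` string, and the example distribution are all assumptions chosen for illustration.

```python
import random


def sample_span_length(length_probs, rng):
    """Draw one span length from a {length: probability} mapping."""
    lengths = list(length_probs)
    weights = [length_probs[n] for n in lengths]
    return rng.choices(lengths, weights=weights, k=1)[0]


def mask_spans(tokens, length_probs, mask_ratio=0.15, mask_token="[MASK]", seed=0):
    """Mask non-overlapping spans until about mask_ratio of the tokens are covered.

    length_probs is the (hypothetical) masking length distribution: shifting
    its mass toward longer spans moves the objective from word-level masking
    toward phrase- or sentence-level masking.
    """
    rng = random.Random(seed)
    budget = max(1, round(len(tokens) * mask_ratio))
    masked = list(tokens)
    covered = set()
    attempts = 0
    # Bound the number of attempts so the loop ends even if spans keep colliding.
    while len(covered) < budget and attempts < 100 * len(tokens):
        attempts += 1
        # Clip the sampled length so the budget is never exceeded.
        span = min(sample_span_length(length_probs, rng), budget - len(covered))
        start = rng.randrange(0, len(tokens) - span + 1)
        positions = range(start, start + span)
        if any(p in covered for p in positions):
            continue  # reject spans that overlap an already masked span
        covered.update(positions)
        for p in positions:
            masked[p] = mask_token
    return masked, sorted(covered)


if __name__ == "__main__":
    tokens = [f"t{i}" for i in range(40)]
    short_heavy = {1: 0.5, 2: 0.3, 3: 0.2}  # assumed example distribution
    masked, positions = mask_spans(tokens, short_heavy)
    print(masked)
    print(positions)
```

Comparing pre-training runs that differ only in `length_probs` is one way to probe whether the masking length matters for a task with a given answer length.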
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
