Mask and You Shall Receive: Optimizing Masked Language Modeling For Pretraining BabyLMs
Lukas Edman, Alexander Fraser

TL;DR
This paper introduces an improved Masked Language Modeling technique that adjusts masking probabilities based on prediction difficulty, enhancing performance on language understanding tasks for BabyLMs.
Contribution
The paper proposes an MLM variant that dynamically adapts masking probabilities and also incorporates sub-token embeddings; together these improve (Super)GLUE performance and morphological generalization.
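The summary does not say how the sub-token embeddings are built or combined with the token embeddings. The sketch below illustrates one common pattern under explicit assumptions: each token's input vector is its own embedding plus the mean of the embeddings of its sub-tokens, and characters stand in for sub-tokens. The class `SubTokenEmbedder` and its lazily initialised random tables are hypothetical and are not the authors' code.

```python
import random


class SubTokenEmbedder:
    """Hypothetical sketch: token vector = own embedding + mean of its sub-token embeddings."""

    def __init__(self, dim, seed=0):
        self.dim = dim
        self.rng = random.Random(seed)
        self.token_table = {}  # token string -> vector (stands in for a trained lookup table)
        self.sub_table = {}    # sub-token string -> vector, shared across all tokens

    def _lookup(self, table, key):
        # Lazily initialise a small random vector the first time a key is seen.
        if key not in table:
            table[key] = [self.rng.uniform(-0.1, 0.1) for _ in range(self.dim)]
        return table[key]

    def embed(self, token):
        tok_vec = self._lookup(self.token_table, token)
        # Assumption: the sub-tokens of a token are its characters.
        sub_vecs = [self._lookup(self.sub_table, ch) for ch in token]
        if not sub_vecs:
            return list(tok_vec)
        # Element-wise mean of the sub-token vectors, added to the token's own vector.
        mean = [sum(col) / len(sub_vecs) for col in zip(*sub_vecs)]
        return [t + m for t, m in zip(tok_vec, mean)]


emb = SubTokenEmbedder(dim=4)
walked = emb.embed("walked")
walking = emb.embed("walking")  # shares the sub-token vectors for w, a, l, k with "walked"
```

Because the sub-token table is shared, related forms such as "walked" and "walking" receive partly shared input vectors. This is one plausible intuition for why sub-token information could aid morphological generalization, not a description of the paper's exact mechanism.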
Findings
Substantial performance gains on (Super)GLUE tasks over standard MLM.
Enhanced morphological generalization with sub-token embeddings.
Outperforms baseline in the strict-small BabyLM track.
Abstract
We describe our strategy for the 2025 edition of the BabyLM Challenge. Our main contribution is an improved form of Masked Language Modeling (MLM) that adapts the probability of masking each token according to the model's ability to predict it. The results show a substantial increase in performance on (Super)GLUE tasks over standard MLM. We also incorporate sub-token embeddings and find that they improve the model's morphological generalization. Our submission beats the baseline in the strict-small track.
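The abstract states only that masking probabilities follow the model's ability to predict each token; it does not give the update rule. The sketch below assumes one simple scheme for illustration: keep an exponential moving average of the MLM loss per token id, then choose which positions to mask with probability proportional to that running difficulty, keeping the usual MLM masking budget. The names `AdaptiveMasker`, `apply_mask`, `decay`, and `floor` are hypothetical, and the weighted sampling uses the Efraimidis-Spirakis method.

```python
import random


class AdaptiveMasker:
    """Hypothetical sketch of difficulty-adaptive MLM masking (not the authors' code)."""

    def __init__(self, vocab_size, mask_rate=0.15, decay=0.9, floor=1e-3):
        self.mask_rate = mask_rate            # fraction of positions masked per sequence
        self.decay = decay                    # smoothing factor of the running loss
        self.floor = floor                    # keeps every token maskable
        self.difficulty = [1.0] * vocab_size  # running MLM loss per token id (assumed start: 1.0)

    def update(self, token_ids, losses):
        # Exponential moving average of the loss observed when each token id was masked.
        for tok, loss in zip(token_ids, losses):
            self.difficulty[tok] = self.decay * self.difficulty[tok] + (1 - self.decay) * loss

    def sample_mask(self, sequence, rng=random):
        # Same masking budget as standard MLM: about mask_rate * len(sequence) positions.
        k = min(len(sequence), max(1, round(self.mask_rate * len(sequence))))
        # Efraimidis-Spirakis weighted sampling without replacement: each position gets
        # key u ** (1 / w) and the k largest keys are masked, so positions holding
        # harder tokens (larger w) are picked more often.
        keys = []
        for tok in sequence:
            w = max(self.difficulty[tok], self.floor)
            keys.append(rng.random() ** (1.0 / w))
        ranked = sorted(range(len(sequence)), key=lambda i: keys[i], reverse=True)
        return sorted(ranked[:k])


def apply_mask(sequence, positions, mask_id):
    inputs = list(sequence)
    labels = [-100] * len(sequence)  # -100: default ignore_index of PyTorch's CrossEntropyLoss
    for i in positions:
        labels[i] = sequence[i]      # the model must recover the original token here
        inputs[i] = mask_id
    return inputs, labels


masker = AdaptiveMasker(vocab_size=10, mask_rate=0.2)
masker.update([3, 7], [4.0, 0.1])  # token 3 was hard to predict, token 7 was easy
seq = [1, 3, 7, 2, 5, 3, 7, 8, 9, 4]
positions = masker.sample_mask(seq, rng=random.Random(0))
inputs, labels = apply_mask(seq, positions, mask_id=0)
```

Keeping the total number of masked positions fixed means only the choice of positions changes relative to standard MLM; the `floor` keeps easy tokens from disappearing from the training signal entirely. Both are design choices of this sketch, not details reported in the summary.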
