Mask and You Shall Receive: Optimizing Masked Language Modeling For Pretraining BabyLMs

Lukas Edman; Alexander Fraser

arXiv:2510.20475·cs.CL·October 24, 2025

TL;DR

This paper introduces an improved Masked Language Modeling technique that adjusts masking probabilities based on prediction difficulty, enhancing performance on language understanding tasks for BabyLMs.

Contribution

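The paper proposes an MLM approach that adapts each token's masking probability to how well the model can predict it, and it incorporates sub-token embeddings. The adaptive masking improves (Super)GLUE performance over standard MLM, and the sub-token embeddings improve morphological generalization.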

Findings

1. Substantial performance increase on (Super)GLUE tasks over standard MLM.
2. Improved morphological generalization with sub-token embeddings.
3. Outperforms the baseline in the strict-small BabyLM track.

Abstract

We describe our strategy for the 2025 edition of the BabyLM Challenge. Our main contribution is that of an improved form of Masked Language Modeling (MLM), which adapts the probabilities of the tokens masked according to the model's ability to predict them. The results show a substantial increase in performance on (Super)GLUE tasks over the standard MLM. We also incorporate sub-token embeddings, finding that this increases the model's morphological generalization capabilities. Our submission beats the baseline in the strict-small track.
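
The abstract does not say how the masking probabilities are computed, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a per-token difficulty score kept as an exponential moving average of `1 - p_correct`, where `p_correct` is the probability the model assigned to the true token at a masked position. The class name `AdaptiveMasker` and the parameters `mask_rate`, `momentum`, and `floor` are hypothetical and chosen for illustration.

```python
import random
from collections import defaultdict


class AdaptiveMasker:
    """Hypothetical sketch: mask hard-to-predict tokens more often.

    This is an illustration of difficulty-weighted masking in general,
    not the procedure used in the paper.
    """

    def __init__(self, mask_rate=0.15, momentum=0.9, floor=0.05, seed=0):
        self.mask_rate = mask_rate   # target average fraction of masked tokens
        self.momentum = momentum     # smoothing factor for the difficulty average
        self.floor = floor           # keeps every token maskable (assumption)
        # Every token type starts with the same difficulty estimate.
        self.difficulty = defaultdict(lambda: 1.0)
        self.rng = random.Random(seed)

    def probabilities(self, tokens):
        """Return one masking probability per position in `tokens`."""
        if not tokens:
            return []
        weights = [self.difficulty[t] for t in tokens]
        # Scale weights so the probabilities sum to mask_rate * len(tokens);
        # capping at 1.0 can lower that sum slightly.
        scale = self.mask_rate * len(tokens) / sum(weights)
        return [min(1.0, w * scale) for w in weights]

    def sample_mask(self, tokens):
        """Draw an independent Bernoulli mask decision for each position."""
        return [self.rng.random() < p for p in self.probabilities(tokens)]

    def update(self, tokens, mask, p_correct):
        """Update difficulty for masked tokens from the model's predictions."""
        for tok, masked, p in zip(tokens, mask, p_correct):
            if masked:
                new = self.momentum * self.difficulty[tok] + (1 - self.momentum) * (1.0 - p)
                self.difficulty[tok] = max(self.floor, new)
```

In a training loop under these assumptions, `sample_mask` would pick positions for each sequence, the model would predict the masked tokens, and `update` would be called with the probabilities it assigned to the correct tokens. Tokens the model already predicts well then drift toward a lower masking probability, while the overall masking budget stays near `mask_rate`.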

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Generative Adversarial Networks and Image Synthesis