ExLM: Rethinking the Impact of [MASK] Tokens in Masked Language Models

Kangjie Zheng; Junwei Yang; Siyue Liang; Bin Feng; Zequn Liu; Wei Ju; Zhiping Xiao; Ming Zhang

arXiv:2501.13397 · cs.CL · June 10, 2025

Open Access

TL;DR

This paper investigates how [MASK] tokens affect Masked Language Models (MLMs), identifies the corrupted semantics problem, in which a masked context can convey multiple, ambiguous meanings, and introduces ExLM, an enhanced-context MLM that expands [MASK] tokens in the input to capture richer semantic information.

Contribution

The paper proposes ExLM, a novel MLM that expands [MASK] tokens in the input context and models the dependencies between the expanded states, increasing context capacity and mitigating the corrupted semantics problem.

Findings

1. ExLM outperforms baseline models in text and SMILES modeling tasks.
2. Expanded context modeling enriches semantic representations.
3. ExLM reduces semantic multimodality in MLMs.

Abstract

Masked Language Models (MLMs) have achieved remarkable success in many self-supervised representation learning tasks. MLMs are trained by randomly masking portions of the input sequences with [MASK] tokens and learning to reconstruct the original content based on the remaining context. This paper explores the impact of [MASK] tokens on MLMs. Analytical studies show that masking tokens can introduce the corrupted semantics problem, wherein the corrupted context may convey multiple, ambiguous meanings. This problem is also a key factor affecting the performance of MLMs on downstream tasks. Based on these findings, we propose a novel enhanced-context MLM, ExLM. Our approach expands [MASK] tokens in the input context and models the dependencies between these expanded states. This enhancement increases context capacity and enables the model to capture richer semantic information, effectively…
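The masking and expansion steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `mask_tokens` and `expand_masks`, the default masking ratio, and the expansion factor `k` are assumptions made for illustration, and the sketch does not cover how ExLM models the dependencies between expanded states.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, mask_ratio=0.15, rng=None):
    """Randomly replace a fraction of tokens with [MASK] (standard MLM corruption).

    Returns the corrupted sequence and a dict mapping each masked position
    to the original token the model would be trained to reconstruct.
    """
    rng = rng or random.Random(0)
    # Mask at least one token, but never more than the sequence holds.
    n_mask = min(len(tokens), max(1, round(len(tokens) * mask_ratio)))
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    corrupted = list(tokens)
    targets = {}
    for pos in positions:
        targets[pos] = corrupted[pos]
        corrupted[pos] = MASK
    return corrupted, targets

def expand_masks(corrupted, k=2):
    """Hypothetical expansion step: repeat every [MASK] token k times.

    Only illustrates how expansion lengthens the input context; the value
    of k and the handling of the expanded states are assumptions.
    """
    expanded = []
    for tok in corrupted:
        if tok == MASK:
            expanded.extend([MASK] * k)
        else:
            expanded.append(tok)
    return expanded

if __name__ == "__main__":
    tokens = "the cat sat on the mat".split()
    corrupted, targets = mask_tokens(tokens, mask_ratio=0.3, rng=random.Random(1))
    print(corrupted)
    print(targets)
    print(expand_masks(corrupted, k=3))
```

Each masked position contributes `k` input slots instead of one, so a sequence with `m` masked tokens grows by `m * (k - 1)` positions. That extra room is what the abstract refers to as increased context capacity.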

Taxonomy

Topics: Natural Language Processing Techniques