Token Masking Improves Transformer-Based Text Classification

Xianglong Xu; John Bowen; Rojin Taheri

arXiv:2505.11746 · cs.CL · May 20, 2025

Open Access

TL;DR

This paper introduces token masking regularization for transformer models, which randomly masks input tokens during training to improve text classification performance by reducing overfitting and smoothing gradients.

Contribution

It proposes a simple, theoretically motivated token masking method that enhances transformer-based text classifiers across multiple models and tasks.

Findings

01. Consistent performance improvements across models and tasks.

02. Task-specific optimal masking rates, with p = 0.1 as a strong general default.

03. Gains attributed to reduced overfitting and implicit ensembling.

Abstract

While transformer-based models achieve strong performance on text classification, we explore whether masking input tokens can further enhance their effectiveness. We propose token masking regularization, a simple yet theoretically motivated method that randomly replaces each input token with a special [MASK] token with probability p. This introduces stochastic perturbations during training, leading to implicit gradient averaging that encourages the model to capture deeper inter-token dependencies. Experiments on language identification and sentiment analysis, across diverse models (mBERT, Qwen2.5-0.5B, TinyLlama-1.1B), show consistent improvements over standard regularization techniques. We identify task-specific optimal masking rates, with p = 0.1 as a strong general default. We attribute the gains to two key effects: (1) input perturbation reduces overfitting, and (2) gradient-level…
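The masking step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `mask_tokens`, the `special_ids` exclusion, and the token ids in the usage note are assumptions made for illustration. The abstract states only that input tokens are randomly replaced with [MASK] with probability p during training.

```python
import random


def mask_tokens(token_ids, mask_id, p=0.1, special_ids=frozenset(), rng=None):
    """Return a copy of token_ids with each token replaced by mask_id
    independently with probability p.

    Assumption (not stated in the source): ids listed in special_ids,
    such as classification or separator tokens, are never masked.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    rng = rng or random  # fall back to the module-level generator
    return [
        mask_id if tid not in special_ids and rng.random() < p else tid
        for tid in token_ids
    ]
```

In a training loop, a function like this would be applied to each batch before the forward pass and skipped at evaluation time, so the model sees a different perturbed copy of the same example in each epoch. For instance, `mask_tokens([101, 7, 8, 9, 102], mask_id=103, p=0.1, special_ids={101, 102})` uses hypothetical ids; the real mask id should be read from the tokenizer in use.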

Taxonomy

Topics: Sentiment Analysis and Opinion Mining · Authorship Attribution and Profiling · Text and Document Classification Technologies