Entropy-Guided Token Dropout: Training Autoregressive Language Models with Limited Domain Data

Jiapeng Wang; Yiwen Hu; Yanzipeng Gao; Haoyu Wang; Shuo Wang; Hongyu Lu; Jiaxin Mao; Wayne Xin Zhao; Junyi Li; Xiao Zhang

arXiv:2512.23422 · cs.CL · December 30, 2025



TL;DR

This paper introduces EntroDrop, an entropy-guided token dropout method that improves multi-epoch training of autoregressive language models on limited domain data by mitigating overfitting and rebalancing learning between predictable, low-entropy tokens and high-entropy tokens.

Contribution

The paper proposes EntroDrop, a structured regularization technique that selectively masks low-entropy tokens and uses a curriculum schedule to enhance multi-epoch training of large language models.

Findings

1. EntroDrop outperforms standard regularization methods across various model sizes.

2. It maintains robust performance during extended multi-epoch training.

3. The approach effectively mitigates overfitting on limited domain data.

Abstract

As access to high-quality, domain-specific data grows increasingly scarce, multi-epoch training has become a practical strategy for adapting large language models (LLMs). However, autoregressive models often suffer from performance degradation under repeated data exposure, where overfitting leads to a marked decline in model capability. Through empirical analysis, we trace this degradation to an imbalance in learning dynamics: predictable, low-entropy tokens are learned quickly and come to dominate optimization, while the model's ability to generalize on high-entropy tokens deteriorates with continued training. To address this, we introduce EntroDrop, an entropy-guided token dropout method that functions as structured data regularization. EntroDrop selectively masks low-entropy tokens during training and employs a curriculum schedule to adjust regularization strength in alignment with…
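
The abstract is truncated here, so the exact masking rule and schedule are not available on this page. As a rough illustration of the idea only, the sketch below computes per-token Shannon entropy from a predictive distribution, treats tokens under an entropy threshold as eligible for masking, and ramps the masking probability linearly over training. The function names, the fixed threshold, the linear ramp, and the default values (`threshold=0.5`, `max_rate=0.3`) are all assumptions made for this example and are not taken from the paper.

```python
import math
import random


def token_entropy(probs):
    """Shannon entropy (in nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def drop_rate(step, total_steps, max_rate=0.3):
    """Hypothetical curriculum: linear ramp from 0 to max_rate.

    The paper's actual schedule is not given in the truncated abstract.
    """
    frac = min(max(step / total_steps, 0.0), 1.0)
    return max_rate * frac


def entropy_guided_mask(entropies, step, total_steps,
                        threshold=0.5, max_rate=0.3, rng=None):
    """Return keep flags per token: True = kept, False = masked.

    Assumed rule: only tokens with entropy below `threshold` are
    eligible, and each eligible token is masked with probability
    equal to the current curriculum drop rate.
    """
    if rng is None:
        rng = random.Random(0)  # fixed seed for reproducibility
    rate = drop_rate(step, total_steps, max_rate)
    keep = []
    for h in entropies:
        eligible = h < threshold
        keep.append(not (eligible and rng.random() < rate))
    return keep


if __name__ == "__main__":
    # Toy distributions: one near-deterministic, one uniform over 4 tokens.
    dists = [[0.97, 0.01, 0.01, 0.01], [0.25, 0.25, 0.25, 0.25]]
    ents = [token_entropy(d) for d in dists]
    print(ents)
    print(entropy_guided_mask(ents, step=900, total_steps=1000))
```

In this sketch high-entropy tokens are never masked, which mirrors the abstract's claim that low-entropy tokens are the ones dominating optimization. How the masked positions are then used (for example, excluded from the loss or replaced in the input) is left open, since the page does not say.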


Taxonomy

Topics: Topic Modeling · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications