Entropy-Guided Token Dropout: Training Autoregressive Language Models with Limited Domain Data
Jiapeng Wang, Yiwen Hu, Yanzipeng Gao, Haoyu Wang, Shuo Wang, Hongyu Lu, Jiaxin Mao, Wayne Xin Zhao, Junyi Li, Xiao Zhang

TL;DR
This paper introduces EntroDrop, an entropy-guided token dropout method for training autoregressive language models on limited domain data. It targets overfitting under repeated data exposure by rebalancing how much low-entropy and high-entropy tokens contribute to learning.
Contribution
The paper proposes EntroDrop, a structured regularization technique that selectively masks low-entropy tokens and uses a curriculum schedule to enhance multi-epoch training of large language models.
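The idea of masking low-entropy tokens under a curriculum can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the entropy threshold, the linear ramp, and the `max_rate` value are all assumptions made for illustration, and this summary does not say which distribution supplies the entropies or exactly how the mask is applied during training.

```python
import math
import random


def token_entropy(probs):
    """Shannon entropy (in nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def drop_rate(step, total_steps, max_rate=0.3):
    """Hypothetical curriculum: ramp the drop rate linearly from 0 to max_rate.

    The schedule shape and max_rate are assumptions, not values from the paper.
    """
    progress = min(max(step / total_steps, 0.0), 1.0)
    return max_rate * progress


def entropy_guided_mask(entropies, threshold, rate, rng):
    """Return True at positions to drop.

    Only tokens whose entropy is below `threshold` are candidates, and each
    candidate is dropped independently with probability `rate`.
    """
    return [h < threshold and rng.random() < rate for h in entropies]


# Toy usage: three token positions with made-up predictive distributions.
rng = random.Random(0)
dists = [
    [0.97, 0.01, 0.01, 0.01],  # predictable token, low entropy
    [0.25, 0.25, 0.25, 0.25],  # uncertain token, high entropy
    [0.90, 0.10, 0.00, 0.00],  # predictable token, low entropy
]
entropies = [token_entropy(d) for d in dists]
mask = entropy_guided_mask(
    entropies, threshold=0.5, rate=drop_rate(800, 1000), rng=rng
)
```

In this sketch the high-entropy position is never dropped, while the two low-entropy positions become more likely to be dropped as training progresses. How the resulting mask is consumed (for example, by hiding the token from the input or excluding it from the loss) is left open here.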
Findings
EntroDrop outperforms standard regularization methods across various model sizes.
It maintains robust performance during extended multi-epoch training.
The approach effectively mitigates overfitting on limited domain data.
Abstract
As access to high-quality, domain-specific data grows increasingly scarce, multi-epoch training has become a practical strategy for adapting large language models (LLMs). However, autoregressive models often suffer from performance degradation under repeated data exposure, where overfitting leads to a marked decline in model capability. Through empirical analysis, we trace this degradation to an imbalance in learning dynamics: predictable, low-entropy tokens are learned quickly and come to dominate optimization, while the model's ability to generalize on high-entropy tokens deteriorates with continued training. To address this, we introduce EntroDrop, an entropy-guided token dropout method that functions as structured data regularization. EntroDrop selectively masks low-entropy tokens during training and employs a curriculum schedule to adjust regularization strength in alignment with…
Taxonomy
Topics: Topic Modeling · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications
