MiLe Loss: a New Entropy-Weighed Loss for Mitigating the Bias of Learning Difficulties in Large Language Models

Zhenpeng Su; Xing Wu; Xue Bai; Zijia Lin; Hui Chen; Guiguang Ding; Wei Zhou; Songlin Hu

arXiv:2310.19531 · cs.CL · January 16, 2026 · 1 citation

TL;DR

This paper introduces MiLe Loss, a novel entropy-based loss function that dynamically emphasizes difficult-to-learn tokens during training, improving large language models' performance on downstream tasks.

Contribution

The paper proposes a new entropy-weighted loss function that adaptively focuses on infrequent and challenging tokens, addressing bias in training large language models.

Findings

01. Models with MiLe Loss outperform baselines on downstream benchmarks.
02. MiLe Loss improves learning of infrequent tokens.
03. Performance gains are consistent across different model scales.

Abstract

Generative language models are usually pretrained on large text corpora by predicting the next token (i.e., sub-word/word/phrase) given the previous ones. Recent work has demonstrated the impressive performance of large generative language models on downstream tasks. However, existing generative language models generally neglect an inherent challenge of text corpora during training: the imbalance between frequent and infrequent tokens. This imbalance can lead a language model to be dominated by common, easy-to-learn tokens while overlooking infrequent, difficult-to-learn ones. To alleviate this, we propose MiLe Loss, a loss function that mitigates the bias of learning difficulties across tokens. During training, it dynamically assesses the learning difficulty of a to-be-learned token according to the information entropy of the corresponding predicted probability distribution…
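The entropy-based weighting described in the abstract can be illustrated with a small sketch. The abstract is truncated before it gives the formula, so the weight form `(1 + H) ** gamma` multiplied onto the per-token cross-entropy, the parameter name `gamma`, and all function names below are assumptions made for illustration, not the authors' confirmed implementation.

```python
import math

def log_softmax(logits):
    """Numerically stable log-probabilities for one token position."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def token_entropy(log_probs):
    """Information entropy (in nats) of the predicted distribution."""
    return -sum(math.exp(lp) * lp for lp in log_probs)

def entropy_weight(logits, gamma=1.0):
    """Assumed difficulty weight: grows with the entropy of the prediction."""
    return (1.0 + token_entropy(log_softmax(logits))) ** gamma

def entropy_weighted_loss(logits, target, gamma=1.0):
    """Cross-entropy for one token, scaled by the assumed entropy weight.

    gamma is a hypothetical hyperparameter; gamma = 0 recovers plain
    cross-entropy.
    """
    log_probs = log_softmax(logits)
    cross_entropy = -log_probs[target]
    return entropy_weight(logits, gamma) * cross_entropy

# A peaked prediction (easy token) versus a flat one (hard token).
confident = [8.0, 0.0, 0.0, 0.0]
uncertain = [0.5, 0.0, 0.0, 0.0]
print(entropy_weight(confident), entropy_weight(uncertain))
```

In this sketch the flat prediction receives the larger weight, so tokens the model is unsure about contribute more to the loss than tokens it already predicts confidently.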

Code & Models

Repositories

suu990901/LLaMA-InfoEntropy-Loss · jax · Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification

Methods: Focus