DEL: Digit Entropy Loss for Numerical Learning of Large Language Models

Zhaohui Zheng; Chenhang He; Shihao Wang; Yuxuan Li; Ming-Ming Cheng; Lei Zhang

arXiv:2605.20369·cs.CL·May 21, 2026

TL;DR

This paper introduces Digit Entropy Loss (DEL), a training objective for large language models that reformulates entropy optimization for number prediction, including floating-point numbers, and reports improved numerical prediction accuracy across seven benchmarks.

Contribution

DEL reformulates entropy optimization for numerical learning, enabling more accurate floating-point number prediction and outperforming existing methods on multiple benchmarks.

Findings

01. DEL outperforms existing numerical learning methods in accuracy.

02. DEL effectively handles floating-point number prediction.

03. Experiments show consistent improvements across seven benchmarks.

Abstract

Number prediction is a fundamental capability of large language models (LLMs) in mathematical problem-solving and code generation. The widely adopted maximum likelihood estimation (MLE) objective for LLM training is not tailored to number prediction. Recent penalty-driven approaches, e.g., Number Token Loss and Discretized Distance Loss, introduce an inductive bias of numerical distance but induce over-sharpened and over-flattened digit distributions, respectively. In this paper, we conduct an in-depth analysis of LLM numerical learning and show that existing numerical learning methods conceptually follow a criterion-distance formulation, where the criterion term represents the optimization pattern and the distance term instills a geometric prior. Consequently, we present Digit Entropy Loss (DEL) for auto-regressive numerical learning, which reformulates the conventional unsupervised entropy…
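To make the abstract's contrast concrete, the sketch below compares a standard MLE (cross-entropy) loss with a simple distance-aware penalty on a single digit. It is only an illustration of what "an inductive bias of numerical distance" can mean. It is not the paper's DEL, nor the exact definition of Number Token Loss or Discretized Distance Loss. The function names and the expected-absolute-distance penalty are assumptions chosen for clarity.

```python
import math


def cross_entropy(probs, target):
    """MLE loss for one digit: negative log-probability of the target."""
    return -math.log(probs[target])


def expected_distance(probs, target):
    """Hypothetical distance-aware penalty: expected |digit - target|."""
    return sum(p * abs(d - target) for d, p in enumerate(probs))


def digit_distribution(mass):
    """Build a 10-way digit distribution from a {digit: probability} dict."""
    probs = [0.0] * 10
    for digit, p in mass.items():
        probs[digit] = p
    return probs


target = 9
# Half of the probability mass is misplaced in both cases.
near = digit_distribution({9: 0.5, 8: 0.5})  # misplaced on neighbouring digit 8
far = digit_distribution({9: 0.5, 1: 0.5})   # misplaced on distant digit 1

ce_near = cross_entropy(near, target)
ce_far = cross_entropy(far, target)
dist_near = expected_distance(near, target)
dist_far = expected_distance(far, target)

print(f"cross-entropy:     near={ce_near:.3f}  far={ce_far:.3f}")
print(f"expected distance: near={dist_near:.1f}  far={dist_far:.1f}")
```

Cross-entropy depends only on the probability assigned to the target digit, so it scores both predictions identically, while the distance-aware term penalizes mass placed on 1 more heavily than mass placed on 8. How such a distance term should be combined with a criterion term is the question the paper's criterion-distance analysis addresses.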

Code & Models

Repositories

PolyU-VCLab/DEL (GitHub)
