TL;DR
This paper introduces Digit Entropy Loss (DEL), a training objective for large language models that reformulates entropy optimization for auto-regressive number prediction, including floating-point numbers, and improves numerical prediction accuracy across seven benchmarks.
Contribution
DEL reformulates entropy optimization for numerical learning, enabling more accurate floating-point number prediction and outperforming existing methods on multiple benchmarks.
Findings
DEL outperforms existing numerical learning methods in accuracy.
DEL effectively handles floating-point number prediction.
Experiments show consistent improvements across seven benchmarks.
Abstract
Number prediction is a fundamental capability of large language models (LLMs) in mathematical problem-solving and code generation. The widely adopted maximum likelihood estimation (MLE) objective for LLM training is not tailored to number prediction. Recent penalty-driven approaches, e.g., Number Token Loss and Discretized Distance Loss, introduce an inductive bias of numerical distance but induce over-sharpened and over-flattened digit distributions, respectively. In this paper, we present an in-depth analysis of LLM numerical learning and show that existing numerical learning methods conceptually follow a criterion-distance formulation, where the criterion term represents the optimization pattern and the distance term instills a geometric prior. Consequently, we present Digit Entropy Loss (DEL) for auto-regressive numerical learning, which reformulates the conventional unsupervised entropy…
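To make the abstract's point about MLE concrete, here is a minimal sketch of why per-token cross-entropy carries no notion of numerical distance. It assumes a single digit position with a 10-way distribution and compares cross-entropy against a generic expected-absolute-distance penalty. The helper names (`digit_dist`, `expected_abs_distance`) and the probabilities are hypothetical, chosen only for illustration. The penalty is not the exact form of Number Token Loss, Discretized Distance Loss, or DEL.

```python
import math

def cross_entropy(probs, target):
    """Negative log-likelihood of the target digit (the per-token MLE loss)."""
    return -math.log(probs[target])

def expected_abs_distance(probs, target):
    """Expected |digit - target| under the predicted digit distribution.

    Illustrative distance penalty only; not taken from the paper.
    """
    return sum(p * abs(d - target) for d, p in enumerate(probs))

def digit_dist(target, wrong, p_target=0.6):
    """Hypothetical prediction: p_target on the true digit, the rest on one wrong digit."""
    probs = [0.0] * 10
    probs[target] = p_target
    probs[wrong] = 1.0 - p_target
    return probs

target = 3
near = digit_dist(target, wrong=4)  # leftover mass on a neighbouring digit
far = digit_dist(target, wrong=9)   # leftover mass on a distant digit

# Cross-entropy depends only on the probability of the true digit,
# so both predictions receive the same loss.
print(cross_entropy(near, target), cross_entropy(far, target))

# The distance penalty separates them: mass on 9 costs more than mass on 4.
print(expected_abs_distance(near, target), expected_abs_distance(far, target))
```

In this toy setting the two predictions are indistinguishable under cross-entropy, while the distance term penalizes the far miss more heavily. That difference is the kind of geometric prior the abstract attributes to the distance term.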
