Number Cookbook: Number Understanding of Language Models and How to Improve It
Haotong Yang, Yi Hu, Shijia Kang, Zhouchen Lin, Muhan Zhang

TL;DR
This paper introduces a comprehensive benchmark to evaluate numerical understanding in large language models, revealing their frequent failures and exploring methods to improve their basic numerical processing abilities.
Contribution
The paper provides the first extensive benchmark covering a wide range of numerical tasks, evaluates existing techniques, and analyzes how effective finetuning and chain-of-thought methods are for the numerical understanding and processing ability (NUPA) of LLMs.
Findings
Current LLMs often fail in basic numerical tasks.
Naive finetuning improves NUPA on some tasks but not all.
Techniques designed to enhance NUPA are ineffective when finetuning pretrained models.
Abstract
Large language models (LLMs) can solve an increasing number of complex reasoning tasks while making surprising mistakes in basic numerical understanding and processing (such as 9.11 > 9.9). The latter ability is essential for tackling complex arithmetic and mathematical problems and serves as a foundation for most reasoning tasks, but previous work paid little attention to it or only discussed several restricted tasks (like integer addition). In this paper, we comprehensively investigate the numerical understanding and processing ability (NUPA) of LLMs. Firstly, we introduce a benchmark covering four common numerical representations and 17 distinct numerical tasks in four major categories, resulting in 41 meaningful combinations in total. These tasks are derived from primary and secondary education curricula, encompassing nearly all everyday numerical understanding and processing…
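The 9.11 versus 9.9 example from the abstract can be made concrete with a small sketch. The code below is an illustration only and is not taken from the paper's benchmark: `compare_as_decimals` gives the numerically correct answer, while `compare_like_versions` is a hypothetical faulty heuristic (treating the digits after the point as a whole integer, as in software version numbers) that reproduces the mistaken answer. Both function names are assumptions introduced here.

```python
from decimal import Decimal


def _sign(x, y) -> str:
    """Return '>', '<' or '=' for any two comparable values."""
    if x > y:
        return ">"
    if x < y:
        return "<"
    return "="


def compare_as_decimals(a: str, b: str) -> str:
    """Compare two decimal strings by their exact numeric value."""
    return _sign(Decimal(a), Decimal(b))


def compare_like_versions(a: str, b: str) -> str:
    """Hypothetical faulty heuristic: compare dot-separated parts as integers.

    This is how version strings are ordered, not how decimals are ordered.
    """
    parts_a = [int(p) for p in a.split(".")]
    parts_b = [int(p) for p in b.split(".")]
    return _sign(parts_a, parts_b)


if __name__ == "__main__":
    print("numeric:", "9.11", compare_as_decimals("9.11", "9.9"), "9.9")
    print("version-like:", "9.11", compare_like_versions("9.11", "9.9"), "9.9")
```

Numerically, 9.11 is less than 9.9 because 0.11 < 0.90; the version-like reading compares 11 with 9 and so gets the order backwards.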
Taxonomy
Topics: Cognitive and developmental aspects of mathematical skills · Reading and Literacy Development · Education and Technology Integration
Methods: Softmax · Attention Is All You Need
