SenseMath: Do LLMs Have Number Sense? Evaluating Shortcut Use, Judgment, and Generation

Haomin Zhuang; Xiangqi Wang; Yili Shen; Ying Cheng; Xiangliang Zhang

arXiv:2604.01988 · cs.AI · April 3, 2026


TL;DR

SenseMath is a benchmark for evaluating whether large language models recognize numerical structure and apply shortcuts only where appropriate. The results indicate that models often overgeneralize shortcuts and fall short of human-like number sense.

Contribution

This work provides a controlled benchmark with diverse tasks to assess LLMs' numerical reasoning, highlighting their limitations in structural understanding and context-aware shortcut use.

Findings

01  Models adopt shortcuts when prompted, improving accuracy by up to 15%.

02  Under standard prompting, models use shortcuts in fewer than 40% of cases.

03  Models overgeneralize shortcuts and cannot generate valid shortcut problems from scratch.

Abstract

Large language models often default to step-by-step computation even when efficient numerical shortcuts are available. This raises a basic question: do they exhibit number sense in a human-like behavioral sense, i.e., the ability to recognize numerical structure, apply shortcuts when appropriate, and avoid them when they are not? We introduce SenseMath, a controlled benchmark for evaluating structure-sensitive numerical reasoning in LLMs. SenseMath contains 4,800 items spanning eight shortcut categories and four digit scales, with matched strong-shortcut, weak-shortcut, and control variants. It supports three evaluation settings of increasing cognitive demand: Shortcut Use (whether models can apply shortcuts on shortcut-amenable problems); Applicability Judgment (whether they can recognize when a shortcut is appropriate or misleading); and Problem Generation (whether they can generate…
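To make the idea of a numerical shortcut concrete, here is a minimal Python sketch of one familiar kind: multiplying by a number close to a round value and then compensating. The abstract does not name the eight shortcut categories or say how strong, weak, and control variants are built, so the function names, the choice of shortcut, and the thresholds in `label_variant` are all illustrative assumptions, not SenseMath's actual construction.

```python
def nearest_round(n: int) -> int:
    """Nearest multiple of the leading power of ten (e.g. 998 -> 1000, 647 -> 600)."""
    base = 10 ** (len(str(abs(n))) - 1)
    return (n + base // 2) // base * base


def compensation_product(a: int, b: int) -> int:
    """Compute a * b as round(a) * b minus a small correction term."""
    r = nearest_round(a)
    offset = r - a              # how far a sits from the round number
    return r * b - offset * b   # algebraically identical to a * b


def label_variant(a: int) -> str:
    """Hypothetical labelling: the closer a is to a round number, the stronger the shortcut.

    The 5% and 30% thresholds are assumptions made for this sketch only.
    """
    base = 10 ** (len(str(abs(a))) - 1)
    distance = abs(nearest_round(a) - a)
    if distance * 100 <= 5 * base:
        return "strong"
    if distance * 100 <= 30 * base:
        return "weak"
    return "control"


if __name__ == "__main__":
    for a in (998, 973, 647):
        print(a, label_variant(a), compensation_product(a, 47))
```

For 998 × 47 the shortcut reads as 1000 × 47 − 2 × 47, which is easy to do mentally. For 647 × 47 the same rewrite (600 × 47 + 47 × 47) saves little, which is the kind of case where a model with number sense would be expected to fall back on direct computation.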
