StreetMath: Study of LLMs' Approximation Behaviors
Chiung-Yi Tseng, Somshubhra Roy, Maisha Thasin, Danyang Zhang, and Blessing Effiong

TL;DR
This paper introduces StreetMath, a benchmark for evaluating large language models' ability to reason approximately in informal mathematical tasks. It finds that models tend to seek exact solutions even when an estimate is called for, and that exact and approximate arithmetic rely on largely separate neural components.
Contribution
The work presents a new benchmark for approximation in LLMs, extensive evaluations across multiple models, and mechanistic insights into their internal computation behaviors.
Findings
LLMs often attempt exact calculations or external tool use in approximation tasks.
Models sometimes reach the correct answer early in processing yet still consume more tokens when solving approximation tasks.
Exact and approximate arithmetic rely on largely separate neural components.
Abstract
There is a substantial body of literature examining the mathematical reasoning capabilities of large language models (LLMs), particularly their performance on precise arithmetic operations in autoregressive architectures. However, their ability to perform approximate reasoning in informal, fast-paced mathematical operations has received far less attention, especially among non-autoregressive decoder models. Our work addresses this gap by introducing StreetMath, a benchmark designed to evaluate models' approximation abilities under real-world approximation scenarios. We conduct extensive evaluations across different LLM architectures: Qwen3-4B-Instruct-2507, Qwen3-4B-Thinking-2507, Dream-v0-Instruct-7B, Falcon-Mamba-7B-Instruct, and Mamba-GPT-3B. Furthermore, we apply mechanistic interpretability techniques to probe their internal computational states. Our analysis reveals that LLMs…
