StreetMath: Study of LLMs' Approximation Behaviors

Chiung-Yi Tseng, Somshubhra Roy, Maisha Thasin, Danyang Zhang, and Blessing Effiong

arXiv:2510.25776 · cs.CL · October 31, 2025

TL;DR

This paper introduces StreetMath, a benchmark for evaluating how well large language models perform approximate reasoning in informal mathematical tasks. It finds that models tend to seek exact solutions even when an estimate would do, and that they rely on largely separate components for exact and approximate operations.

Contribution

The work presents a new benchmark for approximate reasoning in LLMs, an extensive evaluation across several model architectures, and mechanistic insights into how these models compute internally.

Findings

1. LLMs often attempt exact calculation or invoke external tools, even in tasks that call for approximation.

2. Models sometimes reach the correct answer early in processing, yet still consume more tokens on approximation tasks.

3. Exact and approximate arithmetic rely on largely separate neural components.
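To make the distinction between exact and approximate answers in these findings concrete, here is a minimal Python sketch. It does not come from the paper. The shopping-total scenario, the helper names (`exact_total`, `street_estimate`, `within_tolerance`), and the 10% tolerance are all hypothetical illustrations, not StreetMath's actual tasks or scoring rule.

```python
import math

def exact_total(prices):
    """Exact sum: the kind of precise computation models tend to default to."""
    return sum(prices)

def street_estimate(prices, step=1.0):
    """Hypothetical 'street math' estimate: round each price to the nearest
    `step` and add. Note that Python's round() uses round-half-to-even."""
    return sum(round(p / step) * step for p in prices)

def within_tolerance(answer, reference, rel_tol=0.10):
    """Hypothetical acceptance check: an answer counts as a good approximation
    if it lies within rel_tol (relative) of the exact value."""
    return math.isclose(answer, reference, rel_tol=rel_tol)

prices = [3.99, 12.49, 7.95]
exact = exact_total(prices)
estimate = street_estimate(prices)
print(round(exact, 2), estimate, within_tolerance(estimate, exact))
```

Under these assumptions, the rounded estimate (24.0) is accepted even though it differs from the exact total. A model that insists on computing 24.43 answers correctly but does more work than the task requires.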

Abstract

There is a substantial body of literature examining the mathematical reasoning capabilities of large language models (LLMs), particularly their performance on precise arithmetic operations in autoregressive architectures. However, their ability to perform approximate reasoning in informal, fast-paced mathematical operations has received far less attention, especially among non-autoregressive decoder models. Our work addresses this gap by introducing StreetMath, a benchmark designed to evaluate models' approximation abilities in real-world scenarios. We conduct extensive evaluations across different LLM architectures: Qwen3-4B-Instruct-2507, Qwen3-4B-Thinking-2507, Dream-v0-Instruct-7B, Falcon-Mamba-7B-Instruct, and Mamba-GPT-3B. Furthermore, we apply mechanistic interpretability techniques to probe their internal computational states. Our analysis reveals that LLMs…