Thinking Machines: Mathematical Reasoning in the Age of LLMs
Andrea Asperti, Alberto Naibo, Claudio Sacerdoti Coen

TL;DR
This paper reviews the capabilities and limitations of large language models in mathematical reasoning, highlighting challenges in formalized mathematics and exploring the nature of reasoning and state representation in these models.
Contribution
It provides a comprehensive analysis of recent models and benchmarks, discussing the differences between code and proof generation, and questions the internal reasoning processes of LLMs.
Findings
Proof synthesis is more brittle than code generation.
LLMs may emulate rather than genuinely represent logical states.
Trade-offs exist between traditional and formalized mathematics in training.
Abstract
Large Language Models (LLMs) have demonstrated impressive capabilities in structured reasoning and symbolic tasks, with coding emerging as a particularly successful application. This progress has naturally motivated efforts to extend these models to mathematics, both in its traditional form, expressed through natural-style mathematical language, and in its formalized counterpart, expressed in a symbolic syntax suitable for automatic verification. Yet, despite apparent parallels between programming and proof construction, advances in formalized mathematics have proven significantly more challenging. This gap raises fundamental questions about the nature of reasoning in current LLM architectures, the role of supervision and feedback, and the extent to which such models maintain an internal notion of computational or deductive state. In this article, we review the current state-of-the-art…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
