AI-rithmetic
Alex Bie, Travis Dick, Alex Kulesza, Prabhakar Raghavan, Vinod Raman, Sergei Vassilvitskii

TL;DR
Despite advances in AI on complex mathematical tasks, current frontier models still struggle with basic integer addition, and accuracy degrades as the numbers get longer. Most of the errors are interpretable: they trace to operand misalignment or to a failure to carry correctly.
Contribution
This paper systematically investigates the failure modes of AI models in simple addition, revealing key error types and their relation to tokenization and randomness.
Findings
Accuracy on integer addition degrades significantly for all frontier models as the number of digits increases.
Most errors are due to operand misalignment or carrying failures.
Error types are often linked to tokenization and appear as independent failures.
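To make the two error classes concrete, here is a minimal Python sketch of schoolbook addition on digit strings, together with deliberately broken variants that imitate a carrying failure and an operand misalignment. The function names (`add_digits`, `add_misaligned`, `classify`) and the exact failure definitions (every carry dropped, one operand shifted by a single digit position) are illustrative assumptions, not the paper's own taxonomy or code.

```python
def add_digits(a: str, b: str, drop_carries: bool = False) -> str:
    """Schoolbook addition of two non-negative decimal strings.

    Operands are right-aligned by zero-padding on the left. If drop_carries
    is True, every carry is discarded (an assumed, simplified carrying failure).
    """
    width = max(len(a), len(b))
    a, b = a.zfill(width), b.zfill(width)
    carry = 0
    out = []
    # Walk from the least significant digit to the most significant one.
    for da, db in zip(reversed(a), reversed(b)):
        total = int(da) + int(db) + carry
        out.append(str(total % 10))
        carry = 0 if drop_carries else total // 10
    if carry:
        out.append("1")
    return "".join(reversed(out)).lstrip("0") or "0"


def add_misaligned(a: str, b: str, shift: int = 1) -> str:
    """Assumed misalignment: b is shifted left by `shift` digit positions."""
    return add_digits(a, b + "0" * shift)


def classify(a: str, b: str, answer: str) -> str:
    """Label an answer as correct, carry, misalignment, or other (toy rules)."""
    if answer == add_digits(a, b):
        return "correct"
    if answer == add_digits(a, b, drop_carries=True):
        return "carry"
    if answer in (add_misaligned(a, b), add_misaligned(b, a)):
        return "misalignment"
    return "other"


if __name__ == "__main__":
    for candidate in ["734", "624", "3038", "999"]:
        print(candidate, classify("478", "256", candidate))
```

For 478 + 256, the correct sum is 734; dropping both carries gives 624, and shifting the second operand one place to the left gives 3038. A real classifier would need to handle partial carry failures and other shift amounts, which this sketch does not attempt.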
Abstract
Modern AI systems have been successfully deployed to win medals at international math competitions, assist with research workflows, and prove novel technical lemmas. However, despite their progress at advanced levels of mathematics, they remain stubbornly bad at basic arithmetic, consistently failing on the simple task of adding two numbers. We present a systematic investigation of this phenomenon. We demonstrate empirically that all frontier models suffer significantly degraded accuracy for integer addition as the number of digits increases. Furthermore, we show that most errors made by these models are highly interpretable and can be attributed to either operand misalignment or a failure to correctly carry; these two error classes explain 87.9%, 62.9%, and 92.4% of Claude Opus 4.1, GPT-5, and Gemini 2.5 Pro errors, respectively. Finally, we show that misalignment errors are frequently…
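One way to see why accuracy would fall with operand length is a toy model, offered here as an assumption rather than as the paper's analysis: suppose each of the $n$ digit positions is handled incorrectly with some fixed probability $p$, independently of the others. Both $p$ and the independence assumption are hypothetical.

```latex
\[
  \Pr[\text{sum is fully correct}] = (1 - p)^{n}
\]
% Illustrative values with p = 0.01 (a made-up rate, not a measured one):
\[
  (0.99)^{10} \approx 0.904, \qquad (0.99)^{50} \approx 0.605
\]
```

Under this assumption even a small per-digit failure rate compounds: a 1% rate leaves roughly 90% accuracy at 10 digits but only about 60% at 50 digits.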
Taxonomy
Topics: Benford’s Law and Fraud Detection · Computability, Logic, AI Algorithms · Numerical Methods and Algorithms
