Even GPT-5.2 Can't Count to Five: The Case for Zero-Error Horizons in Trustworthy LLMs

Ryoma Sato

arXiv:2601.15714 · cs.LG · January 23, 2026

TL;DR

This paper introduces Zero-Error Horizon (ZEH), a metric for the maximum range of problems an LLM can solve without any errors. Evaluating ZEH reveals surprising limitations even in advanced models such as GPT-5.2 and offers insights into model capabilities and their implications for safety-critical use.

Contribution

The paper proposes ZEH as a novel evaluation metric for LLMs, demonstrates its application on GPT-5.2 and Qwen2.5, and discusses methods to reduce the computational cost of ZEH assessment.

Findings

01

GPT-5.2 fails on simple parity and balanced-parentheses problems, for example computing the parity of the short string 11000.

02

ZEH correlates with accuracy but reveals different detailed behaviors.

03

Tree structures and online softmax can speed up ZEH computation by up to ten times.
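For readers unfamiliar with the term, online softmax is a general technique that computes the softmax normalizer in a single pass by keeping a running maximum and a rescaled running sum. The sketch below illustrates only that generic technique; this page does not show how the paper applies it to ZEH computation, and the function name `online_softmax` is a hypothetical name chosen for illustration.

```python
import math
from typing import List, Sequence


def online_softmax(scores: Sequence[float]) -> List[float]:
    """Softmax with a single pass over the scores to get the normalizer.

    Generic illustration only; not taken from the paper.
    """
    running_max = float("-inf")
    running_sum = 0.0
    for s in scores:
        new_max = max(running_max, s)
        # Rescale the sum accumulated so far to the new maximum,
        # then add the contribution of the current score.
        running_sum = running_sum * math.exp(running_max - new_max) + math.exp(s - new_max)
        running_max = new_max
    # Second loop only normalizes; the max and sum are already known.
    return [math.exp(s - running_max) / running_sum for s in scores]
```

The rescaling step is what lets the maximum and the sum be tracked together, instead of needing one pass for the maximum and another for the sum.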

Abstract

We propose Zero-Error Horizon (ZEH) for trustworthy LLMs, which represents the maximum range that a model can solve without any errors. While ZEH itself is simple, we demonstrate that evaluating the ZEH of state-of-the-art LLMs yields abundant insights. For example, by evaluating the ZEH of GPT-5.2, we found that GPT-5.2 cannot even compute the parity of a short string like 11000, and GPT-5.2 cannot determine whether the parentheses in ((((()))))) are balanced. This is surprising given the excellent capabilities of GPT-5.2. The fact that LLMs make mistakes on such simple problems serves as an important lesson when applying LLMs to safety-critical domains. By applying ZEH to Qwen2.5 and conducting detailed analysis, we found that while ZEH correlates with accuracy, the detailed behaviors differ, and ZEH provides clues about the emergence of algorithmic capabilities. Finally, while…
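To make the definition concrete, here is a minimal sketch of how a zero-error horizon could be measured by exhaustive checking. It assumes that "range" means input length and that every input up to a given length is tested; the paper's exact protocol is not shown on this page. The names `zero_error_horizon` and `toy_solver` are hypothetical, and `toy_solver` is a deliberately flawed stand-in for a model, not an LLM call.

```python
from itertools import product
from typing import Callable


def parity(bits: str) -> int:
    """Ground truth: 1 if the number of '1' characters is odd, else 0."""
    return bits.count("1") % 2


def is_balanced(s: str) -> bool:
    """Ground truth: True if every '(' is closed by a later ')'."""
    depth = 0
    for ch in s:
        depth += 1 if ch == "(" else -1
        if depth < 0:  # a ')' appeared before its matching '('
            return False
    return depth == 0


def zero_error_horizon(solver: Callable, truth: Callable, alphabet: str, max_size: int) -> int:
    """Largest n <= max_size such that solver is correct on ALL inputs of length 1..n.

    Assumption: size is input length and checking is exhaustive.
    """
    horizon = 0
    for n in range(1, max_size + 1):
        for chars in product(alphabet, repeat=n):
            x = "".join(chars)
            if solver(x) != truth(x):
                return horizon  # a single error ends the horizon
        horizon = n
    return horizon


def toy_solver(bits: str) -> int:
    """Hypothetical flawed solver: ignores everything after the 4th character."""
    return bits[:4].count("1") % 2


print(zero_error_horizon(toy_solver, parity, "01", 6))
```

Unlike average accuracy, a single wrong answer at length n caps the horizon at n - 1, which is why the toy solver's horizon stops at the first length where it drops information.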

Taxonomy

Topics: Security and Verification in Computing · Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI)