Even GPT-5.2 Can't Count to Five: The Case for Zero-Error Horizons in Trustworthy LLMs
Ryoma Sato

TL;DR
This paper introduces Zero-Error Horizon (ZEH), a metric for trustworthy LLMs that measures the maximum range of problems a model can solve without a single error. Applying it reveals surprising limitations even in advanced models such as GPT-5.2 and yields insights into their capabilities and safety implications.
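One way to write the definition down, assuming a task whose instances are indexed by input length, with ground-truth answer f(x) and model output M(x) (this notation is ours, not taken from the paper):

```latex
\mathrm{ZEH}(M) \;=\; \max\bigl\{\, n \in \mathbb{N} \;:\; M(x) = f(x) \ \text{for every input } x \text{ with } |x| \le n \,\bigr\}
```

Under this reading, one wrong answer on a length-5 input such as 11000 already caps the model's parity ZEH at 4, no matter how accurate it is on other inputs.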
Contribution
The paper proposes ZEH as a new evaluation metric for LLMs, applies it to GPT-5.2 and Qwen2.5, and discusses methods for reducing the computational cost of evaluating ZEH.
Findings
GPT-5.2 fails on short parity and balanced-parentheses instances, such as the parity of 11000.
ZEH correlates with accuracy but reveals different detailed behaviors.
Tree structures and online softmax can speed up ZEH computation by up to ten times.
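The summary does not say where online softmax enters the ZEH computation, so the sketch below only illustrates the standard technique itself: a single pass keeps a running maximum m and a running sum d of exp(x - m), rescaling d whenever the maximum grows, so softmax normalization never overflows. The function names are hypothetical, and the merge step (which lets statistics from separately processed chunks be combined) is shown as a general property, not as the paper's procedure.

```python
import math
from typing import Iterable, List, Tuple

Stats = Tuple[float, float]  # (running max m, running sum d of exp(x - m))


def online_softmax_stats(scores: Iterable[float]) -> Stats:
    # One pass over finite scores; rescale d whenever the running max grows.
    m, d = -math.inf, 0.0
    for x in scores:
        m_new = max(m, x)
        d = d * math.exp(m - m_new) + math.exp(x - m_new)
        m = m_new
    return m, d


def merge_stats(a: Stats, b: Stats) -> Stats:
    # Combine statistics of two disjoint chunks of scores processed separately.
    (m1, d1), (m2, d2) = a, b
    m = max(m1, m2)
    return m, d1 * math.exp(m1 - m) + d2 * math.exp(m2 - m)


def softmax(scores: List[float]) -> List[float]:
    # Normalize with the one-pass statistics; exp(x - m) is at most 1, so no overflow.
    m, d = online_softmax_stats(scores)
    return [math.exp(x - m) / d for x in scores]
```

For example, `softmax([1000.0, 1001.0])` returns finite probabilities, whereas computing `math.exp(1000.0)` directly raises an `OverflowError`.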
Abstract
We propose Zero-Error Horizon (ZEH) for trustworthy LLMs, which represents the maximum range that a model can solve without any errors. While ZEH itself is simple, we demonstrate that evaluating the ZEH of state-of-the-art LLMs yields abundant insights. For example, by evaluating the ZEH of GPT-5.2, we found that GPT-5.2 cannot even compute the parity of a short string like 11000, and GPT-5.2 cannot determine whether the parentheses in ((((()))))) are balanced. This is surprising given the excellent capabilities of GPT-5.2. The fact that LLMs make mistakes on such simple problems serves as an important lesson when applying LLMs to safety-critical domains. By applying ZEH to Qwen2.5 and conducting detailed analysis, we found that while ZEH correlates with accuracy, the detailed behaviors differ, and ZEH provides clues about the emergence of algorithmic capabilities. Finally, while…
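A minimal sketch of how a ZEH evaluation on the two tasks above might look, assuming ZEH is the largest length n such that the model is correct on every input of length 1 through n. `model` is a hypothetical callable standing in for prompting an LLM and parsing its answer; `toy_model` is a made-up stand-in, not a real LLM. Exhaustive enumeration grows as |alphabet|^n, which is why the cost of ZEH evaluation matters.

```python
from itertools import product
from typing import Callable, Optional, Tuple


def parity(s: str) -> int:
    # Ground truth for the parity task: 1 if s contains an odd number of '1's, else 0.
    return s.count("1") % 2


def is_balanced(s: str) -> bool:
    # Ground truth for the parentheses task; assumes s contains only '(' and ')'.
    depth = 0
    for ch in s:
        depth += 1 if ch == "(" else -1
        if depth < 0:  # a ')' with no matching '('
            return False
    return depth == 0  # every '(' must be closed


def zero_error_horizon(
    model: Callable[[str], object],
    truth: Callable[[str], object],
    alphabet: str,
    max_n: int,
) -> Tuple[int, Optional[str]]:
    # Check every input of length 1, 2, ..., max_n exhaustively.
    # Return (horizon, first counterexample), or (max_n, None) if no error is
    # found, in which case max_n is only a lower bound on the horizon.
    for n in range(1, max_n + 1):
        for chars in product(alphabet, repeat=n):
            x = "".join(chars)
            if model(x) != truth(x):
                return n - 1, x
    return max_n, None


def toy_model(s: str) -> int:
    # Hypothetical stand-in for an LLM that only looks at the first four characters.
    return s[:4].count("1") % 2


print(zero_error_horizon(toy_model, parity, "01", max_n=10))  # (4, '00001')
```

The counterexample is returned alongside the horizon because, as with 11000 for GPT-5.2, the shortest failing input is often the most informative part of the result.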
Taxonomy
Topics: Security and Verification in Computing · Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI)
