When Shallow Wins: Silent Failures and the Depth-Accuracy Paradox in Latent Reasoning

Subramanyam Sahoo; Aman Chadha; Vinija Jain; Divya Chaudhary

arXiv:2603.03475 · cs.LG · March 5, 2026

TL;DR

This paper shows that current mathematical reasoning models often reach their answers through unstable reasoning pathways. The result is silent failures (confident yet incorrect outputs) and a paradox in which scaling from 1.5B to 7B parameters yields no accuracy gain, which points to a need for more reliable evaluation methods.

Contribution

The study introduces novel faithfulness metrics to analyze reasoning pathways, revealing the prevalence of unreliable reasoning and silent failures in state-of-the-art models, and challenges assumptions about model scaling benefits.

Findings

01. 61% accuracy achieved with mixed reasoning pathways

02. 8.8% of predictions are silent failures

03. Scaling from 1.5B to 7B parameters yields no accuracy improvement
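To make the second finding concrete, here is a minimal sketch of how a silent-failure rate could be counted, taking the abstract's description of a silent failure as a confident yet incorrect output. The `Prediction` record, the `confidence` field, and the 0.9 threshold are hypothetical choices for illustration; the paper's own confidence measure and cutoff are not given on this page.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    confidence: float  # hypothetical model confidence in [0, 1]
    correct: bool      # whether the final answer matched the reference


def silent_failure_rate(preds, threshold=0.9):
    """Fraction of ALL predictions that are confident yet incorrect.

    The threshold is an assumed value, not one taken from the paper.
    """
    if not preds:
        return 0.0
    silent = sum(1 for p in preds if p.confidence >= threshold and not p.correct)
    return silent / len(preds)


# Toy data, invented for illustration only.
preds = [
    Prediction(0.95, True),   # confident and correct
    Prediction(0.97, False),  # confident and incorrect: a silent failure
    Prediction(0.40, False),  # incorrect but not confident
    Prediction(0.92, True),   # confident and correct
]
rate = silent_failure_rate(preds)
print(f"silent failure rate: {rate:.2%}")
```

The denominator is the full prediction set rather than the incorrect subset, which matches the finding's phrasing as a share of predictions.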

Abstract

Mathematical reasoning models are widely deployed in education, automated tutoring, and decision support systems despite exhibiting fundamental computational instabilities. We demonstrate that state-of-the-art models (Qwen2.5-Math-7B) achieve 61% accuracy through a mixture of reliable and unreliable reasoning pathways: 18.4% of correct predictions employ stable, faithful reasoning while 81.6% emerge through computationally inconsistent pathways. Additionally, 8.8% of all predictions are silent failures -- confident yet incorrect outputs. Through comprehensive analysis using novel faithfulness metrics, we reveal: (1) reasoning quality shows weak negative correlation with correctness (r=-0.21, p=0.002), reflecting a binary classification threshold artifact rather than a monotonic inverse relationship; (2) scaling from 1.5B to 7B parameters (4.7x increase) provides zero accuracy benefit on…
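The percentages in the abstract use different denominators: the 18.4% / 81.6% split applies to correct predictions, while 8.8% applies to all predictions. The sketch below re-expresses them on a common base. It assumes all figures refer to the same evaluation set for Qwen2.5-Math-7B, which the truncated abstract does not confirm; the derived values are simple arithmetic, not results reported by the authors.

```python
# Figures quoted in the abstract.
accuracy = 0.61                    # share of all predictions that are correct
stable_share_of_correct = 0.184    # share of CORRECT predictions with stable reasoning
unstable_share_of_correct = 0.816  # share of CORRECT predictions with inconsistent reasoning
silent_rate = 0.088                # share of ALL predictions that are silent failures

# Re-express the pathway split as shares of all predictions.
stable_correct = accuracy * stable_share_of_correct
unstable_correct = accuracy * unstable_share_of_correct

# Share of incorrect predictions that are silent failures
# (assumes the same evaluation set for both figures).
error_rate = 1 - accuracy
silent_share_of_errors = silent_rate / error_rate

# Parameter scale-up from 1.5B to 7B.
scale_factor = 7 / 1.5

print(f"stable and correct:   {stable_correct:.1%} of all predictions")
print(f"unstable but correct: {unstable_correct:.1%} of all predictions")
print(f"silent failures:      {silent_share_of_errors:.1%} of all errors")
print(f"scale factor:         {scale_factor:.1f}x")
```

Under these assumptions, roughly 11% of all predictions are both correct and stably reasoned, and a little under a quarter of all errors are silent. The 7 / 1.5 ratio rounds to the 4.7x increase stated in the abstract.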

Taxonomy

Topics: Intelligent Tutoring Systems and Adaptive Learning · Explainable Artificial Intelligence (XAI) · Mathematics Education and Teaching Techniques