When Shallow Wins: Silent Failures and the Depth-Accuracy Paradox in Latent Reasoning
Subramanyam Sahoo, Aman Chadha, Vinija Jain, Divya Chaudhary

TL;DR
This paper finds that current mathematical reasoning models often reach correct answers through unstable reasoning pathways and produce silent failures (confident yet incorrect outputs). It also reports a paradox in which scaling from 1.5B to 7B parameters brings no accuracy gain, which motivates more reliable evaluation methods.
Contribution
The study introduces novel faithfulness metrics to analyze reasoning pathways, revealing the prevalence of unreliable reasoning and silent failures in state-of-the-art models, and challenges assumptions about model scaling benefits.
Findings
Qwen2.5-Math-7B reaches 61% accuracy through a mix of reliable and unreliable reasoning pathways
8.8% of all predictions are silent failures (confident yet incorrect outputs)
Scaling from 1.5B to 7B parameters yields no accuracy improvement
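To make the silent-failure finding concrete, here is a minimal sketch of how such a rate could be computed from per-prediction records. It is not the paper's procedure: the `Prediction` record, the `CONFIDENCE_THRESHOLD` value of 0.9, and the function names are all hypothetical, and the source does not state how confidence is measured or thresholded.

```python
from dataclasses import dataclass
from typing import Sequence

# Hypothetical cutoff: the source does not say how "confident" is defined.
CONFIDENCE_THRESHOLD = 0.9


@dataclass
class Prediction:
    """Illustrative record for one model output (not from the paper)."""
    confidence: float  # assumed confidence score in [0, 1]
    correct: bool      # whether the final answer matched the reference


def is_silent_failure(pred: Prediction,
                      threshold: float = CONFIDENCE_THRESHOLD) -> bool:
    """A silent failure is a confident yet incorrect output."""
    return pred.confidence >= threshold and not pred.correct


def silent_failure_rate(preds: Sequence[Prediction],
                        threshold: float = CONFIDENCE_THRESHOLD) -> float:
    """Fraction of ALL predictions (not just errors) that are silent failures."""
    if not preds:
        return 0.0
    flagged = sum(is_silent_failure(p, threshold) for p in preds)
    return flagged / len(preds)
```

The denominator is the full prediction set, matching the way the finding is phrased as a share of all predictions rather than a share of errors.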
Abstract
Mathematical reasoning models are widely deployed in education, automated tutoring, and decision support systems despite exhibiting fundamental computational instabilities. We demonstrate that state-of-the-art models (Qwen2.5-Math-7B) achieve 61% accuracy through a mixture of reliable and unreliable reasoning pathways: 18.4% of correct predictions employ stable, faithful reasoning while 81.6% emerge through computationally inconsistent pathways. Additionally, 8.8% of all predictions are silent failures -- confident yet incorrect outputs. Through comprehensive analysis using novel faithfulness metrics, we reveal: (1) reasoning quality shows weak negative correlation with correctness (r=-0.21, p=0.002), reflecting a binary classification threshold artifact rather than a monotonic inverse relationship; (2) scaling from 1.5B to 7B parameters (4.7x increase) provides zero accuracy benefit on…
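The percentages in the abstract use different denominators (correct predictions versus all predictions), so it can help to put them on one scale. The sketch below does that arithmetic using only the figures quoted above. It assumes all figures refer to the same evaluation set and that every silent failure is counted among the incorrect predictions. The function and key names are illustrative.

```python
def breakdown(accuracy: float = 0.61,
              faithful_share_of_correct: float = 0.184,
              silent_failure_rate: float = 0.088) -> dict:
    """Re-express the abstract's figures as shares of all predictions."""
    incorrect = 1.0 - accuracy
    return {
        # Correct AND reached through stable, faithful reasoning.
        "faithful_correct": accuracy * faithful_share_of_correct,
        # Correct but reached through inconsistent pathways (the 81.6%).
        "unstable_correct": accuracy * (1.0 - faithful_share_of_correct),
        # All incorrect predictions.
        "incorrect": incorrect,
        # Silent failures as a share of incorrect predictions
        # (assumes silent failures are a subset of errors).
        "silent_share_of_errors": silent_failure_rate / incorrect,
        # Parameter ratio behind the quoted "4.7x increase".
        "scale_ratio": 7.0 / 1.5,
    }


if __name__ == "__main__":
    for name, value in breakdown().items():
        print(f"{name}: {value:.3f}")
```

Under these assumptions, roughly 11% of all predictions are both correct and faithfully reasoned, about 50% are correct through inconsistent pathways, and silent failures make up a little under a quarter of all errors.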
Taxonomy
Topics: Intelligent Tutoring Systems and Adaptive Learning · Explainable Artificial Intelligence (XAI) · Mathematics Education and Teaching Techniques
