Can Large Reasoning Models Improve Accuracy on Mathematical Tasks Using Flawed Thinking?
Saraswathy Amjith, Mihika Dusad, Neha Muramalla, Shweta Shah

TL;DR
Training large language models on intentionally flawed reasoning traces enhances their ability to detect and recover from errors in mathematical problem-solving without reducing overall accuracy.
Contribution
This paper introduces a training approach that fine-tunes on intentionally flawed reasoning traces, improving model robustness to errors in mathematical reasoning tasks.
Findings
Models trained on flawed traces outperform standard RL on problems prefilled with flawed reasoning.
Training on reasoning errors yields greater robustness than training on calculation errors.
Robustness is improved without sacrificing accuracy on clean problems.
Abstract
Chain-of-thought (CoT) prompting has become central to mathematical reasoning in large language models, yet models remain brittle to early errors: a single arithmetic slip or unjustified inference typically propagates uncorrected to an incorrect final answer. We investigate whether training on intentionally flawed reasoning traces can teach models to detect and recover from such errors without degrading standard problem-solving ability. Using competition-level problems from MATH-lighteval, we generate CoT prefixes containing exactly one controlled error, either a calculation error (sign flips, dropped terms) or a reasoning error (misapplied rules, unjustified logical steps), and fine-tune Qwen3-4B with GRPO using a binary final-answer reward. Our Mixed-CoT-RL model matches standard RL on clean problems (41% vs 41%) while substantially outperforming it on problems prefilled with flawed…
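The binary final-answer reward and the prefilled-prefix setup described in the abstract can be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation: the function names (`extract_boxed`, `binary_reward`, `build_prefilled_prompt`), the `\boxed{...}` answer convention, the exact-string comparison, and the prompt layout are all assumptions made for this example.

```python
from typing import Optional


def extract_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in text (nested braces allowed)."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    i = start + len(marker)
    depth = 1
    chars = []
    while i < len(text):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars).strip()
        chars.append(ch)
        i += 1
    return None  # unbalanced braces: treat as no answer


def binary_reward(completion: str, reference: str) -> float:
    """1.0 if the final boxed answer matches the reference exactly, else 0.0.

    Assumption: exact string match; a real checker would likely
    normalize mathematically equivalent answers.
    """
    answer = extract_boxed(completion)
    return 1.0 if answer is not None and answer == reference.strip() else 0.0


def build_prefilled_prompt(problem: str, flawed_prefix: str) -> str:
    """Hypothetical prompt layout: the model continues from a CoT prefix
    that already contains one injected error."""
    return f"Problem: {problem}\nSolution: {flawed_prefix}"


# Example: a prefix with a sign-flip calculation error (2x = 6 - 4 written as 6 + 4).
prompt = build_prefilled_prompt(
    "Solve 2x + 4 = 6.",
    "Subtract 4 from both sides: 2x = 6 + 4 = 10.",
)
recovered = "Wait, subtracting 4 gives 2x = 2, so x = 1. \\boxed{1}"
propagated = "So x = 5. \\boxed{5}"
```

Under these assumptions, a continuation that notices and corrects the injected slip (`recovered`) earns reward 1.0, while one that carries the error through to the final answer (`propagated`) earns 0.0. The reward looks only at the final answer, so any credit for error recovery is implicit.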
Taxonomy
Topics: Cognitive and developmental aspects of mathematical skills · Child and Animal Learning Development · Topic Modeling
