Robust Answers, Fragile Logic: Probing the Decoupling Hypothesis in LLM Reasoning
Enyi Jiang, Changming Xu, Nischay Singh, Tian Qiu, Gagandeep Singh

TL;DR
This paper investigates whether the reasoning that Large Language Models generate is causally tied to their answers or is merely a post-hoc rationalization. It shows that, under imperceptible input perturbations, models often keep the correct answer while their reasoning becomes inconsistent.
Contribution
The paper introduces MATCHA, a novel Answer-Conditioned Probing framework that conditions generation on the model's predicted answer to isolate the reasoning phase, exposing the fragility of current CoT prompting methods.
Findings
Under imperceptible input perturbations, LLMs often keep the correct answer while producing inconsistent or nonsensical reasoning.
Multi-step and commonsense tasks are more vulnerable to decoupling.
Adversarial examples transfer to black-box models, exposing robustness gaps.
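To make the decoupling notion in these findings concrete, here is a minimal sketch of how one might count "correct answer, inconsistent reasoning" cases. It is not the paper's metric or the MATCHA implementation: `ProbeResult`, `decoupling_rate`, and the idea of a per-example consistency label (from some unspecified judge) are all hypothetical names and assumptions introduced for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProbeResult:
    """One perturbed example (hypothetical record, not from the paper)."""
    answer_correct: bool        # final answer is still correct under perturbation
    reasoning_consistent: bool  # rationale judged consistent (judge is assumed)


def decoupling_rate(results: List[ProbeResult]) -> float:
    """Fraction of correct-answer cases whose reasoning is inconsistent."""
    correct = [r for r in results if r.answer_correct]
    if not correct:
        return 0.0  # no correct answers, so nothing to decouple
    decoupled = sum(1 for r in correct if not r.reasoning_consistent)
    return decoupled / len(correct)


# Toy data: three correct answers, two of them with inconsistent reasoning.
results = [
    ProbeResult(answer_correct=True, reasoning_consistent=True),
    ProbeResult(answer_correct=True, reasoning_consistent=False),
    ProbeResult(answer_correct=True, reasoning_consistent=False),
    ProbeResult(answer_correct=False, reasoning_consistent=False),
]
rate = decoupling_rate(results)
```

Restricting the denominator to correct-answer cases is a design choice in this sketch: it separates reasoning fragility from ordinary accuracy loss, which is the distinction the findings above draw.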
Abstract
While Chain-of-Thought (CoT) prompting has become a cornerstone for complex reasoning in Large Language Models (LLMs), the faithfulness of the generated reasoning remains an open question. We investigate the Decoupling Hypothesis: that correct answers often mask fragile, post-hoc rationalizations that are not causally tied to the model's prediction. To systematically verify this, we introduce MATCHA, a novel Answer-Conditioned Probing framework. Unlike standard evaluations that focus on final output accuracy, MATCHA isolates the reasoning phase by conditioning generation on the model's predicted answer, allowing us to stress-test the stability of the rationale itself. Our experiments reveal a critical vulnerability: under imperceptible input perturbations, LLMs frequently maintain the correct answer while generating inconsistent or nonsensical reasoning, effectively being "Right for…
