Robust Answers, Fragile Logic: Probing the Decoupling Hypothesis in LLM Reasoning

Enyi Jiang; Changming Xu; Nischay Singh; Tian Qiu; Gagandeep Singh

arXiv:2505.17406 · cs.AI · February 6, 2026

TL;DR

This paper investigates whether Large Language Models' reasoning processes are genuinely reliable or just post-hoc rationalizations, revealing vulnerabilities where models maintain correct answers despite inconsistent reasoning under input perturbations.

Contribution

The paper introduces MATCHA, a novel framework for probing LLM reasoning by isolating the reasoning phase conditioned on answers, exposing the fragility of current CoT prompting methods.
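To make the idea of answer-conditioned probing concrete, here is a minimal sketch, not the authors' implementation. It assumes a prompt template that fixes the answer and asks only for the rationale, and it uses token overlap from Python's standard `difflib` as a crude stand-in for whatever consistency measure MATCHA actually uses. The function names, the template wording, and the `threshold` value are all hypothetical.

```python
from difflib import SequenceMatcher


def answer_conditioned_prompt(question: str, answer: str) -> str:
    """Build a prompt that fixes the answer and requests only the rationale.

    The template wording is an assumption made for illustration.
    """
    return (
        f"Question: {question}\n"
        f"The answer is {answer}.\n"
        "Explain step by step why this answer is correct:"
    )


def rationale_similarity(rationale_a: str, rationale_b: str) -> float:
    """Token-level overlap in [0, 1]; a crude proxy for reasoning consistency."""
    return SequenceMatcher(None, rationale_a.split(), rationale_b.split()).ratio()


def is_decoupled(
    answer_clean: str,
    answer_perturbed: str,
    rationale_clean: str,
    rationale_perturbed: str,
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the paper
) -> bool:
    """Flag the case where the answer survives a perturbation but the rationale drifts."""
    same_answer = answer_clean.strip() == answer_perturbed.strip()
    drifted = rationale_similarity(rationale_clean, rationale_perturbed) < threshold
    return same_answer and drifted
```

In use, one would query a model with the clean and the perturbed question, pass both answers and both rationales to `is_decoupled`, and count how often it returns `True`. A lexical overlap score will miss paraphrases, so a semantic or judge-based comparison would be a more faithful choice in practice.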

Findings

01 LLMs often produce inconsistent reasoning when answers are correct.

02 Multi-step and commonsense tasks are more vulnerable to decoupling.

03 Adversarial examples transfer to black-box models, exposing robustness gaps.

Abstract

While Chain-of-Thought (CoT) prompting has become a cornerstone for complex reasoning in Large Language Models (LLMs), the faithfulness of the generated reasoning remains an open question. We investigate the Decoupling Hypothesis: that correct answers often mask fragile, post-hoc rationalizations that are not causally tied to the model's prediction. To systematically verify this, we introduce MATCHA, a novel Answer-Conditioned Probing framework. Unlike standard evaluations that focus on final output accuracy, MATCHA isolates the reasoning phase by conditioning generation on the model's predicted answer, allowing us to stress-test the stability of the rationale itself. Our experiments reveal a critical vulnerability: under imperceptible input perturbations, LLMs frequently maintain the correct answer while generating inconsistent or nonsensical reasoning - effectively being "Right for…
