Breaking the Chain: A Causal Analysis of LLM Faithfulness to Intermediate Structures

Oleg Somov; Mikhail Chaichuk; Mikhail Seleznyov; Alexander Panchenko; Elena Tutubalina

arXiv:2603.16475 · cs.AI · March 18, 2026

Open Access · 3 Reviews

TL;DR

This paper investigates whether intermediate structures in schema-guided reasoning pipelines causally influence LLM outputs, revealing that such structures often act as influential context rather than stable causal factors, with implications for model interpretability.

Contribution

Introduces a causal evaluation protocol to test the causal influence of intermediate structures on LLM outputs, demonstrating their fragility and limited causal role.

Findings

01

Models appear self-consistent with their own intermediate structures but fail to update predictions after intervention in up to 60% of cases.

02

The fragility of faithfulness largely disappears when derivation of the final decision is delegated to an external tool.

03

Intermediate structures act more as influential context than causal mediators.

Abstract

Schema-guided reasoning pipelines ask LLMs to produce explicit intermediate structures -- rubrics, checklists, verification queries -- before committing to a final decision. But do these structures causally determine the output, or merely accompany it? We introduce a causal evaluation protocol that makes this directly measurable: by selecting tasks where a deterministic function maps intermediate structures to decisions, every controlled edit implies a unique correct output. Across eight models and three benchmarks, models appear self-consistent with their own intermediate structures but fail to update predictions after intervention in up to 60% of cases -- revealing that apparent faithfulness is fragile once the intermediate structure changes. When derivation of the final decision from the structure is delegated to an external tool, this fragility largely disappears; however, prompts…
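
The protocol described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the rubric format, the scoring rule `derive_score`, the `Trial` record, and the choice to measure update failures only over self-consistent trials are all assumptions made for illustration. The sketch shows the core idea that, when a deterministic function maps the intermediate structure to the decision, any controlled edit to the structure implies a single correct output against which the model's post-edit answer can be checked.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical intermediate structure: rubric criterion -> satisfied or not.
Rubric = Dict[str, bool]


def derive_score(rubric: Rubric) -> int:
    """Deterministic map from structure to decision (assumed: count of satisfied criteria)."""
    return sum(rubric.values())


def flip_criterion(rubric: Rubric, name: str) -> Rubric:
    """Controlled edit: flip one criterion, leaving the original rubric untouched."""
    edited = dict(rubric)
    edited[name] = not edited[name]
    return edited


@dataclass
class Trial:
    rubric: Rubric            # structure the model produced
    original_answer: int      # model's decision given its own structure
    edited_rubric: Rubric     # structure after the intervention
    answer_after_edit: int    # model's decision given the edited structure


def self_consistent(t: Trial) -> bool:
    """The model's original decision matches what its own structure implies."""
    return t.original_answer == derive_score(t.rubric)


def updated_correctly(t: Trial) -> bool:
    """The post-edit decision matches the unique output implied by the edit."""
    return t.answer_after_edit == derive_score(t.edited_rubric)


def summarize(trials: List[Trial]) -> Dict[str, float]:
    """Self-consistency over all trials; failure-to-update over self-consistent ones."""
    consistent = [t for t in trials if self_consistent(t)]
    failures = [t for t in consistent if not updated_correctly(t)]
    return {
        "self_consistency": len(consistent) / len(trials) if trials else 0.0,
        "failure_to_update": len(failures) / len(consistent) if consistent else 0.0,
    }
```

In use, each `Trial` would be filled from two model calls, one with the model's own structure and one with the edited structure, and `summarize` would report both rates. Because flipping a single criterion always changes the count, every edit in this toy setup changes the implied decision, so an unchanged answer after the edit counts as a failure to update.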

Peer Reviews

Decision · Submitted to ICLR 2026

Reviewer 01 · Rating 2 · Confidence 3

Strengths

This paper aims to explore the faithfulness of LLM-generated content to intermediate reasoning structures, which has practical significance for understanding LLM causal reasoning capabilities.

Weaknesses

$\bullet$ The main limitation lies in the lack of a clear research objective and insufficient consideration of the complexity of natural language. (1) For autoregressive language models, what exactly does the intermediate reasoning representation $M^* \neq M$ refer to? What level of discrepancy is being considered, token-level or semantic-level? If token-level, $M^* \neq M$ does not necessarily imply the final decision $Y^* \neq Y$, so Eq. (2) may not hold. If semantic-level, how is semantic i

Reviewer 02 · Rating 6 · Confidence 3

Strengths

1. The paper is well-written and easy to follow. 2. The findings were clearly stated and demonstrated, with broader implications for understanding reasoning models. 3. The evaluation is comprehensive, testing 9 LLMs across 4 diverse benchmarks.

Weaknesses

1. The experimental results should be shown to be statistically significant (e.g., using CIs or a paired t-test), rather than reported only as the point-estimate averages in Figures 3 and 4. 2. The metric appears to be binary, which could be a limitation for quantitative tasks. 3. The faithfulness metrics partly conflate invariance with mediation, so the authors should specify their definition of 'faithfulness'. 4. The paper should evaluate additional prompting beyond few-shot, as improved edit sensitivity under stronger promp

Reviewer 03 · Rating 2 · Confidence 3

Strengths

The paper is well-written and clear, and addresses an interesting problem: the logical consistency of LRMs and their faithfulness to their own CoT explanations. The focus on structured CoT makes evaluation clearer, since the ground-truth answer is usually unambiguous.

Weaknesses

My main issue is the following: If the generated structure (e.g., rubric) is edited but the prompt (e.g., the student's answer) is not edited, doesn't this generate a contradiction between the prompt and the CoT? If this is the case, could the authors clarify what the expected output of the model should be? If the edits are introducing a logical contradiction then, to me, it isn't clear that the model should be expected to answer according to the content of the rubric (this would require it to

Taxonomy

Topics: Topic Modeling · Semantic Web and Ontologies · Logic, Reasoning, and Knowledge