FACT-E: Causality-Inspired Evaluation for Trustworthy Chain-of-Thought Reasoning

Yuxi Sun; Aoqi Zuo; Haotian Xie; Wei Gao; Mingming Gong; Jing Ma

arXiv:2604.10693·cs.AI·April 21, 2026

TL;DR

FACT-E is a causality-inspired evaluation framework that makes faithfulness assessment of LLM chain-of-thought reasoning more reliable. It uses controlled perturbations to estimate step-to-step dependence and combines intra-chain faithfulness with CoT-to-answer consistency when selecting reasoning chains.

Contribution

The paper presents a causality-inspired method for evaluating and selecting trustworthy reasoning trajectories in LLMs, reducing the self-evaluation bias that makes faithfulness assessment unreliable.

Findings

01. FACT-E enhances reasoning-trajectory selection in LLMs.

02. It provides more reliable detection of flawed reasoning under noisy conditions.

03. FACT-E improves the quality of in-context learning exemplars.

Abstract

Chain-of-Thought (CoT) prompting has improved LLM reasoning, but models often generate explanations that appear coherent while containing unfaithful intermediate steps. Existing self-evaluation approaches are prone to inherent biases: the model may confidently endorse coherence even when the step-to-step implication is not valid, leading to unreliable faithfulness evaluation. We propose FACT-E, a causality-inspired framework for evaluating CoT quality. FACT-E uses controlled perturbations as an instrumental signal to separate genuine step-to-step dependence from bias-driven artifacts, producing more reliable faithfulness estimates (intra-chain faithfulness). To select trustworthy trajectories, FACT-E jointly considers intra-chain faithfulness and CoT-to-answer consistency, ensuring that selected chains are both faithful internally and supportive of the correct…
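
The selection idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify the scoring functions, the perturbation, or how the two criteria are combined. Everything below is a hypothetical stand-in: `toy_support` uses token overlap in place of a model-based judgment, `toy_perturb` masks numbers as an assumed controlled perturbation, and the product used in `select_chain` is one assumed way to consider both scores jointly.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Chain:
    steps: List[str]  # intermediate reasoning steps, in order
    answer: str       # final answer produced by the chain


def toy_support(prev: str, nxt: str) -> float:
    """Hypothetical stand-in for a model judgment:
    share of the next step's tokens that also occur in the previous step."""
    prev_tokens = set(prev.lower().split())
    nxt_tokens = nxt.lower().split()
    if not nxt_tokens:
        return 0.0
    return sum(t in prev_tokens for t in nxt_tokens) / len(nxt_tokens)


def toy_perturb(step: str) -> str:
    """Hypothetical controlled perturbation: mask every number in the step."""
    return re.sub(r"\d+", "#", step)


def intra_chain_faithfulness(
    chain: Chain,
    support: Callable[[str, str], float] = toy_support,
    perturb: Callable[[str], str] = toy_perturb,
) -> float:
    """Average drop in support for step k+1 when step k is perturbed.
    A large drop suggests step k+1 really depends on step k."""
    gaps = []
    for prev, nxt in zip(chain.steps, chain.steps[1:]):
        gap = support(prev, nxt) - support(perturb(prev), nxt)
        gaps.append(max(0.0, gap))
    return sum(gaps) / len(gaps) if gaps else 0.0


def cot_to_answer_consistency(chain: Chain) -> float:
    """Toy check: 1.0 if the final step mentions the answer, else 0.0."""
    if not chain.steps:
        return 0.0
    return 1.0 if chain.answer in chain.steps[-1].split() else 0.0


def select_chain(chains: List[Chain]) -> Chain:
    """Pick the chain with the highest product of the two scores (assumed rule)."""
    return max(
        chains,
        key=lambda c: intra_chain_faithfulness(c) * cot_to_answer_consistency(c),
    )


# Two chains with the same answer; only the first has steps that build on each other.
good = Chain(
    steps=["3 apples plus 4 apples", "3 plus 4 is 7", "so the total is 7"],
    answer="7",
)
bad = Chain(
    steps=["3 apples plus 4 apples", "apples are a fruit", "so the total is 7"],
    answer="7",
)
selected = select_chain([bad, good])
```

In this toy setup both chains end with the same answer, so a consistency check alone cannot separate them. The perturbation-based score can, because masking the numbers in one step only changes the support for the next step when that next step actually reuses them.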
