Measuring the Faithfulness of Thinking Drafts in Large Reasoning Models

Zidi Xiong; Shan Chen; Zhenting Qi; Himabindu Lakkaraju

arXiv:2505.13774 · cs.AI · May 30, 2025

TL;DR

This paper introduces a counterfactual intervention framework for evaluating the faithfulness of thinking drafts in large reasoning models (LRMs). It finds that current models are only selectively faithful to their intermediate reasoning steps and often reach draft conclusions that do not align with the reasoning preceding them.

Contribution

The paper proposes a systematic method for measuring and analyzing the faithfulness of thinking drafts in large reasoning models, addressing a key challenge for interpretability and reliability.

Findings

01. LRMs show selective faithfulness to reasoning steps.

02. Models often fail to align draft conclusions with reasoning processes.

03. The framework enables rigorous evaluation of reasoning faithfulness.

Abstract

Large Reasoning Models (LRMs) have significantly enhanced their capabilities in complex problem-solving by introducing a thinking draft that enables multi-path Chain-of-Thought explorations before producing final answers. Ensuring the faithfulness of these intermediate reasoning processes is crucial for reliable monitoring, interpretation, and effective control. In this paper, we propose a systematic counterfactual intervention framework to rigorously evaluate thinking draft faithfulness. Our approach focuses on two complementary dimensions: (1) Intra-Draft Faithfulness, which assesses whether individual reasoning steps causally influence subsequent steps and the final draft conclusion through counterfactual step insertions; and (2) Draft-to-Answer Faithfulness, which evaluates whether final answers are logically consistent with and dependent on the thinking draft, by perturbing the…
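
The counterfactual step insertion described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the `continue_draft` callable, and the toy stand-in for a model are all hypothetical, chosen only to show the shape of the check. The idea is to insert one altered step into a draft and test whether the conclusion reached afterwards differs from the baseline.

```python
from typing import Callable, List


def insert_counterfactual_step(steps: List[str], counterfactual: str, position: int) -> List[str]:
    """Return a copy of the draft with one counterfactual step inserted at `position`."""
    if not 0 <= position <= len(steps):
        raise ValueError("position must lie between 0 and len(steps)")
    return steps[:position] + [counterfactual] + steps[position:]


def conclusion_changed(
    continue_draft: Callable[[List[str]], str],
    steps: List[str],
    counterfactual: str,
    position: int,
) -> bool:
    """Compare the conclusion reached from the original prefix with the
    conclusion reached when a counterfactual step is appended to that prefix."""
    baseline = continue_draft(steps[:position])
    perturbed = continue_draft(steps[:position] + [counterfactual])
    return baseline != perturbed


def toy_continue(prefix: List[str]) -> str:
    # Hypothetical stand-in for a reasoning model: it "concludes" with the
    # value stated after the last "=" in the most recent step.
    return prefix[-1].split("=")[-1].strip() if prefix else "unknown"


draft = ["x = 2", "y = x + 1 = 3"]
changed = conclusion_changed(toy_continue, draft, "y = 5", position=2)
print(changed)
```

In a real evaluation, `continue_draft` would wrap a call to the model under test, and the comparison of conclusions would likely need answer extraction and normalization rather than exact string equality.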

Code & Models

Datasets

polaris-73/faithful-thinking-draft (dataset · 62 dl)

Taxonomy

Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Constraint Satisfaction and Optimization

Methods: ALIGN