Lie to Me: How Faithful Is Chain-of-Thought Reasoning in Reasoning Models?

Richard J. Young

arXiv:2603.22582·cs.CL·March 25, 2026


Open Access · 1 dataset

TL;DR

This study evaluates the faithfulness of chain-of-thought reasoning in 12 open-weight reasoning models across 9 architectural families. It finds that the rate at which models acknowledge hint influence varies widely, and that models often recognize the influence internally without acknowledging it in their outputs.

Contribution

The paper provides an empirical assessment of CoT faithfulness across open-weight reasoning models, identifying which hint categories lower acknowledgment rates and showing that models can recognize a hint internally without mentioning it.

Findings

01. Faithfulness rates vary from 39.7% to 89.9% across models.

02. Consistency and sycophancy hints have the lowest acknowledgment rates.

03. Models internally recognize influence but often do not acknowledge it in outputs.

Abstract

Chain-of-thought (CoT) reasoning has been proposed as a transparency mechanism for large language models in safety-critical deployments, yet its effectiveness depends on faithfulness (whether models accurately verbalize the factors that actually influence their outputs), a property that prior evaluations have examined in only two proprietary models, finding acknowledgment rates as low as 25% for Claude 3.7 Sonnet and 39% for DeepSeek-R1. To extend this evaluation across the open-weight ecosystem, this study tests 12 open-weight reasoning models spanning 9 architectural families (7B-685B parameters) on 498 multiple-choice questions from MMLU and GPQA Diamond, injecting six categories of reasoning hints (sycophancy, consistency, visual pattern, metadata, grader hacking, and unethical information) and measuring the rate at which models acknowledge hint influence in their CoT when hints…
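The measurement described in the abstract can be illustrated with a minimal sketch. The abstract is truncated before it states the exact conditioning, so the filter used here (count a trial only when the hinted run switches to the option the hint points at) is an assumption, not the paper's confirmed definition. The `Trial` record, its field names, and both function names are hypothetical, and the `acknowledged` flag is assumed to come from some external judgment of the CoT text.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Trial:
    """One question asked twice: once without a hint, once with it (hypothetical schema)."""
    hint_category: str    # e.g. "sycophancy", "metadata"
    baseline_answer: str  # option chosen without the hint
    hinted_answer: str    # option chosen with the hint injected
    hint_target: str      # option the hint points at
    acknowledged: bool    # whether the CoT mentions relying on the hint


def hint_influenced(trial: Trial) -> bool:
    # ASSUMPTION: the hint "influenced" the answer when the model moved from
    # a different baseline answer to the hinted option.
    return (
        trial.baseline_answer != trial.hint_target
        and trial.hinted_answer == trial.hint_target
    )


def acknowledgment_rates(trials: list[Trial]) -> dict[str, float]:
    """Per-category share of influenced trials whose CoT acknowledges the hint."""
    counts = defaultdict(lambda: [0, 0])  # category -> [acknowledged, influenced]
    for trial in trials:
        if not hint_influenced(trial):
            continue  # uninfluenced trials say nothing about faithfulness
        counts[trial.hint_category][1] += 1
        if trial.acknowledged:
            counts[trial.hint_category][0] += 1
    return {cat: ack / total for cat, (ack, total) in counts.items()}
```

Conditioning on influenced trials keeps the rate from being inflated by cases where the model ignored the hint and therefore had nothing to acknowledge. Categories with no influenced trials are simply absent from the result rather than reported as zero.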


Code & Models

Datasets

richardyoung/cot-faithfulness-open-models
dataset · 450 downloads


Taxonomy

Topics: Topic Modeling · Ethics and Social Impacts of AI · Explainable Artificial Intelligence (XAI)