Are DeepSeek R1 And Other Reasoning Models More Faithful?
James Chua, Owain Evans

TL;DR
This paper evaluates whether the Chains of Thought of reasoning models such as DeepSeek-R1 are more faithful than those of traditional models. It finds that reasoning models more reliably describe how cues in the prompt influence their answers.
Contribution
The study applies an existing test of faithful CoT to three reasoning models, showing that they describe the influence of cues more reliably than non-reasoning models.
Findings
When a cue causes a switch in answer, DeepSeek-R1 describes the cue's influence 59% of the time, versus 7% for the non-reasoning DeepSeek model.
Reasoning models outperform non-reasoning models in faithfulness tests across various cue types.
Reward models may reduce faithfulness, affecting model explainability.
Abstract
Language models trained to solve reasoning tasks via reinforcement learning have achieved striking results. We refer to these models as reasoning models. Are the Chains of Thought (CoTs) of reasoning models more faithful than traditional models? We evaluate three reasoning models (based on Qwen-2.5, Gemini-2, and DeepSeek-V3-Base) on an existing test of faithful CoT. To measure faithfulness, we test whether models can describe how a cue in their prompt influences their answer to MMLU questions. For example, when the cue "A Stanford Professor thinks the answer is D" is added to the prompt, models sometimes switch their answer to D. In such cases, the DeepSeek-R1 reasoning model describes the cue's influence 59% of the time, compared to 7% for the non-reasoning DeepSeek model. We evaluate seven types of cue, such as misleading few-shot examples and suggestive follow-up questions from the…
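The faithfulness measure described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the `Trial` record, its field names, and the `articulation_rate` function are hypothetical, and the `describes_cue` flag is assumed to come from a separate judgment step (for example, a human or model grader reading the CoT). The sketch keeps only trials where the model switched to the cued answer, then reports the fraction of those in which the CoT describes the cue.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Trial:
    # All fields are hypothetical names chosen for this illustration.
    answer_without_cue: str  # model's answer to the plain MMLU question
    answer_with_cue: str     # model's answer once the cue is added
    cue_answer: str          # option the cue points to, e.g. "D"
    describes_cue: bool      # assumed label: does the CoT describe the cue's influence?


def switched_to_cue(trial: Trial) -> bool:
    """True when the cue plausibly changed the answer to the cued option."""
    return (
        trial.answer_without_cue != trial.cue_answer
        and trial.answer_with_cue == trial.cue_answer
    )


def articulation_rate(trials: List[Trial]) -> Optional[float]:
    """Fraction of switched trials whose CoT describes the cue.

    Returns None when no trial switched, since the rate is undefined.
    """
    switched = [t for t in trials if switched_to_cue(t)]
    if not switched:
        return None
    return sum(t.describes_cue for t in switched) / len(switched)


# Made-up example data, not results from the paper.
example_trials = [
    Trial("B", "D", "D", True),   # switched, cue described
    Trial("A", "D", "D", False),  # switched, cue not described
    Trial("D", "D", "D", True),   # already answered D, so not counted
    Trial("C", "C", "D", False),  # did not switch, so not counted
]
rate = articulation_rate(example_trials)
```

Under this reading, the 59% and 7% figures quoted above would correspond to `articulation_rate` computed separately for each model over its own switched trials.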
