Do Cognitively Interpretable Reasoning Traces Improve LLM Performance?
Siddhant Bhambri, Upasana Biswas, Subbarao Kambhampati

TL;DR
This paper investigates whether the interpretability of reasoning traces in LLMs is necessary for improved performance, finding that more effective traces are often less interpretable to humans.
Contribution
It demonstrates that high-performing reasoning traces in LLMs are not necessarily interpretable, challenging assumptions about the importance of interpretability for performance.
Findings
Fine-tuning on DeepSeek R1 traces improves performance.
Human participants rated R1 traces as less interpretable than the other trace types.
There is a mismatch between trace interpretability and model performance.
Abstract
Recent progress in reasoning-oriented Large Language Models (LLMs) has been driven by introducing Chain-of-Thought (CoT) traces, where models generate intermediate reasoning traces before producing an answer. These traces, as in DeepSeek R1, are not only used to guide inference but also serve as supervision signals for distillation into smaller models. A common but often implicit assumption is that CoT traces should be semantically meaningful and interpretable to the end user. While recent research questions the need for these traces to be semantically meaningful, in this paper we ask: "Must CoT reasoning traces be interpretable to enhance LLM task performance?" We investigate this question in the Open Book Question-Answering domain by supervised fine-tuning LLaMA and Qwen models on four types of reasoning traces: (1) DeepSeek R1 traces, (2) LLM-generated summaries of R1 traces, (3)…
