Measuring Reasoning Trace Legibility: Can Those Who Understand Teach?
Dani Roytburg, Shreya Sridhar, Daphne Ippolito

TL;DR
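This paper evaluates the legibility of reasoning traces in language models. It introduces transfer utility, a measure of how well a model's traces guide a weaker model to the correct answer, and finds that the highest-performing models often produce the least legible traces and that reward models do not inherently promote legibility.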
Contribution
It introduces transfer utility as a new metric for reasoning trace quality and analyzes the trade-offs between trace legibility and model performance.
Findings
The highest-performing models produce reasoning traces that rank among the lowest for legibility.
Trade-offs exist between trace length and transfer utility.
Reward models do not inherently promote trace legibility.
Abstract
Language models are increasingly being trained to "reason" before answering users' queries, outputting hundreds or even thousands of tokens worth of deliberation before their final answer. While the main intention of reasoning is to improve models' ability to arrive at a correct answer, we argue that these models should be assessed for the legibility of their reasoning traces in addition to the correctness of their final answers. In this paper, we evaluate 90k traces from 12 Reasoning Language Models (RLMs) for the quality of their reasoning traces. We introduce the concept of transfer utility, which assesses how useful an RLM's reasoning traces are for guiding a weaker, non-reasoning model toward arriving at the correct answer. We find that the reasoning traces of the highest-performing models rank among the lowest for legibility. Furthermore, we uncover tensions between…
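The idea of transfer utility can be illustrated with a minimal sketch. The abstract does not give the paper's exact formula, so the version below is an assumption: it scores a set of traces by the accuracy a weaker model gains when it is shown the trace, compared with answering the question alone. The names `accuracy`, `transfer_utility`, and `toy_weak_model`, the exact-match scoring, and the toy data are all hypothetical and are not taken from the paper.

```python
from typing import Callable, Optional, Sequence

# Hypothetical interface: a weak model is any callable that maps
# (question, optional reasoning trace) to an answer string.
WeakModel = Callable[[str, Optional[str]], str]


def accuracy(
    weak_model: WeakModel,
    questions: Sequence[str],
    answers: Sequence[str],
    traces: Optional[Sequence[Optional[str]]] = None,
) -> float:
    """Exact-match accuracy of the weak model, optionally conditioned on traces."""
    if not questions:
        raise ValueError("need at least one question")
    if traces is None:
        traces = [None] * len(questions)
    correct = sum(
        weak_model(q, t).strip() == a.strip()
        for q, t, a in zip(questions, traces, answers)
    )
    return correct / len(questions)


def transfer_utility(
    weak_model: WeakModel,
    questions: Sequence[str],
    answers: Sequence[str],
    traces: Sequence[str],
) -> float:
    """Assumed definition: accuracy with traces minus accuracy without them."""
    with_trace = accuracy(weak_model, questions, answers, traces)
    without_trace = accuracy(weak_model, questions, answers)
    return with_trace - without_trace


def toy_weak_model(question: str, trace: Optional[str]) -> str:
    """Stand-in for a weak model: it only succeeds if the trace states the answer."""
    if trace and "answer is " in trace:
        return trace.rsplit("answer is ", 1)[1].rstrip(".")
    return "unknown"


questions = ["2+2?", "3+3?"]
answers = ["4", "6"]
traces = [
    "Add the two numbers; the answer is 4.",  # legible: conclusion is explicit
    "Hmm, wait, let me retry that again",     # illegible: no usable conclusion
]
score = transfer_utility(toy_weak_model, questions, answers, traces)
print(score)
```

In this toy setup only the first trace helps the weak model, so the gain is one of two questions. A real evaluation would replace `toy_weak_model` with calls to an actual non-reasoning model and would likely need a more forgiving answer-matching rule than exact string equality.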
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI)
