When the Chain Breaks: Interactive Diagnosis of LLM Chain-of-Thought Reasoning Errors
Shiwei Chen, Niruthikka Sritharan, Xiaolin Wen, Chenxi Zhang, Xingbo Wang, Yong Wang

TL;DR
This paper introduces ReasonDiag, an interactive system that visualizes and diagnoses errors in LLM Chain-of-Thought reasoning traces, combining error detection with visual tools to improve interpretability and error identification.
Contribution
The paper presents a novel interactive visualization system, ReasonDiag, that integrates error detection with visual analysis to facilitate diagnosis of reasoning errors in LLM CoT traces.
Findings
ReasonDiag effectively helps users identify erroneous reasoning steps.
The error detection pipeline combines fact-checking and logical validation.
User studies show improved understanding and error localization.
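The step-level idea in the findings above can be illustrated with a minimal sketch. It is not the authors' implementation: the `Step` structure, the `diagnose` function, the label names, and the two checker callables are all hypothetical stand-ins, and the toy checkers simply flag hard-coded step indices where a real pipeline would call an external fact-checker and a symbolic logic validator. The sketch assumes steps are listed so that every dependency appears before the step that uses it.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Step:
    """One reasoning step in a CoT trace (hypothetical structure)."""
    idx: int
    text: str
    depends_on: List[int] = field(default_factory=list)


# A checker returns True when the step passes. Both checkers are assumptions
# standing in for an external fact-checker and a logic validator.
Checker = Callable[[Step], bool]


def diagnose(steps: List[Step], fact_ok: Checker, logic_ok: Checker) -> Dict[int, str]:
    """Label each step, marking steps that build on an earlier flagged step."""
    labels: Dict[int, str] = {}
    for step in steps:  # assumes dependencies precede the steps that use them
        if not fact_ok(step):
            labels[step.idx] = "factual_error"
        elif not logic_ok(step):
            labels[step.idx] = "logical_error"
        elif any(labels.get(d, "ok") != "ok" for d in step.depends_on):
            labels[step.idx] = "propagated"
        else:
            labels[step.idx] = "ok"
    return labels


# Toy trace: step 0 has a factual error, step 3 a logical one.
steps = [
    Step(0, "Premise A (factually wrong)"),
    Step(1, "Inference from A", depends_on=[0]),
    Step(2, "Premise B"),
    Step(3, "Invalid inference from B", depends_on=[2]),
    Step(4, "Conclusion", depends_on=[1, 3]),
]
result = diagnose(
    steps,
    fact_ok=lambda s: s.idx not in {0},   # toy stand-in for fact-checking
    logic_ok=lambda s: s.idx not in {3},  # toy stand-in for logical validation
)
print(result)
```

Separating the "propagated" label from the two direct error labels keeps the root cause of a broken chain distinct from the steps that merely inherit it, which is the kind of distinction an error-propagation view would need.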
Abstract
Current Large Language Models (LLMs), especially Large Reasoning Models, can generate Chain-of-Thought (CoT) reasoning traces to illustrate how they produce final outputs, thereby facilitating trust calibration for users. However, these CoT reasoning traces are usually lengthy and tedious, and can contain various issues, such as logical and factual errors, which make it difficult for users to interpret the reasoning traces efficiently and accurately. To address these challenges, we develop an error detection pipeline that combines external fact-checking with symbolic formal logical validation to identify errors at the step level. Building on this pipeline, we propose ReasonDiag, an interactive visualization system for diagnosing CoT reasoning traces. ReasonDiag provides 1) an integrated arc diagram to show reasoning-step distributions and error-propagation patterns, and 2) a…
