Improving Human Verification of LLM Reasoning through Interactive Explanation Interfaces
Runtao Zhou, Giang Nguyen, Nikita Kharya, Anh Totti Nguyen, Chirag Agarwal

TL;DR
This paper introduces three interactive interfaces for presenting LLM reasoning explanations, which significantly improve users' ability to detect errors and follow the reasoning process in educational and problem-solving contexts.
Contribution
The paper presents novel interactive reasoning interfaces (iCoT, iPoT, and iGraph) that improve human comprehension and error detection in LLM-generated reasoning chains.
Findings
iGraph achieves 85.6% error detection accuracy
Interactive interfaces reduce validation time to around 58 seconds
Users prefer iGraph for its clarity and followability
Abstract
The reasoning capabilities of Large Language Models (LLMs) have led to their increasing use in several critical applications, particularly education, where they support problem-solving, tutoring, and personalized study. Chain-of-thought (CoT) reasoning capabilities [1, 2] are well known to help LLMs decompose a problem into steps and explore the solution space more effectively, leading to impressive performance on mathematical and reasoning benchmarks. As CoT length grows substantially, to even thousands of tokens per question [1], it is unknown how users can comprehend LLM reasoning and detect errors or hallucinations. To address this problem and understand how reasoning can improve human-AI interaction, we present three new interactive reasoning interfaces: interactive CoT (iCoT), interactive Program-of-Thought (iPoT), and interactive Graph…
