TRUE: A Trustworthy Unified Explanation Framework for Large Language Model Reasoning
Yujiao Yang

TL;DR
TRUE is a framework that explains large language model reasoning at multiple levels, combining executable verification of individual reasoning traces, structural modeling of reasoning stability, and analysis of recurring failure modes.
Contribution
The paper introduces TRUE, a novel unified explanation framework that integrates executable reasoning verification, structural modeling, and causal failure analysis for LLMs.
Findings
Provides verifiable reasoning structures for individual instances
Characterizes reasoning stability via feasible-region DAGs
Identifies and quantifies recurring failure modes
Abstract
Large language models (LLMs) have demonstrated strong capabilities in complex reasoning tasks, yet their decision-making processes remain difficult to interpret. Existing explanation methods often lack trustworthy structural insight and are limited to single-instance analysis, failing to reveal reasoning stability and systematic failure mechanisms. To address these limitations, we propose the Trustworthy Unified Explanation Framework (TRUE), which integrates executable reasoning verification, feasible-region directed acyclic graph (DAG) modeling, and causal failure mode analysis. At the instance level, we redefine reasoning traces as executable process specifications and introduce blind execution verification to assess operational validity. At the local structural level, we construct feasible-region DAGs via structure-consistent perturbations, enabling explicit characterization of…
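To make the instance-level idea concrete, here is a minimal sketch of what treating a reasoning trace as an executable specification and verifying it "blind" could look like. It is an illustration under stated assumptions, not the paper's implementation: the `Step` record, the `OPS` table, and the `blind_execute` and `verify` functions are all hypothetical names, and the arithmetic-only operation set is chosen purely for brevity. The executor sees only the given inputs and the steps, never the claimed answer, and the comparison happens afterwards.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Step:
    """One step of a reasoning trace (hypothetical format, assumed for this sketch)."""
    op: str             # name of the operation to apply
    inputs: List[str]   # names of variables the step reads
    output: str         # name of the variable the step writes


# Assumed toy operation set; a real system would need a far richer one.
OPS: Dict[str, Callable[[float, float], float]] = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
}


def blind_execute(steps: List[Step], givens: Dict[str, float]) -> float:
    """Run the trace using only the givens; the claimed answer is never visible here."""
    if not steps:
        raise ValueError("empty trace")
    env = dict(givens)
    for step in steps:
        if step.op not in OPS:
            raise ValueError(f"unknown operation: {step.op}")
        missing = [name for name in step.inputs if name not in env]
        if missing:
            raise KeyError(f"undefined inputs: {missing}")
        env[step.output] = OPS[step.op](*(env[name] for name in step.inputs))
    # By convention in this sketch, the last step's output is the trace's answer.
    return env[steps[-1].output]


def verify(steps: List[Step], givens: Dict[str, float], claimed: float) -> bool:
    """A trace passes only if it executes cleanly and reproduces the claimed answer."""
    try:
        return blind_execute(steps, givens) == claimed
    except (KeyError, ValueError):
        return False


trace = [
    Step(op="mul", inputs=["price", "qty"], output="total"),
    Step(op="sub", inputs=["total", "discount"], output="final"),
]
givens = {"price": 3, "qty": 4, "discount": 2}
print(verify(trace, givens, claimed=10))
```

The design choice worth noting is the separation between execution and comparison: a trace that reaches the right answer through steps that cannot actually be executed (for example, one that reads a variable no earlier step defined) fails verification, which is the sense of "operational validity" this sketch assumes.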
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Topic Modeling · Multimodal Machine Learning Applications
