Explanations are a Means to an End: Decision Theoretic Explanation Evaluation
Ziyang Guo, Berk Ustun, Jessica Hullman

TL;DR
This paper introduces a decision theoretic framework for evaluating explanations based on their expected improvement on specific decision tasks, providing theoretical benchmarks and practical assessment methods.
Contribution
It proposes a novel decision theoretic approach to explanation evaluation, linking explanations directly to decision-making performance and introducing three distinct evaluative metrics.
Findings
The framework defines theoretical, human-complementary, and behavioral explanation values.
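Applied to human-AI decision support, the framework quantifies how much an explanation could improve decisions and helps interpret observed behavioral effects.
The framework is also validated in mechanistic interpretability contexts.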
Abstract
Explanations of model behavior are commonly evaluated via proxy properties that are only weakly tied to the purposes explanations serve in practice. We contribute a decision theoretic framework that treats explanations as information signals valued by the expected improvement they enable on a specified decision task. This approach yields three distinct estimands: 1) a theoretical benchmark that upper-bounds the performance achievable by any agent with the explanation, 2) a human-complementary value that quantifies the theoretically attainable value not already captured by a baseline human decision policy, and 3) a behavioral value representing the causal effect of providing the explanation to human decision-makers. We instantiate these definitions in a practical validation workflow, and apply them to assess explanation potential and interpret behavioral effects in human-AI decision support and…
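The three estimands can be illustrated on a toy problem. The sketch below is not the paper's implementation: the joint distribution, the 0/1 payoff, the simulated control and treatment outcomes, and every function name are hypothetical. It also assumes a simplified reading of the abstract in which the theoretical value is the gain of an ideal agent over the best no-signal decision, the human-complementary value is the ideal agent's payoff minus the baseline human payoff, and the behavioral value is a difference in mean payoff between randomized groups.

```python
# Toy sketch of the three estimands (all numbers and names are illustrative assumptions).

# Hypothetical joint distribution P(state, signal), where the signal stands in
# for the explanation shown to the decision-maker.
JOINT = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
ACTIONS = (0, 1)


def payoff(action, state):
    """Assumed payoff: 1 for a correct decision, 0 otherwise."""
    return 1.0 if action == state else 0.0


def best_payoff_without_signal(joint):
    """Expected payoff of the best single action chosen without the signal."""
    return max(
        sum(p * payoff(a, y) for (y, _s), p in joint.items())
        for a in ACTIONS
    )


def best_payoff_with_signal(joint):
    """Expected payoff of an ideal agent that picks the best action per signal."""
    signals = sorted({s for (_y, s) in joint})
    total = 0.0
    for s in signals:
        total += max(
            sum(p * payoff(a, y) for (y, s2), p in joint.items() if s2 == s)
            for a in ACTIONS
        )
    return total


def mean(values):
    return sum(values) / len(values)


# Hypothetical per-decision payoffs from a randomized experiment:
# control = humans without the explanation, treatment = humans with it.
control_payoffs = [1, 1, 1, 0, 0, 1, 0, 1, 1, 0]
treatment_payoffs = [1, 1, 1, 0, 1, 1, 0, 1, 1, 0]

benchmark = best_payoff_with_signal(JOINT)

# 1) Theoretical value: ideal agent with the signal vs. best decision without it.
theoretical_value = benchmark - best_payoff_without_signal(JOINT)

# 2) Human-complementary value (assumed form): attainable payoff not already
#    captured by the baseline human policy.
complementary_value = benchmark - mean(control_payoffs)

# 3) Behavioral value: difference in means under random assignment.
behavioral_value = mean(treatment_payoffs) - mean(control_payoffs)

print(round(theoretical_value, 3))
print(round(complementary_value, 3))
print(round(behavioral_value, 3))
```

Under these assumptions, comparing the behavioral value with the human-complementary value indicates how much of the attainable improvement people actually realized when shown the explanation.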
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Embodied and Extended Cognition
