Metrics for Explainable AI: Challenges and Prospects
Robert R. Hoffman, Shane T. Mueller, Gary Klein, Jordan Litman

TL;DR
This paper explores the challenges in measuring the effectiveness of explainable AI systems, focusing on evaluation methods for explanation quality, user understanding, trust, and system performance.
Contribution
It provides a comprehensive review of measurement concepts and proposes evaluation methods for assessing XAI systems' effectiveness.
Findings
Evaluation methods for explanation quality are diverse and context-dependent.
User trust and understanding are critical metrics for XAI success.
Psychometric evaluations can enhance the assessment of human-AI interactions.
Abstract
The question addressed in this paper is: If we present to a user an AI system that explains how it works, how do we know whether the explanation works and the user has achieved a pragmatic understanding of the AI? In other words, how do we know that an explainable AI system (XAI) is any good? Our focus is on the key concepts of measurement. We discuss specific methods for evaluating: (1) the goodness of explanations, (2) whether users are satisfied by explanations, (3) how well users understand the AI systems, (4) how curiosity motivates the search for explanations, (5) whether the user's trust and reliance on the AI are appropriate, and finally, (6) how the human-XAI work system performs. The recommendations we present derive from our integration of extensive research literatures and our own psychometric evaluations.
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Artificial Intelligence in Healthcare and Education · Ethics and Social Impacts of AI
