ExSum: From Local Explanations to Model Understanding
Yilun Zhou, Marco Tulio Ribeiro, Julie Shah

TL;DR
This paper introduces ExSum, a mathematical framework and metrics to quantify and improve model understanding from local explanations, addressing the gap between explanation correctness and human interpretability.
Contribution
The paper proposes ExSum, a novel framework for quantifying and assessing the quality of model understanding derived from local explanations.
Findings
ExSum reveals limitations in current interpretability practices.
It helps develop more accurate and reliable model understanding.
It connects understandability with other explanation properties, such as robustness and plausibility.
Abstract
Interpretability methods are developed to understand the working mechanisms of black-box models, which is crucial to their responsible deployment. Fulfilling this goal requires both that the explanations generated by these methods are correct and that people can easily and reliably understand them. While the former has been addressed in prior work, the latter is often overlooked, resulting in informal model understanding derived from a handful of local explanations. In this paper, we introduce explanation summary (ExSum), a mathematical framework for quantifying model understanding, and propose metrics for its quality assessment. On two domains, ExSum highlights various limitations in the current practice, helps develop accurate model understanding, and reveals easily overlooked properties of the model. We also connect understandability to other properties of explanations such as human…
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Topic Modeling
