Unifying Model Explainability and Robustness via Machine-Checkable Concepts
Vedant Nanda, Till Speicher, John P. Dickerson, Krishna P. Gummadi, Muhammad Bilal Zafar

TL;DR
This paper introduces an automated framework that uses machine-checkable concepts to evaluate and improve the robustness of deep neural network predictions by checking, without manual intervention, whether each prediction conforms to its concept-based explanation.
Contribution
We propose a scalable, automated framework for explanation-conformity checking using machine-checkable concepts to assess and enhance DNN prediction robustness.
Findings
Significantly improves prediction robustness and accuracy.
Predictions marked as robust are more resistant to adversarial attacks.
Framework scales well to datasets with many classes.
Abstract
As deep neural networks (DNNs) get adopted in an ever-increasing number of applications, explainability has emerged as a crucial desideratum for these models. In many real-world tasks, one of the principal reasons for requiring explainability is to in turn assess prediction robustness, where predictions (i.e., class labels) that do not conform to their respective explanations (e.g., presence or absence of a concept in the input) are deemed to be unreliable. However, most, if not all, prior methods for checking explanation-conformity (e.g., LIME, TCAV, saliency maps) require significant manual intervention, which hinders their large-scale deployability. In this paper, we propose a robustness-assessment framework, at the core of which is the idea of using machine-checkable concepts. Our framework defines a large number of concepts that the DNN explanations could be based on and performs…
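The conformity check described in the abstract can be illustrated with a minimal sketch. It rests on assumptions that are ours, not necessarily the authors' construction: each machine-checkable concept is defined as a group of classes, so whether a concept should be present follows from the class label alone (no human annotation), and a separate concept detector (not shown, hypothetical) reports whether each concept is present in a given input. A prediction is flagged as robust only if the detected concepts agree with the concepts implied by the predicted class. The names `conformity` and `is_robust` and the `threshold` parameter are illustrative.

```python
from typing import Dict, Set

# ASSUMPTION: a concept is identified by the set of class labels that share it,
# so its expected presence for any predicted class is machine-checkable.
Concepts = Dict[str, Set[int]]


def conformity(pred_class: int, detected: Dict[str, bool], concepts: Concepts) -> float:
    """Fraction of concepts whose detected presence/absence matches
    what the predicted class implies under the concept definitions."""
    missing = set(concepts) - set(detected)
    if missing:
        raise ValueError(f"no detector output for concepts: {sorted(missing)}")
    agree = sum(
        1 for name, classes in concepts.items()
        if (pred_class in classes) == detected[name]
    )
    return agree / len(concepts)


def is_robust(pred_class: int, detected: Dict[str, bool],
              concepts: Concepts, threshold: float = 1.0) -> bool:
    """Hypothetical rule: mark the prediction robust if conformity meets the threshold."""
    return conformity(pred_class, detected, concepts) >= threshold


# Three toy concepts over three classes (illustrative only).
concepts = {"c0": {0, 1}, "c1": {1, 2}, "c2": {0, 2}}
consistent = {"c0": True, "c1": True, "c2": False}    # matches class 1
inconsistent = {"c0": True, "c1": True, "c2": True}   # c2 should be absent for class 1

print(conformity(1, consistent, concepts), is_robust(1, consistent, concepts))
print(is_robust(1, inconsistent, concepts))
```

Requiring full agreement (`threshold=1.0`) is the strictest choice; a lower threshold would tolerate some detector noise at the cost of flagging fewer non-conforming predictions.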
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Anomaly Detection Techniques and Applications
Methods: Local Interpretable Model-Agnostic Explanations (LIME)
