Unifying Model Explainability and Robustness via Machine-Checkable Concepts

Vedant Nanda; Till Speicher; John P. Dickerson; Krishna P. Gummadi; Muhammad Bilal Zafar

arXiv:2007.00251 · cs.AI · July 3, 2020 · 1 citation

Open Access

TL;DR

This paper introduces an automated framework that uses machine-checkable concepts to assess and improve the robustness of deep neural network predictions. It checks, without manual intervention, whether each prediction conforms to its explanation.

Contribution

We propose a scalable, automated framework for explanation-conformity checking using machine-checkable concepts to assess and enhance DNN prediction robustness.

Findings

1. Significantly improves prediction robustness and accuracy.
2. Predictions marked as robust are more resistant to adversarial attacks.
3. Framework scales well to datasets with many classes.

Abstract

As deep neural networks (DNNs) get adopted in an ever-increasing number of applications, explainability has emerged as a crucial desideratum for these models. In many real-world tasks, one of the principal reasons for requiring explainability is to in turn assess prediction robustness, where predictions (i.e., class labels) that do not conform to their respective explanations (e.g., presence or absence of a concept in the input) are deemed to be unreliable. However, most, if not all, prior methods for checking explanation-conformity (e.g., LIME, TCAV, saliency maps) require significant manual intervention, which hinders their large-scale deployability. In this paper, we propose a robustness-assessment framework, at the core of which is the idea of using machine-checkable concepts. Our framework defines a large number of concepts that the DNN explanations could be based on and performs…
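The idea of explanation-conformity checking described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function `conforms`, the `expected_concepts` table, and the `min_agreement` threshold are all hypothetical names introduced here, and the concepts are assumed to be simple binary present/absent flags. The sketch marks a prediction as conforming only when enough of the concepts detected in the input agree with the concepts expected for the predicted class.

```python
from typing import Dict, Sequence


def conforms(predicted_label: int,
             detected_concepts: Sequence[bool],
             expected_concepts: Dict[int, Sequence[bool]],
             min_agreement: float = 1.0) -> bool:
    """Return True if the detected concepts agree with those expected for the label.

    All names here are illustrative assumptions, not taken from the paper.
    """
    expected = expected_concepts[predicted_label]
    if len(expected) != len(detected_concepts):
        raise ValueError("concept vectors must have the same length")
    # Count concepts whose detected presence/absence matches the expectation.
    matches = sum(d == e for d, e in zip(detected_concepts, expected))
    return matches / len(expected) >= min_agreement


# Hypothetical table: for each class, which concepts should be present.
expected_concepts = {
    0: [True, False, True],
    1: [False, True, True],
}

# All three concepts match class 0, so the prediction conforms.
robust = conforms(0, [True, False, True], expected_concepts)

# Only one of three concepts matches class 0, so the prediction is flagged.
unreliable = conforms(0, [False, True, True], expected_concepts)
```

Lowering `min_agreement` below 1.0 tolerates some disagreement between detected and expected concepts; where to set it is a design choice this sketch leaves open.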

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Anomaly Detection Techniques and Applications

Methods: Local Interpretable Model-Agnostic Explanations (LIME)