Robust Models Are More Interpretable Because Attributions Look Normal

Zifan Wang; Matt Fredrikson; Anupam Datta

arXiv:2103.11257 · cs.LG · October 7, 2021 · 6 cites

TL;DR

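Robust models have smoother decision boundaries, so their input gradients align more closely with the normal vectors of nearby boundaries and gradient-based feature attributions come out sharper. Building on this, the paper introduces boundary attributions, which aggregate the normal vectors of local decision boundaries to explain a model's decision.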

Contribution

The paper demonstrates that smoother decision boundaries in robust models improve interpretability and introduces boundary attributions to better explain model decisions.

Findings

01. Robust models have sharper, more concentrated attributions.

02. Boundary attributions improve explanation sharpness even in non-robust models.

03. Smoother boundaries lead to better alignment of input gradients with decision boundary normals.

Abstract

Recent work has found that adversarially-robust deep networks used for image classification are more interpretable: their feature attributions tend to be sharper, and are more concentrated on the objects associated with the image's ground-truth class. We show that smooth decision boundaries play an important role in this enhanced interpretability, as the model's input gradients around data points will more closely align with boundaries' normal vectors when they are smooth. Thus, because robust models have smoother boundaries, the results of gradient-based attribution methods, like Integrated Gradients and DeepLift, will capture more accurate information about nearby decision boundaries. This understanding of robust interpretability leads to our second contribution: boundary attributions, which aggregate information about the normal vectors of local decision boundaries to explain…
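The alignment claim in the abstract can be illustrated with a toy example. The sketch below is not taken from the paper or its repository: it assumes a hypothetical 2-D classifier f(x) = x1 - a*sin(k*x0), whose decision boundary is the curve x1 = a*sin(k*x0), and the names `grad_f`, `nearest_boundary_point`, and `alignment` are made up for illustration. It compares the input gradient at a point with the boundary normal at the closest boundary point, using cosine similarity as the alignment measure.

```python
import math


def grad_f(a, k, x):
    """Input gradient of the toy model f(x) = x[1] - a*sin(k*x[0])."""
    return (-a * k * math.cos(k * x[0]), 1.0)


def nearest_boundary_point(a, k, x, lo=-3.0, hi=3.0, steps=6000):
    """Grid-search the boundary curve x1 = a*sin(k*x0) for the point closest to x.

    A brute-force stand-in (assumption) for the projection or adversarial
    search a real implementation would use on a deep network.
    """
    best, best_d2 = None, float("inf")
    for i in range(steps + 1):
        x0 = lo + (hi - lo) * i / steps
        x1 = a * math.sin(k * x0)
        d2 = (x0 - x[0]) ** 2 + (x1 - x[1]) ** 2
        if d2 < best_d2:
            best, best_d2 = (x0, x1), d2
    return best


def cosine(u, v):
    """Cosine similarity between two 2-D vectors."""
    dot = u[0] * v[0] + u[1] * v[1]
    return dot / (math.hypot(*u) * math.hypot(*v))


def alignment(a, k, x):
    """Cosine between the gradient at x and the normal at the nearest boundary point.

    The boundary is the level set f = 0, so its normal at a boundary point
    is the gradient of f evaluated there.
    """
    g = grad_f(a, k, x)
    n = grad_f(a, k, nearest_boundary_point(a, k, x))
    return cosine(g, n)


if __name__ == "__main__":
    point = (0.3, 1.5)
    print("smooth boundary (a=0.1, k=1):", alignment(0.1, 1.0, point))
    print("wiggly boundary (a=1.0, k=5):", alignment(1.0, 5.0, point))
```

At the chosen point, the low-curvature boundary gives a cosine closer to 1 than the high-curvature one, and a flat boundary (a = 0) gives perfect alignment. This only mirrors the qualitative argument; it says nothing about how the authors measure alignment on image classifiers.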

Code & Models

Repositories

zifanw/boundary
PyTorch · Official

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Cell Image Analysis Techniques
