Robust Models Are More Interpretable Because Attributions Look Normal
Zifan Wang, Matt Fredrikson, Anupam Datta

TL;DR
Robust models have smoother decision boundaries, so their input gradients align more closely with the normal vectors of nearby decision boundaries and their feature attributions are sharper and more interpretable. Boundary attributions build on this by aggregating the normals of local decision boundaries to explain model decisions.
Contribution
The paper demonstrates that smoother decision boundaries in robust models improve interpretability and introduces boundary attributions to better explain model decisions.
Findings
Robust models have sharper, more concentrated attributions.
Boundary attributions improve explanation sharpness even in non-robust models.
Smoother boundaries lead to better alignment of input gradients with decision boundary normals.
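The alignment finding can be illustrated with a small numerical sketch. Everything here is a hypothetical toy rather than the paper's setup: the two-dimensional `margin` function, its `curvature` parameter, and the simple gradient walk used to reach the boundary (a stand-in for a proper nearest-boundary search) are all assumptions made for illustration.

```python
import numpy as np

def margin(x, curvature):
    # Hypothetical toy margin: positive for class A, negative for class B.
    # curvature = 0 gives a flat decision boundary (the line x[1] = 0).
    return x[1] - curvature * np.sin(3.0 * x[0])

def margin_grad(x, curvature):
    # Analytic input gradient of the toy margin.
    return np.array([-3.0 * curvature * np.cos(3.0 * x[0]), 1.0])

def boundary_point(x, curvature, step=0.01, max_steps=10000):
    # Assumption: walk along the (negative) gradient until the predicted
    # class flips, then bisect. This only approximates the nearest
    # boundary point.
    sign = np.sign(margin(x, curvature))
    prev, cur = x.copy(), x.copy()
    for _ in range(max_steps):
        g = margin_grad(cur, curvature)
        prev, cur = cur, cur - sign * step * g / np.linalg.norm(g)
        if np.sign(margin(cur, curvature)) != sign:
            break
    lo, hi = prev, cur
    for _ in range(50):
        mid = 0.5 * (lo + hi)
        if np.sign(margin(mid, curvature)) == sign:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def alignment(x, curvature):
    # Cosine between the input gradient at x and the boundary normal
    # (the margin gradient evaluated at the boundary point).
    normal = margin_grad(boundary_point(x, curvature), curvature)
    return cosine(margin_grad(x, curvature), normal)

x = np.array([0.3, 1.5])
flat_alignment = alignment(x, curvature=0.0)
curved_alignment = alignment(x, curvature=1.0)
```

With the flat boundary the input gradient and the boundary normal coincide, so the cosine is 1; with the wavy boundary the same input yields a noticeably lower cosine.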
Abstract
Recent work has found that adversarially robust deep networks used for image classification are more interpretable: their feature attributions tend to be sharper, and are more concentrated on the objects associated with the image's ground-truth class. We show that smooth decision boundaries play an important role in this enhanced interpretability, as the model's input gradients around data points will more closely align with boundaries' normal vectors when they are smooth. Thus, because robust models have smoother boundaries, the results of gradient-based attribution methods, like Integrated Gradients and DeepLift, will capture more accurate information about nearby decision boundaries. This understanding of robust interpretability leads to our second contribution: boundary attributions, which aggregate information about the normal vectors of local decision boundaries to explain…
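For readers unfamiliar with Integrated Gradients, one of the gradient-based attribution methods named in the abstract, the following is a minimal sketch. The logistic `model`, its weights `W`, the all-zeros baseline, and the midpoint-rule approximation of the path integral are assumptions chosen for illustration; they are not taken from the paper.

```python
import numpy as np

# Hypothetical weights for a toy logistic model (not from the paper).
W = np.array([1.5, -2.0, 0.5])

def model(x):
    # Scalar output in (0, 1).
    return 1.0 / (1.0 + np.exp(-(W @ x)))

def model_grad(x):
    # Analytic gradient of the logistic output with respect to the input.
    p = model(x)
    return p * (1.0 - p) * W

def integrated_gradients(x, baseline, steps=200):
    # Average the gradient along the straight path from baseline to x
    # (midpoint rule), then scale by the input difference.
    alphas = (np.arange(steps) + 0.5) / steps
    grads = np.array([model_grad(baseline + a * (x - baseline)) for a in alphas])
    return (x - baseline) * grads.mean(axis=0)

x = np.array([1.0, -0.5, 1.0])
baseline = np.zeros_like(x)  # assumed baseline choice
attributions = integrated_gradients(x, baseline)
```

A useful sanity check is the completeness property: the attributions should sum approximately to `model(x) - model(baseline)`.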
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Cell Image Analysis Techniques
