# Right for the Right Reasons: Training Differentiable Models by Constraining their Explanations

**Authors:** Andrew Slavin Ross, Michael C. Hughes, Finale Doshi-Velez

arXiv: 1703.03717 · 2017-11-15

## TL;DR

This paper presents a method for making neural networks more trustworthy: it explains their predictions through input gradients and selectively penalizes those gradients during training. The resulting models give faithful explanations and generalize better when training and test conditions differ.

## Contribution

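The paper introduces an efficient technique that both explains and regularizes differentiable models through their input gradients. The gradient penalties can be driven by expert annotation or applied in an unsupervised fashion that encourages diverse models with qualitatively different decision boundaries for the same classification problem.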

## Key findings

- On multiple datasets, the input-gradient approach generates faithful explanations.
- Models trained with the gradient penalties generalize much better when conditions differ between training and test.
- Input-gradient explanations are efficient to compute, whereas tools such as LIME do not scale to explaining entire datasets.

## Abstract

Neural networks are among the most accurate supervised learning methods in use today, but their opacity makes them difficult to trust in critical applications, especially when conditions in training differ from those in test. Recent work on explanations for black-box models has produced tools (e.g. LIME) to show the implicit rules behind predictions, which can help us identify when models are right for the wrong reasons. However, these methods do not scale to explaining entire datasets and cannot correct the problems they reveal. We introduce a method for efficiently explaining and regularizing differentiable models by examining and selectively penalizing their input gradients, which provide a normal to the decision boundary. We apply these penalties both based on expert annotation and in an unsupervised fashion that encourages diverse models with qualitatively different decision boundaries for the same classification problem. On multiple datasets, we show our approach generates faithful explanations and models that generalize much better when conditions differ between training and test.
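The idea of selectively penalizing input gradients can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes a binary logistic model (so the input gradient has a closed form), a hypothetical binary annotation `mask` marking features the model should not rely on, and a hypothetical `strength` weight on the penalty. All function names are illustrative.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, x):
    """Probability of the positive class under a logistic model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def input_gradient(w, b, x):
    """Gradient of log p(y=1|x) + log p(y=0|x) with respect to the input x.

    For a logistic model this has the closed form (1 - 2p) * w.
    """
    p = predict(w, b, x)
    return [(1.0 - 2.0 * p) * wi for wi in w]

def explanation_penalty(w, b, x, mask):
    """Sum of squared input gradients over the features flagged by the mask.

    mask[d] == 1 marks feature d as one the model should NOT rely on
    (an assumed expert annotation); mask[d] == 0 leaves it unconstrained.
    """
    grad = input_gradient(w, b, x)
    return sum((m * g) ** 2 for m, g in zip(mask, grad))

def loss(w, b, x, y, mask, strength=1.0):
    """Cross-entropy plus the masked input-gradient penalty (assumed form)."""
    p = predict(w, b, x)
    cross_entropy = -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return cross_entropy + strength * explanation_penalty(w, b, x, mask)
```

Minimizing `loss` pushes the model toward decision boundaries that are insensitive to the masked features, since a nonzero weight on a masked feature always incurs a penalty. For a deep network the input gradient would come from automatic differentiation rather than a closed form.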

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1703.03717/full.md

## Figures

20 figures with captions in the complete paper: https://tomesphere.com/paper/1703.03717/full.md

## References

34 references — full list in the complete paper: https://tomesphere.com/paper/1703.03717/full.md

---
Source: https://tomesphere.com/paper/1703.03717