Interpretations are useful: penalizing explanations to align neural networks with prior knowledge
Laura Rieger, Chandan Singh, W. James Murdoch, Bin Yu

TL;DR
This paper introduces contextual decomposition explanation penalization (CDEP), a method that lets practitioners correct a neural network's wrongly assigned feature importance by directly regularizing its explanations, which can increase predictive accuracy.
Contribution
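The paper presents an explanation penalization technique that builds on an existing explanation method, contextual decomposition (CD), so that practitioners can act on an explanation: when a model assigns importance to the wrong features, the explanation itself is regularized during training.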
Findings
CDEP improves model performance on toy datasets.
CDEP corrects feature importance errors effectively.
The method enhances interpretability and accuracy simultaneously.
Abstract
For an explanation of a deep learning model to be effective, it must provide both insight into a model and suggest a corresponding action in order to achieve some objective. Too often, the litany of proposed explainable deep learning methods stop at the first step, providing practitioners with insight into a model, but no way to act on it. In this paper, we propose contextual decomposition explanation penalization (CDEP), a method which enables practitioners to leverage existing explanation methods in order to increase the predictive accuracy of deep learning models. In particular, when shown that a model has incorrectly assigned importance to some features, CDEP enables practitioners to correct these errors by directly regularizing the provided explanations. Using explanations provided by contextual decomposition (CD) (Murdoch et al., 2018), we demonstrate the ability of our method to…
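The idea of "directly regularizing the provided explanations" can be illustrated with a minimal sketch. This is not the authors' implementation and does not use contextual decomposition: it assumes a linear logistic model, for which the contribution of a feature group to the logit is exactly the masked dot product, and it swaps in a squared penalty for smoothness. All names (`penalized_loss_and_grad`, `fit`, `mask`, `lam`) and the synthetic data are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def penalized_loss_and_grad(w, X, y, mask, lam):
    """Cross-entropy plus lam * mean squared contribution of masked features.

    Assumption: for a linear model, the share of the logit attributed to the
    masked features is (X * mask) @ w. This stands in for a CD score.
    """
    n = len(y)
    p = sigmoid(X @ w)
    eps = 1e-12  # avoids log(0)
    ce = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
    contrib = (X * mask) @ w
    penalty = np.mean(contrib ** 2)
    grad_ce = X.T @ (p - y) / n
    grad_pen = 2.0 * (X * mask).T @ contrib / n
    return ce + lam * penalty, grad_ce + lam * grad_pen

def fit(X, y, mask, lam, lr=0.1, steps=3000):
    """Plain gradient descent on the penalized objective."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        _, grad = penalized_loss_and_grad(w, X, y, mask, lam)
        w -= lr * grad
    return w

# Synthetic data: a noisy but genuine feature and a clean "shortcut" feature
# that prior knowledge says the model should not rely on.
rng = np.random.default_rng(0)
n = 400
y = rng.integers(0, 2, n).astype(float)
signal = (2 * y - 1) + rng.normal(0.0, 1.0, n)
spurious = (2 * y - 1) + rng.normal(0.0, 0.1, n)
X = np.column_stack([signal, spurious])
mask = np.array([0.0, 1.0])  # 1 marks the feature whose importance is penalized

w_plain = fit(X, y, mask, lam=0.0)  # no explanation penalty
w_pen = fit(X, y, mask, lam=1.0)    # explanation penalty switched on
```

In this toy setting the unpenalized fit leans on the shortcut feature, while the penalized fit shifts its weight toward the genuine one. Comparing `w_plain` and `w_pen` shows the effect; `lam` controls how strongly the prior knowledge is enforced.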
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Machine Learning in Healthcare
Methods: Contextual Decomposition Explanation Penalization
