Learning Credible Deep Neural Networks with Rationale Regularization
Mengnan Du, Ninghao Liu, Fan Yang, Xia Hu

TL;DR
This paper introduces CREX, a regularization method for deep neural networks that uses expert rationales to improve model credibility and generalization, especially on unseen data.
Contribution
CREX is a regularization technique that incorporates expert rationales, or sparsity constraints when rationales are unavailable, to enhance the credibility and robustness of DNNs.
Findings
CREX improves the credibility of DNN explanations.
CREX increases accuracy on unseen data.
CREX does not always improve test set accuracy.
Abstract
Recent explainability-related studies have shown that state-of-the-art DNNs do not always rely on the correct evidence to make decisions. This not only hampers their generalization but also makes them less likely to be trusted by end users. In pursuit of more credible DNNs, in this paper we propose CREX, which encourages DNN models to focus on the evidence that actually matters for the task at hand and to avoid overfitting to data-dependent bias and artifacts. Specifically, CREX regularizes the training process of DNNs with rationales, i.e., a subset of features highlighted by domain experts as justifications for predictions, to force DNNs to generate local explanations that conform with expert rationales. Even when rationales are not available, CREX can still be useful by requiring the generated explanations to be sparse. Experimental results on two text classification…
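The two regularization modes described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the function names (attribution_scores, explanation_penalty, regularized_loss), the use of gradient-times-input on a linear scorer as the local explanation, the absolute-value penalty, and the strength parameter. The idea shown is only this: when an expert rationale mask is available, attribution that falls outside the rationale is penalized; when it is not, the total attribution magnitude is penalized so that explanations stay sparse.

```python
from typing import List, Optional, Sequence


def attribution_scores(weights: Sequence[float], features: Sequence[float]) -> List[float]:
    """Per-feature attribution for a linear scorer (gradient times input).

    Illustrative stand-in for a local explanation method; the paper's own
    explanation method is not specified in this summary.
    """
    return [w * x for w, x in zip(weights, features)]


def explanation_penalty(
    attributions: Sequence[float],
    rationale_mask: Optional[Sequence[int]] = None,
) -> float:
    """Penalty on an explanation (hypothetical formulation).

    With a rationale mask (1 = feature marked by an expert, 0 = not marked),
    only attribution on unmarked features is penalized. Without a mask,
    all attribution is penalized, which pushes explanations toward sparsity.
    """
    if rationale_mask is None:
        return sum(abs(a) for a in attributions)
    return sum(abs(a) for a, m in zip(attributions, rationale_mask) if m == 0)


def regularized_loss(
    task_loss: float,
    attributions: Sequence[float],
    rationale_mask: Optional[Sequence[int]] = None,
    strength: float = 0.1,
) -> float:
    """Task loss plus a weighted explanation penalty (assumed additive form)."""
    return task_loss + strength * explanation_penalty(attributions, rationale_mask)


# Toy example: three features, the expert marked only the first as a rationale.
weights = [2.0, -1.0, 0.5]
features = [1.0, 3.0, 0.0]
attrs = attribution_scores(weights, features)

with_rationale = regularized_loss(0.4, attrs, rationale_mask=[1, 0, 0])
sparsity_only = regularized_loss(0.4, attrs)
```

In this toy setup the second feature carries the largest attribution but is not part of the rationale, so it dominates the penalty in the rationale mode. In an actual training loop the penalty would need to be differentiable with respect to the model parameters, which this plain-Python sketch does not attempt.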
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Topic Modeling · Adversarial Robustness in Machine Learning
