TL;DR
This paper applies layer-wise relevance propagation (LRP) to NLP, specifically to explain CNN predictions in topic categorization, demonstrating its effectiveness over traditional sensitivity analysis through various experiments.
Contribution
First application of LRP to NLP CNNs, providing a new method for explaining model predictions in natural language processing tasks.
Findings
LRP effectively highlights relevant words for CNN predictions
LRP outperforms sensitivity analysis in explanation quality
Experiments validate LRP's suitability for NLP model interpretability
Abstract
Layer-wise relevance propagation (LRP) is a recently proposed technique for explaining predictions of complex non-linear classifiers in terms of input variables. In this paper, we apply LRP for the first time to natural language processing (NLP). More precisely, we use it to explain the predictions of a convolutional neural network (CNN) trained on a topic categorization task. Our analysis highlights which words are relevant for a specific prediction of the CNN. We compare our technique to standard sensitivity analysis, both qualitatively and quantitatively, using a "word deleting" perturbation experiment, a PCA analysis, and various visualizations. All experiments validate the suitability of LRP for explaining the CNN predictions, which is also in line with results reported in recent image classification studies.
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
MethodsPrincipal Components Analysis
