On Attribution of Recurrent Neural Network Predictions via Additive Decomposition
Mengnan Du, Ninghao Liu, Fan Yang, Shuiwang Ji, Xia Hu

TL;DR
This paper introduces REAT, a novel additive decomposition method that enhances the interpretability of RNNs by providing faithful, phrase-level attribution scores for predictions, applicable across various RNN architectures.
Contribution
The paper proposes REAT, a flexible attribution method that decomposes RNN predictions into additive contributions of words and phrases, improving interpretability and faithfulness over existing approaches.
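To make "additive contributions of words and phrases" concrete, here is a minimal sketch of the general idea. It is not REAT's actual formulation, which this summary does not spell out. It assumes a simplified gated recurrence h_t = g_t * h_{t-1} + u_t with h_0 = 0, a linear output layer w, and gate and update values treated as fixed numbers recorded from a forward pass. The names `additive_contributions` and `phrase_contribution` are hypothetical.

```python
import numpy as np

def additive_contributions(gates, updates, w):
    """Split the logit w . h_T into one score per time step.

    Assumes the toy recurrence h_t = gates[t] * h_{t-1} + updates[t]
    with h_0 = 0 (an illustrative simplification, not REAT itself).
    """
    T = len(updates)
    contribs = np.zeros(T)
    # carry holds the product of all gates applied after step t
    carry = np.ones_like(updates[0])
    for t in range(T - 1, -1, -1):
        contribs[t] = w @ (carry * updates[t])
        carry = carry * gates[t]
    return contribs

def phrase_contribution(contribs, start, end):
    """Score for the phrase covering steps start..end-1 (sum of word scores)."""
    return float(contribs[start:end].sum())

# Toy data standing in for values recorded from a real forward pass
rng = np.random.default_rng(0)
T, d = 5, 4
gates = rng.uniform(0.0, 1.0, size=(T, d))
updates = rng.normal(size=(T, d))
w = rng.normal(size=d)

# Reference forward pass through the same toy recurrence
h = np.zeros(d)
for t in range(T):
    h = gates[t] * h + updates[t]
logit = float(w @ h)

contribs = additive_contributions(gates, updates, w)
```

In this toy setting the per-step scores sum exactly to the logit, and a phrase score is the sum of the scores of its words. In a real GRU or LSTM the gates depend on earlier inputs, so any such split carries modeling assumptions that this sketch ignores.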
Findings
REAT provides faithful, interpretable attributions for RNN predictions.
The method is applicable to various RNN architectures including GRU and LSTM.
The analysis reveals linguistic knowledge captured by RNNs and shows the method's potential for model debugging.
Abstract
RNN models have achieved state-of-the-art performance in a wide range of text mining tasks. However, these models are often regarded as black boxes and are criticized for their lack of interpretability. In this paper, we enhance the interpretability of RNNs by providing interpretable rationales for RNN predictions. Nevertheless, interpreting RNNs is a challenging problem. Firstly, unlike existing methods that rely on local approximation, we aim to provide rationales that are more faithful to the decision-making process of RNN models. Secondly, a flexible interpretation method should be able to assign contribution scores to text segments of varying lengths, instead of only to individual words. To tackle these challenges, we propose a novel attribution method, called REAT, to provide interpretations of RNN predictions. REAT decomposes the final prediction of an RNN into additive…
Taxonomy
Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Software Engineering Research
Methods: Interpretability · Sigmoid Activation · Tanh Activation · Gated Recurrent Unit · Long Short-Term Memory
