Interpreting Deep Learning Models in Natural Language Processing: A Review
Xiaofei Sun, Diyi Yang, Xiaoya Li, Tianwei Zhang, Yuxian Meng, Han Qiu, Guoyin Wang, Eduard Hovy, Jiwei Li

TL;DR
This paper reviews various interpretation methods for neural network models in NLP, highlighting their categories, sub-categories, limitations, and future research directions to improve model interpretability.
Contribution
It provides a comprehensive taxonomy and detailed analysis of existing interpretation methods for neural NLP models, identifying gaps and proposing future research avenues.
Findings
High-level taxonomy of interpretation methods in NLP
Detailed description of sub-categories like influence functions and attention
Identification of deficiencies and future research directions
Abstract
Neural network models have achieved state-of-the-art performances in a wide range of natural language processing (NLP) tasks. However, a long-standing criticism against neural network models is the lack of interpretability, which not only reduces the reliability of neural NLP systems but also limits the scope of their applications in areas where interpretability is essential (e.g., health care applications). In response, the increasing interest in interpreting neural NLP models has spurred a diverse array of interpretation methods over recent years. In this survey, we provide a comprehensive review of various interpretation methods for neural models in NLP. We first sketch out a high-level taxonomy for interpretation methods in NLP, i.e., training-based approaches, test-based approaches, and hybrid approaches. Next, we describe sub-categories in each category in detail, e.g.,…
Taxonomy
Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning
