Neuron-level Interpretation of Deep NLP Models: A Survey
Hassan Sajjad, Nadir Durrani, Fahim Dalvi

TL;DR
This survey reviews recent advances in neuron-level interpretability of deep NLP models, covering methods, evaluations, findings, applications, and future research directions for understanding individual neuron functions.
Contribution
It provides a comprehensive overview of neuron analysis techniques and their applications in deep NLP models, highlighting recent progress and open challenges.
Findings
Neuron analysis has produced findings that enable comparisons across different architectures.
Neuron probing enables model control and domain adaptation.
Open issues include evaluation standards and interpretability benchmarks.
Abstract
The proliferation of deep neural networks in various domains has led to an increased need for interpretability of these models. Early work in this direction, and the papers that surveyed it, focused on high-level representation analysis. However, a recent branch of work has concentrated on interpretability at a more granular level: analyzing individual neurons within these models. In this paper, we survey the work done on neuron analysis, including: i) methods to discover and understand neurons in a network, ii) evaluation methods, iii) major findings that neuron analysis has unraveled, including cross-architectural comparisons, iv) applications of neuron probing, such as controlling the model and domain adaptation, and v) a discussion of open issues and future research directions.
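One way to discover neurons associated with a property is to train a linear probe on a model's activations and rank neurons by the magnitude of their probe weights. A simple evaluation is to retrain the probe on only the top-ranked neurons and check whether accuracy holds. The sketch below illustrates that idea under stated assumptions; it is not the survey's or any specific paper's implementation. The synthetic activations, the function names `train_probe`, `rank_neurons`, and `accuracy`, and all hyperparameters are hypothetical.

```python
import numpy as np

def train_probe(X, y, l1=1e-3, l2=1e-3, lr=0.5, epochs=500):
    """Hypothetical helper: fit a logistic-regression probe with an
    elastic-net penalty (L1 + L2) by full-batch (sub)gradient descent."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted P(y = 1)
        err = p - y
        # Log-loss gradient plus L2 term plus L1 subgradient.
        grad_w = X.T @ err / n + l2 * w + l1 * np.sign(w)
        w -= lr * grad_w
        b -= lr * err.mean()
    return w, b

def rank_neurons(w):
    """Hypothetical helper: neuron indices sorted by decreasing |weight|."""
    return np.argsort(-np.abs(w))

def accuracy(X, y, w, b):
    """Fraction of examples the linear probe classifies correctly."""
    return float(((X @ w + b > 0).astype(int) == y).mean())

# Synthetic stand-in for real activations (an assumption): 1000 tokens x 20
# neurons, where the binary property depends only on neurons 3 and 7.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))
y = (X[:, 3] - X[:, 7] > 0).astype(int)

w, b = train_probe(X, y)
ranking = rank_neurons(w)
print("top neurons:", ranking[:5])

# Evaluate the ranking: retrain a probe on the top-2 neurons only.
top = ranking[:2]
w_top, b_top = train_probe(X[:, top], y)
top_acc = accuracy(X[:, top], y, w_top, b_top)
print(f"accuracy with top-2 neurons: {top_acc:.3f}")
```

The elastic-net penalty is one common choice for probes of this kind: the L1 term pushes weights of irrelevant neurons toward zero, while the L2 term keeps correlated neurons from being arbitrarily dropped. On real activations the ranking would be noisier than on this toy data.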
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications
