Neuron-level Interpretation of Deep NLP Models: A Survey

Hassan Sajjad; Nadir Durrani; Fahim Dalvi

arXiv:2108.13138 · cs.CL · August 17, 2022



TL;DR

This survey reviews recent advances in neuron-level interpretability of deep NLP models, covering methods for discovering and understanding individual neurons, evaluation methods, major findings, applications, and future research directions.

Contribution

It provides a comprehensive overview of neuron analysis techniques and their applications in deep NLP models, highlighting recent progress and open challenges.

Findings

01. Neuron analysis has yielded findings about what individual neurons learn, including comparisons across architectures.

02. Neuron probing enables applications such as model control and domain adaptation.

03. Open issues include evaluation standards and interpretability benchmarks.

Abstract

The proliferation of deep neural networks in various domains has increased the need for interpretability of these models. Early work in this direction, and the surveys covering it, focused on high-level representation analysis. However, a recent branch of work has concentrated on interpretability at a more granular level: analyzing individual neurons within these models. In this paper, we survey the work done on neuron analysis, including: i) methods to discover and understand neurons in a network, ii) evaluation methods, iii) major findings that neuron analysis has unraveled, including cross-architectural comparisons, iv) applications of neuron probing, such as controlling the model and domain adaptation, and v) a discussion of open issues and future research directions.
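One common way to discover neurons that encode a property is probing: train a linear classifier on neuron activations to predict the property, then rank neurons by the magnitude of their learned weights. The sketch below illustrates that idea under explicit assumptions. Synthetic Gaussian activations stand in for a real model's hidden layer. The elastic-net (L1 + L2) penalty, fitted here by proximal gradient descent, is one reasonable choice for pushing irrelevant neurons toward zero weight. The function names (`train_elastic_net_probe`, `rank_neurons`) are hypothetical. This is not presented as the surveyed papers' exact procedure.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_elastic_net_probe(X, y, l1=0.01, l2=0.01, lr=0.1, steps=1000):
    """Hypothetical helper: fit a binary logistic-regression probe on neuron
    activations X (n_samples x n_neurons) with labels y in {0, 1}.

    Uses proximal gradient descent: a gradient step on the logistic loss plus
    the L2 penalty, then soft-thresholding to apply the L1 penalty.
    """
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(steps):
        p = sigmoid(X @ w + b)           # predicted probabilities
        err = p - y                      # gradient of logistic loss w.r.t. logits
        grad_w = X.T @ err / n + l2 * w  # loss gradient + L2 term
        grad_b = err.mean()
        w = w - lr * grad_w
        b = b - lr * grad_b
        # Proximal step for L1: shrink every weight toward zero by lr * l1.
        w = np.sign(w) * np.maximum(np.abs(w) - lr * l1, 0.0)
    return w, b


def rank_neurons(w):
    """Return neuron indices sorted from most to least salient (by |weight|)."""
    return np.argsort(-np.abs(w))


if __name__ == "__main__":
    # Synthetic stand-in for hidden-layer activations: 500 tokens x 10 neurons.
    # By construction, only neurons 3 and 7 carry the (hypothetical) property.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((500, 10))
    y = (X[:, 3] + X[:, 7] > 0).astype(float)

    w, b = train_elastic_net_probe(X, y)
    print("top-ranked neurons:", rank_neurons(w)[:2])
```

With real models, `X` would instead hold activations extracted from a chosen layer, and the top-ranked neurons would be candidates for further checks such as ablation.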


Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications