Understanding Neural Networks through Representation Erasure

Jiwei Li; Will Monroe; Dan Jurafsky

arXiv:1612.08220 · cs.CL · January 11, 2017 · 462 cites

TL;DR

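This paper introduces a methodology for interpreting neural network decisions in NLP: erase part of the representation (input word-vector dimensions, intermediate hidden units, or input words) and observe how the model's behavior changes. The observed effects serve both to explain decisions and to support error analysis.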

Contribution

It presents a general approach to analyzing neural models through representation erasure, applicable across several NLP tasks, and uses reinforcement learning to find the minimum set of input words whose erasure flips a model's decision.
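To make the idea of a decision-flipping erasure set concrete, here is a minimal sketch. It substitutes a simple greedy search for the reinforcement learning approach named above, so it is not the authors' method and gives no guarantee that the returned set is minimal. The function `greedy_flip_set`, the toy classifier `toy_proba`, and the word lists are hypothetical and exist only for illustration.

```python
import math
from typing import Callable, List, Optional, Sequence

def greedy_flip_set(
    predict_proba: Callable[[Sequence[str]], float],
    words: Sequence[str],
    threshold: float = 0.5,
) -> Optional[List[int]]:
    """Greedily erase words until the predicted label flips.

    Returns the erased word indices in erasure order, or None if the
    label never flips. Greedy stand-in, NOT the paper's RL procedure.
    """
    remaining = list(range(len(words)))
    originally_positive = predict_proba(words) >= threshold
    erased: List[int] = []

    while remaining:
        def prob_without(j: int) -> float:
            # Probability of the positive class with word j also erased.
            return predict_proba([words[k] for k in remaining if k != j])

        # Erase the word that pushes the probability furthest toward the other label.
        pick = min if originally_positive else max
        best = pick(remaining, key=prob_without)
        remaining.remove(best)
        erased.append(best)

        p = predict_proba([words[k] for k in remaining])
        if (p >= threshold) != originally_positive:
            return erased
    return None


# Hypothetical toy classifier: a logistic score over word counts.
POSITIVE = {"great", "good", "fun"}
NEGATIVE = {"dull"}

def toy_proba(words: Sequence[str]) -> float:
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words) - 0.5
    return 1.0 / (1.0 + math.exp(-score))

sentence = ["great", "good", "fun", "but", "dull"]
flip_set = greedy_flip_set(toy_proba, sentence)
flipped_words = None if flip_set is None else [sentence[i] for i in flip_set]
```

On this toy input, erasing one positive word is not enough to change the label, so the search erases a second one before the prediction flips. A learned policy, as in the paper, would replace the exhaustive per-step scan used here.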

Findings

01. Effective in explaining neural decisions

02. Applicable to multiple NLP tasks

03. Aids in error analysis

Abstract

While neural networks have been successfully applied to many natural language processing tasks, they come at the cost of interpretability. In this paper, we propose a general methodology to analyze and interpret decisions from a neural model by observing the effects on the model of erasing various parts of the representation, such as input word-vector dimensions, intermediate hidden units, or input words. We present several approaches to analyzing the effects of such erasure, from computing the relative difference in evaluation metrics, to using reinforcement learning to erase the minimum set of input words in order to flip a neural model's decision. In a comprehensive analysis of multiple NLP tasks, including linguistic feature classification, sentence-level sentiment analysis, and document-level sentiment aspect prediction, we show that the proposed methodology not only offers clear…
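The "relative difference" analysis mentioned in the abstract can be illustrated with a short sketch. It assumes the importance of a word is the relative drop in a model score when that word is erased, (S - S_erased) / S. The function `erasure_importance` and the toy scorer `toy_score` are hypothetical stand-ins for a trained model, not code from the paper.

```python
from typing import Callable, List, Sequence

def erasure_importance(
    score: Callable[[Sequence[str]], float],
    words: Sequence[str],
) -> List[float]:
    """Relative drop in `score` when each word is erased in turn."""
    base = score(words)
    if base == 0:
        raise ValueError("baseline score must be non-zero")
    importances = []
    for i in range(len(words)):
        # Erase word i by removing it from the input.
        erased = list(words[:i]) + list(words[i + 1:])
        importances.append((base - score(erased)) / base)
    return importances


# Hypothetical toy scorer: 1.0 plus 2.0 for every positive word present.
POSITIVE_WORDS = {"great", "good"}

def toy_score(words: Sequence[str]) -> float:
    return 1.0 + sum(2.0 for w in words if w in POSITIVE_WORDS)

sentence = ["the", "movie", "was", "great"]
importances = erasure_importance(toy_score, sentence)
```

Here only "great" receives a non-zero importance, because erasing any other word leaves the toy score unchanged. The same loop could in principle be run over word-vector dimensions or hidden units instead of input words, given a model that exposes them.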

Taxonomy

Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning