Pathologies of Neural Models Make Interpretations Difficult

Shi Feng; Eric Wallace; Alvin Grissom II; Mohit Iyyer; Pedro Rodriguez; Jordan Boyd-Graber

arXiv:1804.07781 · cs.CL · September 7, 2022 · 26 cites

Open Access

TL;DR

This paper shows that neural NLP models behave pathologically under input reduction, which makes importance-based interpretation methods unreliable, and proposes fine-tuning to make the models more interpretable without sacrificing accuracy.

Contribution

It uncovers limitations of current interpretation methods through input reduction and introduces a fine-tuning approach to enhance interpretability of neural models.

Findings

01. Input reduction leaves remaining words that appear nonsensical to humans.

02. Models keep making the same prediction with high confidence even when the reduced input lacks the information to support any label.

03. Fine-tuning improves interpretability without accuracy loss.

Abstract

One way to interpret neural model predictions is to highlight the most important input features, for example with a heatmap visualization over the words in an input sentence. In existing interpretation methods for NLP, a word's importance is determined either by input perturbation (measuring the decrease in model confidence when that word is removed) or by the gradient with respect to that word. To understand the limitations of these methods, we use input reduction, which iteratively removes the least important word from the input. This exposes pathological behaviors of neural models: the remaining words appear nonsensical to humans and are not the ones determined as important by interpretation methods. As we confirm with human experiments, the reduced examples lack information to support the prediction of any label, but models still make the same predictions with high confidence. To…
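
The procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the bag-of-words classifier (`TOY_WEIGHTS`, `toy_confidence`, `toy_label`) is a hypothetical stand-in for a neural model, importance is computed by leave-one-out perturbation only, and the stopping rule (stop just before the predicted label would change) is an assumption based on the abstract's statement that models "still make the same predictions".

```python
import math
from typing import Callable, List, Tuple

# Hypothetical toy sentiment weights; a stand-in for a trained neural model.
TOY_WEIGHTS = {"great": 2.0, "good": 1.0, "bad": -1.5, "terrible": -2.5}


def toy_confidence(words: List[str], label: int) -> float:
    """Probability the toy model assigns to `label` (1 = positive, 0 = negative)."""
    score = sum(TOY_WEIGHTS.get(w, 0.0) for w in words)
    p_pos = 1.0 / (1.0 + math.exp(-score))
    return p_pos if label == 1 else 1.0 - p_pos


def toy_label(words: List[str]) -> int:
    """Predicted label of the toy model."""
    return 1 if toy_confidence(words, 1) >= 0.5 else 0


def leave_one_out(words: List[str],
                  confidence: Callable[[List[str], int], float],
                  label: int) -> List[float]:
    """Importance of word i = drop in confidence for `label` when word i is removed."""
    base = confidence(words, label)
    return [base - confidence(words[:i] + words[i + 1:], label)
            for i in range(len(words))]


def input_reduction(words: List[str],
                    confidence: Callable[[List[str], int], float],
                    predict_label: Callable[[List[str]], int]) -> Tuple[List[str], int]:
    """Repeatedly remove the least important word while the prediction is unchanged."""
    label = predict_label(words)
    current = list(words)
    while len(current) > 1:
        scores = leave_one_out(current, confidence, label)
        # Index of the least important word (ties resolved by first occurrence).
        i = min(range(len(current)), key=scores.__getitem__)
        candidate = current[:i] + current[i + 1:]
        if predict_label(candidate) != label:
            break  # assumed stopping rule: never let the prediction flip
        current = candidate
    return current, label


if __name__ == "__main__":
    reduced, label = input_reduction("the movie was great".split(),
                                     toy_confidence, toy_label)
    print(reduced, label)
```

With this toy model the reduced input is still a sensible word, because the classifier is a transparent linear scorer. The pathology the paper reports arises with neural models, where the words that survive reduction appear nonsensical to humans while confidence stays high.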

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Topic Modeling

Methods: Heatmap