Interpreting Neural Networks With Nearest Neighbors

Eric Wallace; Shi Feng; Jordan Boyd-Graber

arXiv:1809.02847·cs.CL·November 8, 2018


TL;DR

This paper changes the test-time behavior of neural networks using Deep k-Nearest Neighbors to obtain a more robust uncertainty metric. Feature importance values generated from that metric align better with human perception than baseline methods, without harming text classification accuracy.

Contribution

It improves local interpretation of neural networks by replacing model confidence, which is not a robust measure of uncertainty, with a Deep k-Nearest Neighbors uncertainty metric when estimating the importance of each input feature.

Findings

01 Improved alignment of interpretations with human perception.

02 Maintained text classification accuracy.

03 Provided insights into dataset annotation artifacts.

Abstract

Local model interpretation methods explain individual predictions by assigning an importance value to each input feature. This value is often determined by measuring the change in confidence when a feature is removed. However, the confidence of neural networks is not a robust measure of model uncertainty. This issue makes reliably judging the importance of the input features difficult. We address this by changing the test-time behavior of neural networks using Deep k-Nearest Neighbors. Without harming text classification accuracy, this algorithm provides a more robust uncertainty metric which we use to generate feature importance values. The resulting interpretations better align with human perception than baseline methods. Finally, we use our interpretation method to analyze model predictions on dataset annotation artifacts.
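The leave-one-out idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a single toy "hidden layer" (the mean of hand-written 2-D word vectors) and scores uncertainty as the fraction of the k nearest training representations that share the predicted label, a simplified stand-in for the Deep k-Nearest Neighbors metric. The names `embed`, `conformity`, and `leave_one_out_importance`, the vocabulary, and all data points are hypothetical.

```python
import numpy as np

# Hypothetical 2-D word vectors standing in for a network's hidden representation.
vectors = {
    "great": np.array([1.0, 0.0]),
    "terrible": np.array([-1.0, 0.0]),
    "movie": np.array([0.0, 0.1]),
    "the": np.array([-0.1, 0.0]),
}

# Hypothetical training representations with labels (1 = positive, 0 = negative).
train_reps = np.array([
    [0.6, 0.0], [0.5, 0.1], [0.4, 0.0],
    [-0.7, 0.0], [-0.5, 0.2], [-0.3, 0.0],
])
train_labels = np.array([1, 1, 1, 0, 0, 0])


def embed(tokens, vectors):
    """Toy representation: the mean of the token vectors (assumption)."""
    if not tokens:
        return np.zeros(2)
    return np.mean([vectors[t] for t in tokens], axis=0)


def conformity(rep, train_reps, train_labels, label, k=3):
    """Fraction of the k nearest training points whose label matches `label`."""
    dists = np.linalg.norm(train_reps - rep, axis=1)
    nearest = np.argsort(dists)[:k]
    return float(np.mean(train_labels[nearest] == label))


def leave_one_out_importance(tokens, label, k=3):
    """Importance of each token = drop in conformity when that token is removed."""
    base = conformity(embed(tokens, vectors), train_reps, train_labels, label, k)
    scores = []
    for i in range(len(tokens)):
        reduced = tokens[:i] + tokens[i + 1:]
        reduced_score = conformity(
            embed(reduced, vectors), train_reps, train_labels, label, k
        )
        scores.append(base - reduced_score)
    return scores


tokens = ["the", "movie", "great"]
scores = leave_one_out_importance(tokens, label=1)
for token, score in zip(tokens, scores):
    print(f"{token}: {score:.2f}")
```

In this toy setup, removing "great" is the only deletion that changes the neighbor set, so it receives the largest importance score. Swapping `conformity` for a softmax confidence would give the confidence-based baseline that the abstract argues is less robust.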

Code & Models

Repositories

Eric-Wallace/trickme-interface
