An Empirical Comparison of Instance Attribution Methods for NLP
Pouya Pezeshkpour, Sarthak Jain, Byron C. Wallace, Sameer Singh

TL;DR
This paper empirically compares various instance attribution methods for NLP, evaluating their agreement and effectiveness in identifying training data influencing model predictions.
Contribution
It provides a systematic evaluation of simple retrieval versus gradient-based attribution methods, highlighting their similarities and differences in NLP tasks.
Findings
Simple retrieval methods often identify different training instances than influence functions.
Despite differences, simple methods show desirable characteristics similar to complex attribution techniques.
Gradient-based methods may offer more precise attribution but are computationally expensive.
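One way to make "identify different training instances" concrete is to compare the top-k sets that two attribution methods return for the same test point. The sketch below is illustrative only and is not the paper's code: the function names, the toy vectors, and the placeholder ranking `hypothetical_if_top` are all assumptions, and the overlap measure is a simple set-overlap ratio chosen for the example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_k_by_similarity(train_vecs, test_vec, k):
    """Indices of the k training vectors most similar to the test vector."""
    scores = [cosine(x, test_vec) for x in train_vecs]
    order = sorted(range(len(train_vecs)), key=lambda i: scores[i], reverse=True)
    return order[:k]


def top_k_overlap(ranking_a, ranking_b):
    """Fraction of shared indices between two top-k lists."""
    a, b = set(ranking_a), set(ranking_b)
    if not a and not b:
        return 1.0
    return len(a & b) / max(len(a), len(b))


# Toy representations (assumed); in practice these would be model embeddings.
train = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
test = [1.0, 0.05]

retrieved = top_k_by_similarity(train, test, k=2)

# Placeholder standing in for a gradient-based method's top-2 list.
# This is made-up illustration data, not a result from the paper.
hypothetical_if_top = [1, 2]

overlap = top_k_overlap(retrieved, hypothetical_if_top)
```

A low overlap value means the two methods point to largely different training instances for the same prediction, which is the kind of disagreement the first finding describes.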
Abstract
Widespread adoption of deep models has motivated a pressing need for approaches to interpret network outputs and to facilitate model debugging. Instance attribution methods constitute one means of accomplishing these goals by retrieving training instances that (may have) led to a particular prediction. Influence functions (IF; Koh and Liang 2017) provide machinery for doing this by quantifying the effect that perturbing individual train instances would have on a specific test prediction. However, even approximating the IF is computationally expensive, to a degree that may be prohibitive in many cases. Might simpler approaches (e.g., retrieving train examples most similar to a given test point) perform comparably? In this work, we evaluate the degree to which different potential instance attribution methods agree with respect to the importance of training samples. We find that simple retrieval…
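For reference, the influence of upweighting a training instance z on the loss at a test point, as defined by Koh and Liang (2017), can be written as follows. The notation (L for the loss, θ̂ for the trained parameters, n for the number of training instances) follows that definition and is not taken from this page; the inverse-Hessian term is the main source of the computational cost mentioned in the abstract.

```latex
\mathcal{I}(z, z_{\text{test}})
  = -\,\nabla_\theta L(z_{\text{test}}, \hat\theta)^{\top}
      H_{\hat\theta}^{-1}\,
      \nabla_\theta L(z, \hat\theta),
\qquad
H_{\hat\theta} = \frac{1}{n} \sum_{i=1}^{n} \nabla_\theta^{2} L(z_i, \hat\theta)
```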
Taxonomy
Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Machine Learning and Data Classification
