TL;DR
This paper investigates whether deep learning models truly understand the questions they answer. Attribution analysis and adversarial examples show that models often ignore key question terms and are easily misled.
Contribution
The study applies attribution-based analysis to expose where question answering models fall short, and demonstrates that perturbing questions in ways guided by those attributions can significantly reduce model accuracy.
Findings
Deep models often ignore important question words.
Adversarial perturbations drastically reduce model accuracy.
Attributions reveal when models rely on spurious cues.
Abstract
We analyze state-of-the-art deep learning models for three tasks: question answering on (1) images, (2) tables, and (3) passages of text. Using the notion of attribution (word importance), we find that these deep networks often ignore important question terms. Leveraging such behavior, we perturb questions to craft a variety of adversarial examples. Our strongest attacks drop the accuracy of both a visual question answering model and a tabular question answering model. Additionally, we show how attributions can strengthen attacks proposed by Jia and Liang (2017) on paragraph comprehension models. Our results demonstrate that attributions can augment standard measures of accuracy and empower investigation of model performance. When a model is accurate but for the wrong reasons, attributions can surface erroneous logic in the…
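To make the idea of attribution (word importance) concrete, here is a minimal sketch using integrated gradients, one common attribution technique. The abstract as excerpted here does not name the attribution method, so the choice of integrated gradients is an assumption. The scorer, the embeddings, and the weight vector are hypothetical values chosen for illustration, not anything taken from the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def model(X, w):
    """Hypothetical toy scorer: sigmoid of the summed projections of word embeddings onto w."""
    return sigmoid(np.sum(X @ w))

def model_grad(X, w):
    """Analytic gradient of the toy scorer with respect to every embedding entry."""
    s = model(X, w)
    return s * (1.0 - s) * np.tile(w, (X.shape[0], 1))

def integrated_gradients(X, w, steps=200):
    """Per-word attributions relative to an all-zeros baseline (an assumed baseline choice)."""
    baseline = np.zeros_like(X)
    # Midpoint Riemann sum along the straight path from the baseline to X.
    alphas = (np.arange(steps) + 0.5) / steps
    grads = [model_grad(baseline + a * (X - baseline), w) for a in alphas]
    avg_grad = np.mean(grads, axis=0)
    # Sum over embedding dimensions to get one score per word.
    return ((X - baseline) * avg_grad).sum(axis=1)

# Made-up 2-d embeddings for a made-up question.
words = ["how", "many", "red", "cars"]
X = np.array([[0.1, 0.0],
              [0.0, 0.1],
              [1.0, 0.5],
              [0.8, 1.2]])
w = np.array([0.5, 1.0])

attributions = integrated_gradients(X, w)
for word, score in zip(words, attributions):
    print(f"{word:>5s}  {score:.4f}")
```

In this toy setup the question words "how" and "many" receive near-zero attribution, which is the kind of pattern the abstract describes: if a model barely attends to such terms, replacing them may leave its answer unchanged. The attributions also sum to the difference between the model's output on the input and on the baseline, which is a useful sanity check on any implementation.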
