# Frustratingly Poor Performance of Reading Comprehension Models on Non-adversarial Examples

**Authors:** Soham Parikh, Ananya B. Sai, Preksha Nema, Mitesh M. Khapra

arXiv: 1904.02665 · 2019-04-05

## TL;DR

This paper shows that state-of-the-art reading comprehension (RC) models hardly improve when tested on frustratingly easy, non-adversarial versions of RACE examples. This suggests they fail to exploit straightforward evidence even when it is clearly presented.

## Contribution

The study introduces a non-adversarial evaluation dataset for RC models. The dataset exposes the models' failure to improve on easy examples and makes it possible to assess the utility of specialized neural components.
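Building an easier variant of an RC example amounts to shortening the passage while keeping the answer-bearing evidence. The sketch below is a hypothetical heuristic for illustration only, not the authors' procedure: it keeps the `k` passage sentences with the highest word overlap with the question plus the correct answer. The function names, the regex sentence splitter, and the overlap scoring are all assumptions.

```python
import re


def split_sentences(passage: str) -> list[str]:
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", passage) if s.strip()]


def tokens(text: str) -> set[str]:
    # Lowercased alphanumeric word tokens; punctuation is dropped.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def make_easy_passage(passage: str, question: str, answer: str, k: int = 1) -> str:
    """Hypothetical easy-variant builder: keep the k sentences that share the
    most words with the question and the correct answer."""
    query = tokens(question) | tokens(answer)
    sentences = split_sentences(passage)
    # Rank sentence indices by overlap (descending), breaking ties by position.
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: (-len(tokens(sentences[i]) & query), i),
    )
    keep = sorted(ranked[:k])  # restore original passage order
    return " ".join(sentences[i] for i in keep)
```

For the passage "Tom lives in Paris. He has a dog named Rex. Rex likes to swim in the river." with the question "What is the name of Tom's dog?" and answer "Rex", the default `k=1` keeps only "He has a dog named Rex." Because the heuristic uses the gold answer, it yields a passage in which the answer is clearly embedded, which is the kind of easy example the paper argues a capable model should handle.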

## Key findings

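- State-of-the-art RC models show hardly any performance improvement on frustratingly easy test examples.
- Attention components designed to focus on relevant portions of the passage fail to serve that purpose, even when the answer is clearly embedded in the passage.
- The non-adversarial dataset complements adversarial evaluation and gives a more realistic assessment of RC models.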
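One simple way to probe whether an attention component "serves its intended purpose" is to measure how much of its mass lands on the tokens of the answer-bearing sentence and compare that with a uniform distribution. The sketch below assumes a single attention vector over passage tokens and a known token span for the evidence sentence. The function names and the lift ratio are hypothetical illustrations, not the paper's exact analysis.

```python
def attention_mass_on_span(weights: list[float], span: range) -> float:
    """Share of the total attention mass that lands on token positions in `span`."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("attention weights must have positive total mass")
    return sum(weights[i] for i in span) / total


def uniform_baseline(num_tokens: int, span: range) -> float:
    """Mass a uniform attention distribution over num_tokens puts on `span`."""
    if len(span) == 0:
        raise ValueError("span must be non-empty")
    return len(span) / num_tokens


def attention_lift(weights: list[float], span: range) -> float:
    """Observed mass divided by the uniform baseline (hypothetical metric).

    Values near 1.0 mean the model attends to the answer-bearing span no
    more than a uniform distribution would.
    """
    return attention_mass_on_span(weights, span) / uniform_baseline(len(weights), span)
```

For weights `[0.1, 0.1, 0.3, 0.3, 0.2]` and an evidence span covering positions 2 and 3, the observed mass is about 0.6 against a uniform baseline of 0.4, a lift of about 1.5. On easy examples, where the evidence is the only relevant sentence, a well-functioning attention component would be expected to show a clearly high lift.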

## Abstract

When humans learn to perform a difficult task (say, reading comprehension (RC) over longer passages), it is typically the case that their performance improves significantly on an easier version of this task (say, RC over shorter passages). Ideally, we would want an intelligent agent to also exhibit such a behavior. However, when experimenting with state-of-the-art RC models on the standard RACE dataset, we observe that this is not true. Specifically, we see counter-intuitive results wherein even when we show frustratingly easy examples to the model at test time, there is hardly any improvement in its performance. We refer to this as non-adversarial evaluation as opposed to adversarial evaluation. Such non-adversarial examples allow us to assess the utility of specialized neural components. For example, we show that even for easy examples where the answer is clearly embedded in the passage, the neural components designed for paying attention to relevant portions of the passage fail to serve their intended purpose. We believe that the non-adversarial dataset created as a part of this work would complement the research on adversarial evaluation and give a more realistic assessment of the ability of RC models. All the datasets and code developed as a part of this work will be made publicly available.
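The core comparison in non-adversarial evaluation is the accuracy of the same model on the original examples versus their easy counterparts. The sketch below assumes predictions and gold labels are stored as option indices (RACE questions have four answer options) and that the i-th easy example is derived from the i-th original. The function names and report keys are hypothetical.

```python
def accuracy(preds: list[int], gold: list[int]) -> float:
    """Fraction of predicted option indices that match the gold indices."""
    if len(preds) != len(gold) or not gold:
        raise ValueError("preds and gold must be non-empty and equally long")
    return sum(p == g for p, g in zip(preds, gold)) / len(gold)


def non_adversarial_gain(preds_original: list[int], preds_easy: list[int], gold: list[int]) -> dict:
    """Compare accuracy on original vs. easy variants of the same examples."""
    acc_original = accuracy(preds_original, gold)
    acc_easy = accuracy(preds_easy, gold)
    triples = list(zip(preds_original, preds_easy, gold))
    # Wrong on the full passage but right on the easy variant.
    fixed = sum(o != g and e == g for o, e, g in triples)
    # Right on the full passage but wrong on the easy variant.
    broken = sum(o == g and e != g for o, e, g in triples)
    return {
        "acc_original": acc_original,
        "acc_easy": acc_easy,
        "gain": acc_easy - acc_original,
        "fixed": fixed,
        "broken": broken,
    }
```

For gold labels `[0, 1, 2, 3]`, original predictions `[0, 1, 0, 0]` and easy predictions `[0, 1, 2, 0]`, the report gives accuracies of 0.5 and 0.75, a gain of 0.25, one fixed example and no broken ones. Counting `broken` alongside `fixed` matters because a near-zero aggregate gain can hide a model that both fixes and breaks examples when the passage is simplified.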

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1904.02665/full.md

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/1904.02665/full.md

## References

24 references — full list in the complete paper: https://tomesphere.com/paper/1904.02665/full.md

---
Source: https://tomesphere.com/paper/1904.02665