# MCScript2.0: A Machine Comprehension Corpus Focused on Script Events and Participants

**Authors:** Simon Ostermann, Michael Roth, Manfred Pinkal

arXiv: 1905.09531 · 2019-05-31

## TL;DR

MCScript2.0 is a machine comprehension dataset focused on script and commonsense knowledge. Humans answer its questions easily, but current models struggle with them.

## Contribution

Introduces a new corpus of approx. 20,000 questions on approx. 3,500 texts for evaluating script and commonsense knowledge in machine comprehension.

## Key findings

- Half of the questions cannot be answered from the reading text alone and require commonsense, in particular script, knowledge
- Humans find the task easy
- Existing machine comprehension models perform poorly on the data, even when they use a commonsense knowledge base

## Abstract

We introduce MCScript2.0, a machine comprehension corpus for the end-to-end evaluation of script knowledge. MCScript2.0 contains approx. 20,000 questions on approx. 3,500 texts, crowdsourced based on a new collection process that results in challenging questions. Half of the questions cannot be answered from the reading texts, but require the use of commonsense and, in particular, script knowledge. We give a thorough analysis of our corpus and show that while the task is not challenging to humans, existing machine comprehension models fail to perform well on the data, even if they make use of a commonsense knowledge base. The dataset is available at http://www.sfb1102.uni-saarland.de/?page_id=2582
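Because half of the questions can be answered from the text and half require commonsense knowledge, it is natural to report accuracy separately for the two groups. The sketch below shows one way to do that for a multiple-choice setup. The `Question` record, its field names (`needs_commonsense`, `gold`), and the `accuracy_by_type` helper are hypothetical illustrations, not the official MCScript2.0 file format or evaluation script.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Question:
    # Hypothetical record layout; NOT the official MCScript2.0 data format.
    qid: str
    answers: List[str]       # candidate answers for this question
    gold: int                # index of the correct candidate
    needs_commonsense: bool  # True if the reading text alone does not suffice


def accuracy_by_type(questions: List[Question],
                     predictions: Dict[str, int]) -> Dict[str, float]:
    """Return accuracy overall ("all") and per group ("text", "commonsense").

    `predictions` maps a question id to the predicted candidate index;
    a missing prediction counts as wrong.
    """
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for q in questions:
        group = "commonsense" if q.needs_commonsense else "text"
        hit = predictions.get(q.qid) == q.gold
        # Count each question in its own group and in the overall total.
        for key in (group, "all"):
            total[key] += 1
            correct[key] += int(hit)
    return {key: correct[key] / total[key] for key in total}


if __name__ == "__main__":
    qs = [
        Question("q1", ["a", "b"], 0, False),
        Question("q2", ["a", "b"], 1, True),
        Question("q3", ["a", "b"], 1, True),
        Question("q4", ["a", "b"], 0, False),
    ]
    preds = {"q1": 0, "q2": 0, "q3": 1, "q4": 0}
    print(accuracy_by_type(qs, preds))
    # → {'text': 1.0, 'all': 0.75, 'commonsense': 0.5}
```

Splitting the score this way makes visible whether a model's errors concentrate on the questions that need knowledge beyond the reading text.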

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1905.09531/full.md

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/1905.09531/full.md

## References

33 references — full list in the complete paper: https://tomesphere.com/paper/1905.09531/full.md

---
Source: https://tomesphere.com/paper/1905.09531