TL;DR
TellMeWhy is a crowd-sourced dataset of more than 30,000 questions, with free-form answers, about why characters in short narratives perform the actions described. State-of-the-art models fall far below human performance on it, particularly when the answer requires commonsense knowledge that is not stated in the narrative.
Contribution
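The paper presents TellMeWhy, a new dataset for answering why-questions about narratives in which roughly a third of the answers are not present in the narrative itself and so require external knowledge. It also introduces a systematized human evaluation interface, motivated by the limitations of automated evaluation for this task.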
Findings
State-of-the-art models perform far below humans on the dataset.
Models struggle more with questions requiring external commonsense knowledge.
The dataset reveals gaps in current narrative understanding capabilities.
Abstract
Answering questions about why characters perform certain actions is central to understanding and reasoning about narratives. Despite recent progress in QA, it is not clear if existing models have the ability to answer "why" questions that may require commonsense knowledge external to the input narrative. In this work, we introduce TellMeWhy, a new crowd-sourced dataset that consists of more than 30k questions and free-form answers concerning why characters in short narratives perform the actions described. For a third of this dataset, the answers are not present within the narrative. Given the limitations of automated evaluation for this task, we also present a systematized human evaluation interface for this dataset. Our evaluation of state-of-the-art models shows that they are far below human performance on answering such questions. They are especially worse on questions whose answers…
