DuoRC: Towards Complex Language Understanding with Paraphrased Reading Comprehension
Amrita Saha, Rahul Aralikatte, Mitesh M. Khapra, Karthik Sankaranarayanan

TL;DR
DuoRC introduces a challenging new reading comprehension dataset whose questions and answers are drawn from different versions of the same movie plot, so answering them requires deeper understanding and reasoning than existing datasets demand.
Contribution
The paper presents DuoRC, a novel RC dataset with unique cross-version question-answer pairs from movie plots, designed to challenge neural models and promote advanced language understanding.
Findings
State-of-the-art models perform poorly on DuoRC compared to SQuAD.
DuoRC requires deeper reasoning and external knowledge.
The dataset highlights limitations of current neural RC approaches.
Abstract
We propose DuoRC, a novel dataset for Reading Comprehension (RC) that motivates several new challenges for neural approaches in language understanding beyond those offered by existing RC datasets. DuoRC contains 186,089 unique question-answer pairs created from a collection of 7680 pairs of movie plots, where each pair in the collection reflects two versions of the same movie (one from Wikipedia and the other from IMDb) written by two different authors. We asked crowdsourced workers to create questions from one version of the plot and a different set of workers to extract or synthesize answers from the other version. This unique characteristic of DuoRC, where questions and answers are created from different versions of a document narrating the same underlying story, ensures by design that there is very little lexical overlap between the questions created from one version and the…
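The abstract's key claim is that writing questions from one plot version and answering from the other keeps lexical overlap low. One simple way to make "lexical overlap" concrete is the fraction of a question's content words that also appear in the passage being read. The sketch below is a hypothetical illustration, not the paper's own measurement: the `question_overlap` function, the tokenizer, the stopword list, and the two toy plot snippets are all invented for this example and do not come from DuoRC.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercase word tokens; a deliberately simple tokenizer (assumption)."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def question_overlap(question: str, passage: str,
                     stopwords: frozenset[str] = frozenset()) -> float:
    """Fraction of non-stopword question tokens that also occur in the passage.

    Hypothetical metric for illustration only; not the measure used by DuoRC.
    """
    q = tokens(question) - stopwords
    if not q:
        return 0.0
    return len(q & tokens(passage)) / len(q)


# Small, hypothetical stopword list.
STOPWORDS = frozenset({"the", "a", "an", "of", "to", "who", "what",
                       "does", "did", "is", "in"})

# Invented toy example (not from DuoRC): two differently worded
# summaries of the same story, and a question written from version A.
version_a = "Captain Reyes smuggles the stolen map across the border."
version_b = "The map, taken from the museum, is carried out of the country by Reyes."
question = "Who smuggles the stolen map across the border?"

print(question_overlap(question, version_a, STOPWORDS))  # 1.0
print(question_overlap(question, version_b, STOPWORDS))  # 0.2
```

In this toy case, every content word of the question appears in the version it was written from, while only "map" appears in the other version. This is the kind of gap the cross-version design is meant to create, forcing a model to match paraphrases rather than copy surface strings.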
