DuoRC: Towards Complex Language Understanding with Paraphrased Reading Comprehension

Amrita Saha; Rahul Aralikatte; Mitesh M. Khapra; Karthik Sankaranarayanan

arXiv:1804.07927·cs.CL·October 11, 2018

TL;DR

DuoRC introduces a challenging new dataset for reading comprehension, featuring questions and answers derived from different versions of movie plots, requiring deeper understanding and reasoning beyond existing datasets.

Contribution

The paper presents DuoRC, a novel RC dataset with unique cross-version question-answer pairs from movie plots, designed to challenge neural models and promote advanced language understanding.

Findings

01. State-of-the-art models perform poorly on DuoRC compared to SQuAD.

02. DuoRC requires deeper reasoning and external knowledge.

03. The dataset highlights limitations of current neural RC approaches.

Abstract

We propose DuoRC, a novel dataset for Reading Comprehension (RC) that motivates several new challenges for neural approaches in language understanding beyond those offered by existing RC datasets. DuoRC contains 186,089 unique question-answer pairs created from a collection of 7,680 pairs of movie plots, where each pair in the collection reflects two versions of the same movie - one from Wikipedia and the other from IMDb - written by two different authors. We asked crowdsourced workers to create questions from one version of the plot and a different set of workers to extract or synthesize answers from the other version. This unique characteristic of DuoRC, where questions and answers are created from different versions of a document narrating the same underlying story, ensures by design that there is very little lexical overlap between the questions created from one version and the…
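
The low lexical overlap that the abstract attributes to the cross-version design can be illustrated with a small sketch. The metric below (the fraction of distinct question tokens that also appear in a passage) is an assumption chosen for illustration and is not necessarily the measure used in the paper. The function names `tokens` and `lexical_overlap` are hypothetical, and the example texts are invented, not taken from DuoRC.

```python
import re


def tokens(text):
    """Lowercase a string and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def lexical_overlap(question, passage):
    """Fraction of distinct question tokens that also occur in the passage.

    Illustrative metric only (an assumption, not the paper's definition).
    Returns 0.0 for a question with no tokens.
    """
    q = tokens(question)
    if not q:
        return 0.0
    return len(q & tokens(passage)) / len(q)


# Invented toy texts: one question, and two plot sentences telling the same event.
question = "Who kills the detective?"
same_version = "The gangster kills the detective in the warehouse."
other_version = "A mob boss murders the investigator at an old depot."

print(lexical_overlap(question, same_version))   # question and passage share wording
print(lexical_overlap(question, other_version))  # paraphrased passage shares far less
```

In this toy case the paraphrased passage shares only a function word with the question, so a model that relies on string matching has little to work with, which is the difficulty the abstract describes.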

Code & Models

Repositories

duorc/duorc (official)

Models

bombastictranz/romeo-rosete (model)

Datasets
