A Span-Extraction Dataset for Chinese Machine Reading Comprehension

Yiming Cui; Ting Liu; Wanxiang Che; Li Xiao; Zhipeng Chen; Wentao Ma; Shijin Wang; Guoping Hu

arXiv:1810.07366·cs.CL·November 5, 2019

TL;DR

This paper introduces a span-extraction dataset for Chinese machine reading comprehension, addressing the scarcity of non-English datasets in this area and providing resources for further research.

Contribution

It presents a Chinese MRC dataset of nearly 20,000 real questions with an additional challenge set, provides baseline systems, and hosts a dedicated evaluation workshop (CMRC 2018).

Findings

01 Baseline systems demonstrate the dataset's difficulty.

02 The challenge set requires multi-sentence inference.

03 Resources are publicly available for research advancement.

Abstract

Machine Reading Comprehension (MRC) has become enormously popular recently and has attracted a lot of attention. However, the existing reading comprehension datasets are mostly in English. In this paper, we introduce a Span-Extraction dataset for Chinese machine reading comprehension to add language diversity in this area. The dataset is composed of nearly 20,000 real questions annotated on Wikipedia paragraphs by human experts. We also annotated a challenge set containing questions that require comprehensive understanding and multi-sentence inference throughout the context. We present several baseline systems as well as anonymous submissions to demonstrate the difficulty of this dataset. With the release of the dataset, we hosted the Second Evaluation Workshop on Chinese Machine Reading Comprehension (CMRC 2018). We hope the release of the dataset could further accelerate the…
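To make "span extraction" concrete: in this task family the answer is a contiguous substring of the context passage, so a record can be checked by slicing the context at the annotated offset. The sketch below is illustrative only. The `SpanExample` class, its field names, and the sample passage are assumptions made for this example and are not taken from the dataset or its official schema; `exact_match` is a simplified stand-in, not the workshop's evaluation script.

```python
from dataclasses import dataclass


@dataclass
class SpanExample:
    """Hypothetical record layout; field names are assumptions, not the dataset's schema."""

    context: str
    question: str
    answer_text: str
    answer_start: int  # character offset of the answer within the context

    def answer_end(self) -> int:
        # Exclusive end offset of the answer span.
        return self.answer_start + len(self.answer_text)

    def is_valid_span(self) -> bool:
        # A span-extraction answer must appear verbatim at the annotated offset.
        return self.context[self.answer_start:self.answer_end()] == self.answer_text


def exact_match(prediction: str, reference: str) -> bool:
    # Simplified comparison: equal after trimming surrounding whitespace.
    return prediction.strip() == reference.strip()


# Made-up sample for illustration; it is not drawn from the dataset.
example = SpanExample(
    context="北京是中华人民共和国的首都。",
    question="中华人民共和国的首都是哪里？",
    answer_text="北京",
    answer_start=0,
)

# A model for this task predicts start and end offsets; here they are hard-coded.
predicted = example.context[0:2]
```

Python strings index by Unicode code point, so character offsets work the same way for Chinese text as for English text in this sketch.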

Code & Models

Repositories

ymcui/cmrc2018 (tf, official)