DRCD: a Chinese Machine Reading Comprehension Dataset

Chih Chieh Shao; Trois Liu; Yuting Lai; Yiying Tseng; Sam Tsai

arXiv:1806.00920 · cs.CL · May 30, 2019 · 85 citations

TL;DR

This paper introduces DRCD, a large Traditional Chinese machine reading comprehension (MRC) dataset built from Wikipedia. It provides a benchmark for model evaluation and a source dataset for transfer learning in Chinese MRC.

Contribution

The paper presents a new open-domain Traditional Chinese MRC dataset with more than 30,000 annotator-written questions. It is intended as a standard resource for future research and as a source dataset for transfer learning.

Findings

01

The baseline model achieves an F1 score of 89.59%

02

Human performance on the dataset is 93.30% F1

03

The dataset covers 10,014 paragraphs drawn from 2,108 Wikipedia articles
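The findings above compare baseline and human performance by F1 score. This summary does not specify the evaluation script, so the following is only a minimal sketch. It assumes a SQuAD-style overlap F1 computed over individual characters, which is a common choice for Chinese because the text has no whitespace word boundaries. It also takes the maximum over the reference answers for each question and averages over questions. The function names `char_f1`, `max_f1` and `corpus_f1` are hypothetical.

```python
from collections import Counter


def char_f1(prediction: str, ground_truth: str) -> float:
    """Character-level overlap F1 (assumption: every non-whitespace char is a token)."""
    pred_chars = [c for c in prediction if not c.isspace()]
    gold_chars = [c for c in ground_truth if not c.isspace()]
    # If either side is empty, score 1.0 only when both are empty.
    if not pred_chars or not gold_chars:
        return float(pred_chars == gold_chars)
    # Multiset intersection counts characters shared by prediction and reference.
    common = Counter(pred_chars) & Counter(gold_chars)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_chars)
    recall = num_same / len(gold_chars)
    return 2 * precision * recall / (precision + recall)


def max_f1(prediction: str, ground_truths: list[str]) -> float:
    """Best F1 against any of the reference answers for one question."""
    return max(char_f1(prediction, gt) for gt in ground_truths)


def corpus_f1(predictions: dict[str, str], references: dict[str, list[str]]) -> float:
    """Mean per-question F1 as a percentage; missing predictions count as empty strings."""
    scores = [max_f1(predictions.get(qid, ""), golds) for qid, golds in references.items()]
    return 100.0 * sum(scores) / len(scores)
```

For example, predicting `"台北市"` against the reference `"台北"` gives precision 2/3 and recall 1, so the F1 is 0.8. Counting characters rather than words avoids any dependence on a Chinese word segmenter. The trade-off is that partial-character overlap is rewarded even when word boundaries differ.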

Abstract

In this paper, we introduce DRCD (Delta Reading Comprehension Dataset), an open-domain Traditional Chinese machine reading comprehension (MRC) dataset. The dataset is intended to serve as a standard Chinese MRC dataset that can also be used as a source dataset in transfer learning. It contains 10,014 paragraphs from 2,108 Wikipedia articles and more than 30,000 questions written by annotators. We build a baseline model that achieves an F1 score of 89.59%, while human performance reaches an F1 score of 93.30%.
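The abstract describes a nesting of articles, paragraphs and questions. The following is a sketch of how those counts could be recomputed, assuming the release uses a SQuAD v1.1-style JSON layout: `data`, then `paragraphs`, then `qas`, with each answer holding `text` and an `answer_start` character offset. That layout, the file path and the helper names `load_dataset`, `dataset_stats` and `count_bad_spans` are assumptions for illustration. The toy sample is made up and is not taken from DRCD.

```python
import json
from typing import Any


def load_dataset(path: str) -> dict[str, Any]:
    """Load a SQuAD-style JSON file (hypothetical path; UTF-8 needed for Chinese text)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def dataset_stats(squad_like: dict[str, Any]) -> dict[str, int]:
    """Count articles, paragraphs and questions in the assumed nested layout."""
    articles = squad_like["data"]
    paragraphs = [p for art in articles for p in art["paragraphs"]]
    questions = [qa for p in paragraphs for qa in p["qas"]]
    return {"articles": len(articles), "paragraphs": len(paragraphs), "questions": len(questions)}


def count_bad_spans(squad_like: dict[str, Any]) -> int:
    """Count answers whose answer_start offset does not reproduce the answer text."""
    bad = 0
    for art in squad_like["data"]:
        for p in art["paragraphs"]:
            ctx = p["context"]
            for qa in p["qas"]:
                for ans in qa["answers"]:
                    start = ans["answer_start"]
                    if ctx[start:start + len(ans["text"])] != ans["text"]:
                        bad += 1
    return bad


# Toy record in the assumed layout (illustrative only, not real DRCD content).
sample = {
    "data": [{
        "title": "臺北市",
        "paragraphs": [{
            "context": "臺北市位於臺灣北部。",
            "qas": [{
                "id": "toy-1",
                "question": "臺北市位於臺灣哪裡?",
                "answers": [{"text": "臺灣北部", "answer_start": 5}],
            }],
        }],
    }]
}
```

Because Python string offsets count Unicode code points, `answer_start` slicing works directly on Chinese text. This holds as long as the file's offsets also count characters rather than bytes, which is another assumption worth checking against the actual release.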

Code & Models

Repositories

chiahsuan156/ODSQA

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications