DRCD: a Chinese Machine Reading Comprehension Dataset
Chih Chieh Shao, Trois Liu, Yuting Lai, Yiying Tseng, Sam Tsai

TL;DR
This paper introduces DRCD, a large Chinese machine reading comprehension dataset from Wikipedia, providing a benchmark for transfer learning and model evaluation in Chinese MRC.
Contribution
The paper presents a new open-domain Chinese MRC dataset with over 30,000 questions, establishing a standard resource for future research and transfer learning.
Findings
The baseline model achieves an F1 score of 89.59%
Human performance on the dataset reaches an F1 score of 93.30%
The dataset contains 10,014 paragraphs from 2,108 Wikipedia articles
Abstract
In this paper, we introduce DRCD (Delta Reading Comprehension Dataset), an open-domain Traditional Chinese machine reading comprehension (MRC) dataset. The dataset aims to be a standard Chinese MRC dataset that can serve as a source dataset for transfer learning. It contains 10,014 paragraphs from 2,108 Wikipedia articles and more than 30,000 questions written by annotators. We build a baseline model that achieves an F1 score of 89.59%, while human performance reaches an F1 score of 93.30%.
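Both F1 figures measure how much a predicted answer span overlaps the annotated answer. The sketch below shows one common way to compute such a score for Chinese text. It assumes SQuAD-style scoring: character-level overlap (Chinese has no whitespace word boundaries), the best score over all reference answers, and an average over questions reported as a percentage. The function names are hypothetical, and the official DRCD evaluation script may normalize punctuation or whitespace differently.

```python
from collections import Counter


def char_f1(prediction: str, ground_truth: str) -> float:
    """Character-level F1 between a predicted and a reference answer.

    Assumption: whitespace is ignored and every other character counts
    as one token; the official DRCD script may normalize differently.
    """
    pred_chars = [c for c in prediction if not c.isspace()]
    gold_chars = [c for c in ground_truth if not c.isspace()]
    # Multiset intersection: characters shared by both answers
    common = Counter(pred_chars) & Counter(gold_chars)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_chars)
    recall = num_same / len(gold_chars)
    return 2 * precision * recall / (precision + recall)


def max_f1(prediction: str, ground_truths: list[str]) -> float:
    # A question may have several reference answers; keep the best match
    return max(char_f1(prediction, gt) for gt in ground_truths)


def dataset_f1(predictions: dict[str, str], references: dict[str, list[str]]) -> float:
    """Average per-question F1, reported as a percentage.

    predictions maps question id -> predicted answer text;
    references maps question id -> list of reference answers.
    """
    scores = [max_f1(predictions[qid], refs) for qid, refs in references.items()]
    return 100.0 * sum(scores) / len(scores)
```

For example, predicting "台北市" when the reference is "台北" gives precision 2/3 and recall 1, so the character-level F1 is 0.8. Partial answers therefore still earn credit, which is why F1 is reported alongside stricter exact-match metrics in SQuAD-style benchmarks.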
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
