MuCPAD: A Multi-Domain Chinese Predicate-Argument Dataset
Yahui Liu, Haoping Yang, Chen Gong, Qingrong Xia, Zhenghua Li, Min Zhang

TL;DR
MuCPAD is a multi-domain Chinese predicate-argument dataset built to support cross-domain semantic role labeling (SRL). It uses a frame-free annotation approach, explicitly annotates omitted core arguments, and relies on strict double annotation for data quality.
Contribution
The paper introduces MuCPAD, a multi-domain Chinese SRL dataset with frame-free annotation and explicit annotation of omitted core arguments, intended to support cross-domain SRL research.
Findings
Benchmark results show significant domain transfer challenges.
MuCPAD improves data quality with strict double annotation.
The dataset facilitates cross-domain SRL research.
Abstract
During the past decade, neural network models have made tremendous progress on in-domain semantic role labeling (SRL). However, performance drops dramatically under the out-of-domain setting. In order to facilitate research on cross-domain SRL, this paper presents MuCPAD, a multi-domain Chinese predicate-argument dataset, which consists of 30,897 sentences and 92,051 predicates from six different domains. MuCPAD exhibits three important features. 1) Based on a frame-free annotation methodology, we avoid writing complex frames for new predicates. 2) We explicitly annotate omitted core arguments to recover more complete semantic structure, considering that omission of content words is ubiquitous in multi-domain Chinese texts. 3) We compile 53 pages of annotation guidelines and adopt strict double annotation for improving data quality. This paper describes in detail the annotation…
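To make the idea of explicitly annotated omitted arguments concrete, here is a minimal Python sketch of how one such record might be represented. The class names, the field layout, the role label "agent", and the domain string are all hypothetical illustrations, not the dataset's actual schema or label set. The only figures taken from the abstract are the sentence and predicate counts.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Argument:
    role: str                         # hypothetical role label, not MuCPAD's real label set
    span: Optional[Tuple[int, int]]   # token span [start, end); None marks an omitted argument

    @property
    def omitted(self) -> bool:
        return self.span is None


@dataclass
class Predicate:
    index: int                        # token index of the predicate
    arguments: List[Argument] = field(default_factory=list)


@dataclass
class Sentence:
    domain: str                       # hypothetical domain name
    tokens: List[str]
    predicates: List[Predicate] = field(default_factory=list)


def omitted_arguments(sentence: Sentence) -> List[Tuple[int, str]]:
    """Return (predicate index, role) for every argument with no surface span."""
    return [
        (p.index, a.role)
        for p in sentence.predicates
        for a in p.arguments
        if a.omitted
    ]


# "吃 了 吗" ("Have (you) eaten?"): the eater is left unexpressed.
example = Sentence(
    domain="dialogue",
    tokens=["吃", "了", "吗"],
    predicates=[Predicate(index=0, arguments=[Argument(role="agent", span=None)])],
)

# Counts reported in the abstract.
NUM_SENTENCES = 30_897
NUM_PREDICATES = 92_051
avg_predicates_per_sentence = NUM_PREDICATES / NUM_SENTENCES

print(omitted_arguments(example))
print(round(avg_predicates_per_sentence, 2))
```

Representing an omitted argument as a role with no span keeps it attached to its predicate while making clear that no token in the sentence realizes it. The final line simply divides the two counts from the abstract, giving roughly three annotated predicates per sentence.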
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification
