CLUENER2020: Fine-grained Named Entity Recognition Dataset and Benchmark for Chinese
Liang Xu, Yu tong, Qianqian Dong, Yixuan Liao, Cong Yu, Yin Tian, Weitang Liu, Lu Li, Caiquan Liu, Xuanwei Zhang

TL;DR
This paper introduces CLUENER2020, a comprehensive and challenging Chinese NER dataset with 10 categories, along with baseline models and a leaderboard to advance fine-grained NER research.
Contribution
It provides a new, detailed Chinese NER dataset with diverse categories and benchmarks, facilitating future research in fine-grained Chinese NER tasks.
Findings
CLUENER2020 is more challenging than other current Chinese NER datasets.
Baseline models, implemented as sequence labeling tasks, achieve lower performance on this dataset.
Human performance is reported and analyzed.
Abstract
In this paper, we introduce the NER dataset from the CLUE organization (CLUENER2020), a well-defined fine-grained dataset for named entity recognition in Chinese. CLUENER2020 contains 10 categories. Apart from common labels like person, organization, and location, it contains more diverse categories. It is more challenging than other current Chinese NER datasets and could better reflect real-world applications. For comparison, we implement several state-of-the-art baselines as sequence labeling tasks and report human performance together with an analysis of it. To facilitate future work on fine-grained NER for Chinese, we release our dataset, baselines, and leaderboard.
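To make "NER as sequence labeling" concrete, here is a minimal sketch of converting entity spans into per-character tags and back. It assumes a BIO tagging scheme and character-level tokens, neither of which the abstract specifies. The function names, the example sentence, and the label strings are illustrative and are not taken from the released dataset or baselines.

```python
def spans_to_bio(text, spans):
    """Convert (start, end_exclusive, label) spans to one BIO tag per character.

    Assumption: BIO scheme over characters; the abstract does not state
    which scheme the baselines actually use.
    """
    tags = ["O"] * len(text)
    for start, end, label in spans:
        if not (0 <= start < end <= len(text)):
            raise ValueError(f"span out of range: {(start, end, label)}")
        if any(tag != "O" for tag in tags[start:end]):
            raise ValueError(f"overlapping span: {(start, end, label)}")
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


def bio_to_spans(tags):
    """Recover (start, end_exclusive, label) spans from a BIO tag sequence."""
    spans = []
    start, label = None, None
    # A trailing "O" sentinel closes any span still open at the end.
    for i, tag in enumerate(list(tags) + ["O"]):
        # Close the open span unless the current tag continues it.
        if start is not None and tag != f"I-{label}":
            spans.append((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


# Hypothetical example sentence ("Zhang San works at Peking University").
text = "张三在北京大学工作"
spans = [(0, 2, "person"), (3, 7, "organization")]
tags = spans_to_bio(text, spans)
for char, tag in zip(text, tags):
    print(char, tag)
```

A sequence labeling model predicts one such tag per character, and entity-level scores are computed on the spans recovered from the predicted tags.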
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Data Quality and Management
