CLUENER2020: Fine-grained Named Entity Recognition Dataset and Benchmark for Chinese

Liang Xu; Yu tong; Qianqian Dong; Yixuan Liao; Cong Yu; Yin Tian; Weitang Liu; Lu Li; Caiquan Liu; Xuanwei Zhang

arXiv:2001.04351·cs.CL·January 22, 2020·48 cites

TL;DR

This paper introduces CLUENER2020, a comprehensive and challenging Chinese NER dataset with 10 categories, along with baseline models and a leaderboard to advance fine-grained NER research.

Contribution

It provides a new, detailed Chinese NER dataset with diverse categories and benchmarks, facilitating future research in fine-grained Chinese NER tasks.

Findings

01 CLUENER2020 is more challenging than existing Chinese NER datasets.

02 Baseline models achieve lower performance on this dataset than on existing ones.

03 Human performance is reported alongside the baselines.

Abstract

In this paper, we introduce the NER dataset from the CLUE organization (CLUENER2020), a well-defined fine-grained dataset for named entity recognition in Chinese. CLUENER2020 contains 10 categories. Apart from common labels like person, organization, and location, it contains more diverse categories. It is more challenging than other current Chinese NER datasets and could better reflect real-world applications. For comparison, we implement several state-of-the-art baselines as sequence labeling tasks and report human performance, along with an analysis of it. To facilitate future work on fine-grained NER for Chinese, we release our dataset, baselines, and leaderboard.
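
The abstract frames the baselines as sequence labeling tasks. As a rough illustration of what that framing involves, the sketch below converts character-level entity spans into one tag per character and back again. The BIO tagging scheme, the `(start, end, label)` span format, the function names, and the example sentence are all assumptions made for illustration; the label strings `person` and `location` are taken from the abstract's wording, and none of this is confirmed to match the released dataset files or the authors' baseline code.

```python
from typing import List, Tuple

# Hypothetical span format: (start, end_exclusive, label) over character offsets.
Span = Tuple[int, int, str]


def spans_to_bio(text: str, spans: List[Span]) -> List[str]:
    """Return one BIO tag per character of `text` (assumed tagging scheme)."""
    tags = ["O"] * len(text)
    for start, end, label in spans:
        if not (0 <= start < end <= len(text)):
            raise ValueError(f"span out of range: {(start, end, label)}")
        if any(tag != "O" for tag in tags[start:end]):
            raise ValueError(f"overlapping span: {(start, end, label)}")
        tags[start] = f"B-{label}"          # first character of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # remaining characters of the entity
    return tags


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Recover entity spans from BIO tags; a stray I- tag with no B- is ignored."""
    spans: List[Span] = []
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # trailing sentinel closes the last span
        continues = tag.startswith("I-") and label == tag[2:]
        if not continues:
            if label is not None:
                spans.append((start, i, label))
            start, label = None, None
            if tag.startswith("B-"):
                start, label = i, tag[2:]
    return spans


# Illustrative sentence (not from the dataset): "Zhang San works in Beijing".
text = "张三在北京工作"
gold = [(0, 2, "person"), (3, 5, "location")]
tags = spans_to_bio(text, gold)
```

Tagging per character rather than per word avoids depending on a Chinese word segmenter, which is one common reason to treat Chinese NER this way; a sequence labeling model would then predict `tags` from `text`, and `bio_to_spans` would turn its predictions back into entities for scoring.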

Code & Models

Datasets

zjunlp/iepile
dataset · 1.5k downloads

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Data Quality and Management