A Chinese Corpus for Fine-grained Entity Typing

Chin Lee; Hongliang Dai; Yangqiu Song; Xin Li

arXiv:2004.08825 · cs.CL · April 21, 2020 · 6 cites

TL;DR

This paper introduces a Chinese corpus for fine-grained entity typing. It contains 4,800 mentions manually labeled with free-form entity types through crowdsourcing, and all fine-grained types are additionally grouped into 10 general types.

Contribution

The paper contributes a crowdsourced, manually annotated Chinese corpus for fine-grained entity typing, evaluates neural models whose structures are typical for the task, and explores cross-lingual transfer learning as a way to improve Chinese fine-grained entity typing.

Findings

01

Neural models with structures typical of fine-grained entity typing are evaluated on the dataset.

02

Cross-lingual transfer learning shows potential to improve Chinese fine-grained entity typing.

03

The dataset facilitates future research in Chinese NLP.

Abstract

Fine-grained entity typing is a challenging task with wide applications. However, most existing datasets for this task are in English. In this paper, we introduce a corpus for Chinese fine-grained entity typing that contains 4,800 mentions manually labeled through crowdsourcing. Each mention is annotated with free-form entity types. To make our dataset useful in more possible scenarios, we also categorize all the fine-grained types into 10 general types. Finally, we conduct experiments with some neural models whose structures are typical in fine-grained entity typing and show how well they perform on our dataset. We also show the possibility of improving Chinese fine-grained entity typing through cross-lingual transfer learning.
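
The annotation scheme described in the abstract (each mention carries free-form fine-grained types, which are in turn grouped into general types) can be pictured with a small data structure. The following is only a minimal sketch: the `Mention` class, the `FINE_TO_GENERAL` mapping, the example sentence, and every type label in it are hypothetical placeholders, not the corpus's actual file format or type inventory.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

# Hypothetical mapping from free-form fine-grained types to general types.
# These labels are illustrative placeholders, NOT the corpus's real inventory.
FINE_TO_GENERAL: Dict[str, str] = {
    "篮球运动员": "person",    # "basketball player"
    "运动员": "person",        # "athlete"
    "大学": "organization",    # "university"
}


@dataclass
class Mention:
    """One entity mention in context, with its free-form type labels."""
    sentence: str
    span: Tuple[int, int]  # character offsets [start, end) into `sentence`
    fine_types: List[str] = field(default_factory=list)

    @property
    def text(self) -> str:
        start, end = self.span
        return self.sentence[start:end]

    def general_types(self) -> Set[str]:
        # A mention may map to several general types; unmapped labels are skipped.
        return {FINE_TO_GENERAL[t] for t in self.fine_types if t in FINE_TO_GENERAL}


# Made-up example record for illustration only.
example = Mention(
    sentence="姚明曾效力于休斯敦火箭队。",
    span=(0, 2),
    fine_types=["篮球运动员", "运动员"],
)
print(example.text, example.general_types())
```

Because a mention can hold several fine-grained labels at once, the task is naturally multi-label, which is why `general_types` returns a set rather than a single value.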

Code & Models

Repositories

HKUST-KnowComp/cfet
PyTorch · Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech and dialogue systems