Few-NERD: A Few-Shot Named Entity Recognition Dataset

Ning Ding; Guangwei Xu; Yulin Chen; Xiaobin Wang; Xu Han; Pengjun Xie; Hai-Tao Zheng; Zhiyuan Liu

arXiv:2105.07464·cs.CL·September 2, 2021·5 cites

TL;DR

Few-NERD introduces the first large-scale, hierarchical, human-annotated few-shot NER dataset, highlighting the challenges of recognizing fine-grained entity types in a few-shot setting and providing a comprehensive benchmark for future research.

Contribution

This paper presents Few-NERD, the largest human-crafted few-shot NER dataset with hierarchical entity types, enabling more realistic and challenging evaluation of NER models.

Findings

01. Few-NERD is more challenging than existing datasets.
02. Models struggle with fine-grained entity recognition in few-shot scenarios.
03. The dataset reveals the need for improved few-shot NER methods.

Abstract

Recently, considerable literature has grown up around the theme of few-shot named entity recognition (NER), but little published benchmark data focuses specifically on this practical and challenging task. Current approaches collect existing supervised NER datasets and reorganize them into the few-shot setting for empirical study. These strategies conventionally aim to recognize coarse-grained entity types with few examples, while in practice, most unseen entity types are fine-grained. In this paper, we present Few-NERD, a large-scale human-annotated few-shot NER dataset with a hierarchy of 8 coarse-grained and 66 fine-grained entity types. Few-NERD consists of 188,238 sentences from Wikipedia comprising 4,601,160 words, each annotated as context or as part of a two-level entity type. To the best of our knowledge, this is the first few-shot NER dataset and the largest human-crafted…
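
The two-level annotation described in the abstract can be illustrated with a minimal Python sketch. It assumes, hypothetically, that each token carries a tag of the form `coarse-fine` (for example `person-actor`) and that context tokens carry the tag `O`; the tag strings, the separator, and the helper names `parse_label` and `group_spans` are illustrative choices, not confirmed details of the released data format.

```python
from typing import List, NamedTuple, Optional, Tuple


class TwoLevelLabel(NamedTuple):
    coarse: Optional[str]  # e.g. "person"; None for context tokens
    fine: Optional[str]    # e.g. "actor"; None for context tokens


def parse_label(tag: str, context_tag: str = "O", sep: str = "-") -> TwoLevelLabel:
    """Split an assumed 'coarse-fine' tag into its two levels."""
    if tag == context_tag:
        return TwoLevelLabel(None, None)
    # Split on the first separator only, so the fine part may itself contain it.
    coarse, _, fine = tag.partition(sep)
    return TwoLevelLabel(coarse, fine or None)


def group_spans(tokens: List[str], tags: List[str],
                context_tag: str = "O") -> List[Tuple[str, TwoLevelLabel]]:
    """Merge consecutive tokens that share a non-context tag into entity spans."""
    spans: List[Tuple[str, TwoLevelLabel]] = []
    current_tokens: List[str] = []
    current_tag: Optional[str] = None
    for token, tag in zip(tokens, tags):
        if tag != current_tag:
            # Close the previous run if it was an entity.
            if current_tag is not None and current_tag != context_tag:
                spans.append((" ".join(current_tokens), parse_label(current_tag)))
            current_tokens, current_tag = [], tag
        current_tokens.append(token)
    # Close the final run.
    if current_tag is not None and current_tag != context_tag:
        spans.append((" ".join(current_tokens), parse_label(current_tag)))
    return spans


if __name__ == "__main__":
    # Hypothetical example sentence and tags, for illustration only.
    tokens = ["Paris", "Hilton", "visited", "Paris"]
    tags = ["person-actor", "person-actor", "O", "location-GPE"]
    for text, label in group_spans(tokens, tags):
        print(text, label.coarse, label.fine)
```

Because this sketch has no begin/inside markers, two adjacent entities of the same type would be merged into one span; it is meant only to show how a coarse type and a fine type can be read off a single per-token label.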

Code & Models

Models

Gepe55o/mountain-ner-bert-base (model · 2 downloads)

Datasets

Gepe55o/mountain-ner-dataset (dataset · 10 downloads)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text and Document Classification Technologies