NERetrieve: Dataset for Next Generation Named Entity Recognition and Retrieval

Uri Katz; Matan Vetzler; Amir DN Cohen; Yoav Goldberg

arXiv:2310.14282·cs.CL·October 24, 2023·1 cites

TL;DR

This paper introduces NERetrieve, a new dataset and task variants for advanced NER, including fine-grained, zero-shot, and retrieval-based recognition, to push the boundaries of current NLP capabilities.

Contribution

It presents three novel NER task variants and a large-scale dataset supporting fine-grained, zero-shot, and retrieval-oriented entity recognition, expanding beyond traditional NER approaches.

Findings

01 LLMs enable new NER capabilities, but NER is not yet a solved problem.

02 A large silver-annotated corpus of 4 million paragraphs is provided.

03 The proposed variants pose significant challenges for future research.

Abstract

Recognizing entities in texts is a central need in many information-seeking scenarios, and indeed, Named Entity Recognition (NER) is arguably one of the most successful examples of a widely adopted NLP task and corresponding NLP technology. Recent advances in large language models (LLMs) appear to provide effective solutions (also) for NER tasks that were traditionally handled with dedicated models, often matching or surpassing the abilities of the dedicated models. Should NER be considered a solved problem? We argue to the contrary: the capabilities provided by LLMs are not the end of NER research, but rather an exciting beginning. They allow taking NER to the next level, tackling increasingly more useful, and increasingly more challenging, variants. We present three variants of the NER task, together with a dataset to support them. The first is a move towards more fine-grained -- and…

Code & Models

Repositories

katzurik/neretrieve
none · Official

Models

Gepe55o/mountain-ner-bert-base
model · 2 downloads

Datasets

Gepe55o/mountain-ner-dataset
dataset · 10 downloads

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text and Document Classification Technologies