NERetrieve: Dataset for Next Generation Named Entity Recognition and Retrieval
Uri Katz, Matan Vetzler, Amir DN Cohen, Yoav Goldberg

TL;DR
This paper introduces NERetrieve, a new dataset and task variants for advanced NER, including fine-grained, zero-shot, and retrieval-based recognition, to push the boundaries of current NLP capabilities.
Contribution
It presents three novel NER task variants and a large-scale dataset supporting fine-grained, zero-shot, and retrieval-oriented entity recognition, expanding beyond traditional NER approaches.
Findings
LLMs enable new NER capabilities, but NER is not yet a solved problem.
A large silver-annotated corpus of 4 million paragraphs is provided.
The proposed variants pose significant challenges for future research.
Abstract
Recognizing entities in texts is a central need in many information-seeking scenarios, and indeed, Named Entity Recognition (NER) is arguably one of the most successful examples of a widely adopted NLP task and corresponding NLP technology. Recent advances in large language models (LLMs) appear to provide effective solutions (also) for NER tasks that were traditionally handled with dedicated models, often matching or surpassing the abilities of the dedicated models. Should NER be considered a solved problem? We argue to the contrary: the capabilities provided by LLMs are not the end of NER research, but rather an exciting beginning. They allow taking NER to the next level, tackling increasingly more useful, and increasingly more challenging, variants. We present three variants of the NER task, together with a dataset to support them. The first is a move towards more fine-grained -- and…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text and Document Classification Technologies
