NuNER: Entity Recognition Encoder Pre-training via LLM-Annotated Data
Sergei Bogdanov, Alexandre Constantin, Timothée Bernard, Benoit Crabbé, Etienne Bernard

TL;DR
NuNER is a specialized NER model pre-trained using LLM-annotated data, achieving high performance in few-shot settings and demonstrating the importance of dataset diversity for effective entity recognition.
Contribution
The paper introduces NuNER, a compact, task-specific NER encoder pre-trained on LLM-generated annotations. When fine-tuned on downstream NER tasks, it is more data-efficient than similar-sized foundation models.
Findings
NuNER outperforms similar-sized models in few-shot NER tasks.
Dataset size and entity diversity are crucial for NuNER's success.
NuNER competes with larger LLMs in NER performance.
Abstract
Large Language Models (LLMs) have shown impressive abilities in data annotation, opening the way for new approaches to solve classic NLP problems. In this paper, we show how to use LLMs to create NuNER, a compact language representation model specialized in the Named Entity Recognition (NER) task. NuNER can be fine-tuned to solve downstream NER problems in a data-efficient way, outperforming similar-sized foundation models in the few-shot regime and competing with much larger LLMs. We find that the size and entity-type diversity of the pre-training dataset are key to achieving good performance. We view NuNER as a member of the broader family of task-specific foundation models, recently unlocked by LLMs.
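The abstract singles out two properties of the pre-training dataset: its size and its entity-type diversity. As a rough illustration only, the sketch below counts both on a toy set of annotations. The annotation format, the example sentences, and the `dataset_stats` helper are assumptions made for this example and are not taken from the paper.

```python
from collections import Counter

# Hypothetical toy annotations in an assumed format:
# (sentence, [(entity text, entity type), ...]).
annotations = [
    ("Marie Curie won the Nobel Prize in 1903.",
     [("Marie Curie", "scientist"), ("Nobel Prize", "award"), ("1903", "year")]),
    ("Aspirin is sold by Bayer.",
     [("Aspirin", "drug"), ("Bayer", "company")]),
    ("Bayer is based in Leverkusen.",
     [("Bayer", "company"), ("Leverkusen", "city")]),
]


def dataset_stats(examples):
    """Return simple size and entity-type diversity counts for a dataset."""
    # Count how often each entity type occurs across all sentences.
    type_counts = Counter(
        entity_type for _, entities in examples for _, entity_type in entities
    )
    return {
        "num_sentences": len(examples),            # dataset size in sentences
        "num_entities": sum(type_counts.values()), # total annotated mentions
        "num_entity_types": len(type_counts),      # distinct entity types
        "type_counts": type_counts,
    }


stats = dataset_stats(annotations)
print(stats["num_sentences"], stats["num_entities"], stats["num_entity_types"])
```

Counting distinct types is only one crude proxy for diversity; a measure that also accounts for how evenly the types are distributed could be computed from `type_counts` in the same way.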
Code & Models
- 🤗 numind/NuNER-v0.1 · model · 6.5k dl · ♡ 63
- 🤗 numind/NuNER-multilingual-v0.1 · model · 6.0k dl · ♡ 70
- 🤗 numind/NuNER-v1.0 · model · 18 dl · ♡ 8
- 🤗 numind/NuNER-BERT-v1.0 · model · 4 dl
- 🤗 guishe/nuner-v1_fewnerd_fine_super · model · 15 dl
- 🤗 guishe/nuner-v1_fewnerd_coarse_super · model · 6 dl
- 🤗 guishe/nuner-v1_ontonotes5 · model · 4 dl · ♡ 1
- 🤗 guishe/nuner-v1_orgs · model · 4.1k dl · ♡ 2
- 🤗 numind/NuNER_Zero-span · model · 44 dl · ♡ 19
- 🤗 numind/NuNER-v2.0 · model · 5.7k dl · ♡ 43
Taxonomy
Topics: Data Quality and Management · Network Security and Intrusion Detection · Topic Modeling
