NanoNER: Named Entity Recognition for nanobiology using experts' knowledge and distant supervision
Martin Lentschat (SIGMA, GETALP), Cyril Labbé (LIG, SIGMA), Ran Cheng (LIG, SIGMA)

TL;DR
NanoNER is a high-accuracy NER model for nanobiology that leverages domain experts' knowledge and distant supervision to identify both known and novel entities with minimal manual annotation effort, evaluated on a corpus of 728 full-text nanobiology articles.
Contribution
The paper introduces NanoNER, a novel NER approach that combines domain ontologies, refined by experts through an iterative selection process, with distant supervision, enabling efficient recognition of entities in nanobiology texts.
Findings
Achieved an F1-score of 0.98 on known entities
Achieved 77-81% precision on new entities
Discovered up to 30% of ablated terms
Abstract
Here we present the training and evaluation of NanoNER, a Named Entity Recognition (NER) model for nanobiology. NER consists of identifying specific entities in spans of unstructured text and is often a primary task in Natural Language Processing (NLP) and Information Extraction. The aim of our model is to recognise entities that domain experts have previously identified as constituting the essential knowledge of the domain. Relying on ontologies, which provide a domain vocabulary and taxonomy, we implemented an iterative process enabling experts to determine the entities relevant to the domain at hand. We then explore the potential of distant supervision for NER, showing how this method can increase the quantity of annotated data with minimal additional manpower. On our full corpus of 728 full-text nanobiology articles, containing more than 120k entity…
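Distant supervision, in this setting, means labelling raw text automatically by matching it against the expert-validated vocabulary instead of annotating each sentence by hand. The following is a minimal sketch of that labelling step only, assuming whitespace tokenisation, case-insensitive greedy longest-match lookup, and BIO tags; none of these choices is stated in the abstract. The vocabulary, the entity type labels, and the `distant_label` function are hypothetical illustrations, not NanoNER's actual pipeline.

```python
from typing import Dict, List, Tuple

# Hypothetical toy vocabulary standing in for ontology-derived entity terms;
# the terms and type labels are illustrative, not taken from NanoNER.
VOCAB: Dict[Tuple[str, ...], str] = {
    ("gold", "nanoparticles"): "NANOMATERIAL",
    ("liposome",): "NANOMATERIAL",
    ("hela", "cells"): "CELL_LINE",
}


def distant_label(tokens: List[str], vocab: Dict[Tuple[str, ...], str]) -> List[str]:
    """Assign BIO tags by greedy, case-insensitive longest-match lookup in vocab."""
    lowered = [t.lower() for t in tokens]
    max_len = max((len(k) for k in vocab), default=0)
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        match_len = 0
        # Try the longest candidate span first so multi-word terms win
        # over any single-word term they contain.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if tuple(lowered[i:i + n]) in vocab:
                match_len = n
                break
        if match_len:
            label = vocab[tuple(lowered[i:i + match_len])]
            tags[i] = f"B-{label}"  # first token of the matched term
            for j in range(i + 1, i + match_len):
                tags[j] = f"I-{label}"  # remaining tokens of the term
            i += match_len
        else:
            i += 1  # no term starts here; leave the token as "O"
    return tags


tokens = "Gold nanoparticles were incubated with HeLa cells .".split()
print(distant_label(tokens, VOCAB))
# ['B-NANOMATERIAL', 'I-NANOMATERIAL', 'O', 'O', 'O', 'B-CELL_LINE', 'I-CELL_LINE', 'O']
```

In this sketch any term missing from the vocabulary stays tagged `O`; the automatically labelled sentences would then serve as training data for a NER model, which is where recognising terms outside the vocabulary would have to come from.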
Taxonomy
Topics: Topic Modeling · Data Quality and Management · Biomedical Text Mining and Ontologies
