Learning Nested Named Entity Recognition from Flat Annotations
Igor Rozhkov, Natalia Loukachevitch

TL;DR
This paper explores methods for training nested named entity recognition models using only flat annotations, aiming to reduce the need for costly multi-level labeling.
Contribution
It introduces four approaches for learning nested structure from flat data (string inclusions, entity corruption, flat neutralization, and a hybrid fine-tuned + LLM pipeline) and evaluates them on NEREL, a Russian NER benchmark.
Findings
The best combined method achieves 26.37% inner F1
This closes 40% of the gap to full nested supervision
Code is publicly released for reproducibility
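The "40% of the gap" figure is presumably computed in the usual way: relative to a model trained on flat annotations only and a model trained with full nested supervision. The notation below is our own, and the two baseline scores are not given in this summary, so they cannot be solved for here; the second line only restates the reported numbers under that assumption.

```latex
\[
\text{gap closed} =
\frac{F_1^{\mathrm{inner}}(\text{method}) - F_1^{\mathrm{inner}}(\text{flat only})}
     {F_1^{\mathrm{inner}}(\text{full nested}) - F_1^{\mathrm{inner}}(\text{flat only})}
\approx 0.40
\]
\[
26.37 \approx F_1^{\mathrm{inner}}(\text{flat only})
  + 0.40\,\bigl(F_1^{\mathrm{inner}}(\text{full nested}) - F_1^{\mathrm{inner}}(\text{flat only})\bigr)
\]
```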
Abstract
Nested named entity recognition identifies entities contained within other entities, but requires expensive multi-level annotation. While flat NER corpora are abundant, nested resources remain scarce. We investigate whether models can learn nested structure from flat annotations alone, evaluating four approaches: string inclusions (substring matching), entity corruption (pseudo-nested data), flat neutralization (reducing false negative signal), and a hybrid fine-tuned + LLM pipeline. On NEREL, a Russian benchmark with 29 entity types where 21% of entities are nested, our best combined method achieves 26.37% inner F1, closing 40% of the gap to full nested supervision. Code is available at https://github.com/fulstock/Learning-from-Flat-Annotations.
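To make two of the four approaches more concrete, here is a minimal Python sketch of how string inclusions and flat neutralization could work, based only on the one-line descriptions above. It is our own interpretation, not the authors' implementation: the function names `build_lexicon`, `string_inclusions` and `neutralization_weights`, the offset conventions, and the English example sentences are all hypothetical. The released code may handle morphology, type conflicts and span enumeration quite differently.

```python
from typing import Dict, List, Tuple

# Hypothetical conventions (not taken from the paper's code):
#   CharSpan = (start, end, label) with character offsets, end exclusive
#   TokSpan  = (start, end) with token offsets, end exclusive
CharSpan = Tuple[int, int, str]
TokSpan = Tuple[int, int]


def build_lexicon(corpus: List[Tuple[str, List[CharSpan]]]) -> Dict[str, str]:
    """Map every annotated entity string in a flat corpus to its type (first seen wins)."""
    lexicon: Dict[str, str] = {}
    for text, spans in corpus:
        for start, end, label in spans:
            lexicon.setdefault(text[start:end], label)
    return lexicon


def string_inclusions(text: str, flat: List[CharSpan], lexicon: Dict[str, str]) -> List[CharSpan]:
    """String inclusions (sketch): known entity strings found strictly inside a flat
    entity are proposed as inner entities, with the type they carry elsewhere."""
    proposals = set()
    for start, end, _ in flat:
        outer = text[start:end]
        for surface, label in lexicon.items():
            if len(surface) >= len(outer):
                continue  # an inner entity must be strictly shorter than its container
            pos = outer.find(surface)
            while pos != -1:
                s, e = start + pos, start + pos + len(surface)
                # Whole-word matches only: a lexicon entry "Ox" would not match inside "Oxford".
                left_ok = s == start or not text[s - 1].isalnum()
                right_ok = e == end or not text[e].isalnum()
                if left_ok and right_ok:
                    proposals.add((s, e, label))
                pos = outer.find(surface, pos + 1)
    return sorted(proposals)


def neutralization_weights(n_tokens: int, flat: List[TokSpan], max_len: int) -> Dict[TokSpan, float]:
    """Flat neutralization (sketch): per-span loss weights for a span classifier.

    A candidate span lying strictly inside an annotated flat entity may be an
    unannotated inner entity, so it gets weight 0.0 instead of being trained as
    a "non-entity" negative. Every other candidate span keeps weight 1.0.
    """
    weights: Dict[TokSpan, float] = {}
    for i in range(n_tokens):
        for j in range(i + 1, min(i + max_len, n_tokens) + 1):
            inside = any(s <= i and j <= e and (i, j) != (s, e) for s, e in flat)
            weights[(i, j)] = 0.0 if inside else 1.0
    return weights


if __name__ == "__main__":
    corpus = [
        ("She visited Oxford.", [(12, 18, "CITY")]),
        ("She studied at the University of Oxford.", [(19, 39, "ORGANIZATION")]),
    ]
    lexicon = build_lexicon(corpus)
    text, flat = corpus[1]
    print(string_inclusions(text, flat, lexicon))  # [(33, 39, 'CITY')]
    # Tokens: She studied at the University of Oxford .  -> flat entity = tokens (4, 7)
    weights = neutralization_weights(8, [(4, 7)], max_len=3)
    print(weights[(6, 7)], weights[(4, 7)])  # 0.0 1.0
```

The lexicon scan is deliberately naive (every lexicon entry is checked against every flat entity); a real corpus would likely call for a trie or Aho-Corasick matcher. Exact string matching is also a weak fit for an inflected language like Russian, where an inner entity often appears in a different word form than it does when annotated on its own. That may be one reason to combine several approaches.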
