ESNERA: Empirical and semantic named entity alignment for named entity dataset merging
Xiaobo Zhang (1, 2), Congqing He (2), Ying He (1, 2), Jian Peng (1), Dajie Fu (1), Tien-Ping Tan (2) ((1) School of Information Engineering, Jiangxi Vocational College of Finance & Economics, Jiujiang, China, (2) School of Computer Sciences, Universiti Sains Malaysia, Penang

TL;DR
This paper introduces an automatic label alignment method combining empirical and semantic similarities to merge NER datasets effectively, improving performance especially in low-resource domains.
Contribution
It proposes a novel, scalable, and interpretable label alignment approach for merging NER datasets using a greedy pairwise strategy based on label similarity.
Findings
Successfully merged multiple NER datasets with minimal performance loss.
Enhanced NER performance in the financial domain with limited data.
Demonstrated scalability and interpretability of the proposed method.
Abstract
Named Entity Recognition (NER) is a fundamental task in natural language processing. It remains a research hotspot due to its wide applicability across domains. Although recent advances in deep learning have significantly improved NER performance, they rely heavily on large, high-quality annotated datasets. However, building these datasets is expensive and time-consuming, posing a major bottleneck for further research. Current dataset merging approaches mainly focus on strategies like manual label mapping or constructing label graphs, which lack interpretability and scalability. To address this, we propose an automatic label alignment method based on label similarity. The method combines empirical and semantic similarities, using a greedy pairwise merging strategy to unify label spaces across different datasets. Experiments are conducted in two stages: first, merging three existing NER…
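The greedy pairwise merging described above can be illustrated with a minimal sketch. The abstract does not say how the empirical and semantic similarities are computed or combined, so everything specific here is an assumption made for illustration: Jaccard overlap of entity mentions stands in for the empirical similarity, cosine similarity of label embeddings stands in for the semantic one, and the names `greedy_align`, `alpha`, and `threshold` are hypothetical rather than taken from the paper.

```python
from itertools import combinations


def jaccard(a, b):
    """Set overlap; an assumed stand-in for the 'empirical' similarity."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cosine(u, v):
    """Cosine similarity; an assumed stand-in for the 'semantic' similarity."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = (sum(x * x for x in u) * sum(y * y for y in v)) ** 0.5
    return dot / norm if norm else 0.0


def combined_similarity(g1, g2, alpha):
    """Weighted mix of the two similarities (the weighting scheme is an assumption)."""
    empirical = jaccard(g1["mentions"], g2["mentions"])
    semantic = cosine(g1["vector"], g2["vector"])
    return alpha * empirical + (1 - alpha) * semantic


def greedy_align(mentions, vectors, alpha=0.5, threshold=0.5):
    """Repeatedly merge the most similar pair of label groups.

    mentions: label -> set of entity strings annotated with that label
    vectors:  label -> embedding of the label name or description
    Stops when no pair scores at least `threshold` (hypothetical parameter).
    """
    groups = [
        {"labels": {name}, "mentions": set(mentions[name]), "vector": list(vectors[name])}
        for name in mentions
    ]
    while len(groups) > 1:
        # Find the best-scoring pair among all current groups.
        score, i, j = max(
            (combined_similarity(groups[i], groups[j], alpha), i, j)
            for i, j in combinations(range(len(groups)), 2)
        )
        if score < threshold:
            break
        a, b = groups[i], groups[j]
        merged = {
            "labels": a["labels"] | b["labels"],
            "mentions": a["mentions"] | b["mentions"],
            # Simple unweighted average of the two group vectors.
            "vector": [(x + y) / 2 for x, y in zip(a["vector"], b["vector"])],
        }
        groups = [g for k, g in enumerate(groups) if k not in (i, j)] + [merged]
    return [g["labels"] for g in groups]


# Toy data: two hypothetical datasets "A" and "B" with differently named labels.
toy_mentions = {
    "A:PER": {"alice", "bob", "carol"},
    "B:PERSON": {"alice", "bob", "dave"},
    "A:ORG": {"acme", "globex"},
    "B:COMPANY": {"acme", "initech"},
}
toy_vectors = {
    "A:PER": [1.0, 0.0],
    "B:PERSON": [0.9, 0.1],
    "A:ORG": [0.0, 1.0],
    "B:COMPANY": [0.1, 0.9],
}
unified = greedy_align(toy_mentions, toy_vectors)
```

On this toy input the person-like labels are grouped together, as are the organization-like labels, and merging stops because the two resulting groups are dissimilar. A real system would also have to decide whether labels from the same dataset may merge; this sketch does not address that.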
Taxonomy
Topics: Topic Modeling · Advanced Graph Neural Networks · Text and Document Classification Technologies
