Regularization for Long Named Entity Recognition
Minbyul Jeong, Jaewoo Kang

TL;DR
This paper introduces RegLER, a debiasing method that improves recognition of long named entities in pre-trained language models (PLMs) by addressing length bias and class imbalance, improving generalization to unseen mentions across domains.
Contribution
The paper presents a novel debiasing technique, RegLER, specifically designed to improve long entity recognition and handle class imbalance in NER tasks.
Findings
RegLER significantly improves long entity recognition accuracy.
The method reduces bias from easy-negative examples, such as the token 'The'.
Experiments show strong generalization across biomedical and general domains.
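The easy-negative finding relates to class imbalance in token-level NER labels, where the outside tag O dominates. As a hedged diagnostic only, and not RegLER itself (its objective is not described on this page), the sketch below measures the label distribution of a BIO-tagged dataset and lists tokens, such as 'The', that never appear inside an entity. The BIO format, the function names, and the toy data are all assumptions for illustration.

```python
from collections import Counter


def label_distribution(dataset):
    """Fraction of tokens carrying each tag in a list of (tokens, tags) pairs."""
    counts = Counter(tag for _, tags in dataset for tag in tags)
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()}


def always_outside(dataset):
    """Tokens that are tagged O every time they occur (candidate easy negatives)."""
    seen, in_entity = set(), set()
    for tokens, tags in dataset:
        for tok, tag in zip(tokens, tags):
            seen.add(tok)
            if tag != "O":
                in_entity.add(tok)
    return seen - in_entity


# Hypothetical toy example: one sentence with a single chemical mention.
data = [(["The", "patient", "received", "aspirin", "."],
         ["O", "O", "O", "B-Chem", "O"])]
print(label_distribution(data))
print(always_outside(data))
```

On real corpora the O share is typically far larger than in this toy sentence, which is the skew the abstract calls severe class imbalance.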
Abstract
When performing named entity recognition (NER), entity length varies and depends on the specific domain or dataset. Pre-trained language models (PLMs) are used to solve NER tasks and tend to be biased toward dataset patterns such as length statistics, surface form, and skewed class distribution. These biases hinder the generalization ability of PLMs, which is necessary for handling the many unseen mentions that occur in real-world situations. We propose a novel debiasing method, RegLER, to improve predictions for entities of varying lengths. To close the gap between evaluation and real-world situations, we evaluated PLMs on partitioned benchmark datasets containing unseen mention sets. Here, RegLER shows significant improvement on long named entities, which it can predict by debiasing conjunctions or special characters within entities. Furthermore, there is a severe class imbalance in most NER…
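The abstract evaluates PLMs on benchmark test sets partitioned into seen and unseen mentions. The exact partitioning procedure is not given on this page, so the following is a minimal sketch under stated assumptions: BIO-tagged data, exact lowercase string matching against training mentions, and hypothetical helper names (`extract_spans`, `partition_mentions`). It also groups mentions by token length, the axis on which RegLER is reported to help. It is an illustration, not the authors' code.

```python
from collections import defaultdict


def extract_spans(tags):
    """Return (start, end, type) spans from BIO tags; end is exclusive."""
    spans, start, etype = [], None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel "O" flushes the last span
        continues = tag.startswith("I-") and tag[2:] == etype
        if not continues:
            if start is not None:
                spans.append((start, i, etype))
                start, etype = None, None
            if tag != "O":  # B-X, or an I-X that does not continue the open span
                start, etype = i, tag[2:]
    return spans


def partition_mentions(train, test):
    """Group test mentions by (seen/unseen in training, length in tokens)."""
    seen = {
        " ".join(tokens[s:e]).lower()
        for tokens, tags in train
        for s, e, _ in extract_spans(tags)
    }
    buckets = defaultdict(list)
    for tokens, tags in test:
        for s, e, _ in extract_spans(tags):
            text = " ".join(tokens[s:e]).lower()
            status = "seen" if text in seen else "unseen"
            buckets[(status, e - s)].append(text)
    return dict(buckets)


# Hypothetical toy data in (tokens, BIO tags) form.
train = [(["aspirin", "reduces", "fever"], ["B-Chem", "O", "O"])]
test = [(["aspirin", "and", "tumor", "necrosis", "factor", "alpha"],
         ["B-Chem", "O", "B-Gene", "I-Gene", "I-Gene", "I-Gene"])]
print(partition_mentions(train, test))
```

Exact string matching is the simplest choice; a real split might also normalize punctuation or match by entity type, which would change which mentions count as unseen.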
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Biomedical Text Mining and Ontologies
