Regularization for Long Named Entity Recognition

Minbyul Jeong; Jaewoo Kang

arXiv:2104.07249 · cs.CL · January 12, 2022

TL;DR

This paper introduces RegLER, a debiasing method that improves how pre-trained language models (PLMs) recognize long named entities. It targets length bias and class imbalance, and it improves generalization to unseen mentions across domains.

Contribution

The paper presents a novel debiasing technique, RegLER, specifically designed to improve long entity recognition and handle class imbalance in NER tasks.

Findings

1. RegLER significantly improves the recognition of long named entities.
2. The method reduces the bias introduced by easy-negative examples such as "The".
3. Experiments show strong generalization across the biomedical and general domains.

Abstract

When performing named entity recognition (NER), entity length is variable and dependent on a specific domain or dataset. Pre-trained language models (PLMs) are used to solve NER tasks and tend to be biased toward dataset patterns such as length statistics, surface form, and skewed class distribution. These biases hinder the generalization ability of PLMs, which is necessary to address many unseen mentions in real-world situations. We propose a novel debiasing method, RegLER, to improve predictions for entities of varying lengths. To close the gap between evaluation and real-world situations, we evaluated PLMs on partitioned benchmark datasets containing unseen mention sets. Here, RegLER shows significant improvement on long named entities, which it can predict by debiasing on conjunctions or special characters within entities. Furthermore, there is a severe class imbalance in most NER…
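The abstract mentions evaluating on benchmark partitions that contain unseen mentions. As a rough illustration only, the sketch below splits the gold mentions of a BIO-tagged test set into those whose surface form also occurs in the training data and those that do not. The function names (`extract_mentions`, `split_seen_unseen`), the case-insensitive surface-form matching rule, and the toy sentences are all assumptions made for this example; they are not taken from the paper or its repository.

```python
from typing import List, Sequence, Set, Tuple

Sentence = Tuple[Sequence[str], Sequence[str]]  # (tokens, BIO tags)
Mention = Tuple[str, str]                       # (surface form, entity type)


def extract_mentions(tokens: Sequence[str], tags: Sequence[str]) -> List[Mention]:
    """Collect (surface, type) mentions from one BIO-tagged sentence."""
    mentions: List[Mention] = []
    current: List[str] = []
    current_type = ""
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new mention starts; close any mention that is still open.
            if current:
                mentions.append((" ".join(current), current_type))
            current, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == current_type:
            current.append(token)
        else:
            # "O", or an I- tag that does not continue the open mention
            # (treated like "O" in this sketch).
            if current:
                mentions.append((" ".join(current), current_type))
            current, current_type = [], ""
    if current:
        mentions.append((" ".join(current), current_type))
    return mentions


def split_seen_unseen(
    train: Sequence[Sentence], test: Sequence[Sentence]
) -> Tuple[List[Mention], List[Mention]]:
    """Partition test mentions by whether their surface form occurs in train.

    Assumption: "seen" means a case-insensitive surface-form match.
    """
    train_surfaces: Set[str] = {
        surface.lower()
        for tokens, tags in train
        for surface, _ in extract_mentions(tokens, tags)
    }
    seen: List[Mention] = []
    unseen: List[Mention] = []
    for tokens, tags in test:
        for mention in extract_mentions(tokens, tags):
            bucket = seen if mention[0].lower() in train_surfaces else unseen
            bucket.append(mention)
    return seen, unseen


# Toy data, invented for illustration.
train = [(["Aspirin", "reduces", "fever"], ["B-Chemical", "O", "O"])]
test = [(
    ["Aspirin", "and", "tumor", "necrosis", "factor", "alpha"],
    ["B-Chemical", "O", "B-Gene", "I-Gene", "I-Gene", "I-Gene"],
)]
seen, unseen = split_seen_unseen(train, test)
print(seen)
print(unseen)
```

Scoring a model separately on the two buckets gives one simple view of how far it relies on memorized surface forms. Other definitions of "unseen" are possible, for example requiring that the entity type match as well.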

Code & Models

Repositories

minstar/PMI
pytorch

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Biomedical Text Mining and Ontologies