Distantly Supervised Named Entity Recognition via Confidence-Based Multi-Class Positive and Unlabeled Learning
Kang Zhou, Yuepei Li, Qi Li

TL;DR
This paper introduces a confidence-based multi-class positive and unlabeled learning approach to improve distant supervision in named entity recognition, addressing high false negative rates due to incomplete annotations.
Contribution
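It formulates DS-NER as a multi-class positive and unlabeled (MPU) learning problem and proposes Conf-MPU, a two-step method that first estimates a confidence score for each token being an entity token and then applies the Conf-MPU risk estimation to train a multi-class NER classifier.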
Findings
Conf-MPU outperforms existing DS-NER methods on two benchmark datasets labeled by various external knowledge sources.
The approach effectively handles incomplete annotations in distant supervision.
Experimental results demonstrate improved accuracy in NER tasks.
Abstract
In this paper, we study the named entity recognition (NER) problem under distant supervision. Due to the incompleteness of the external dictionaries and/or knowledge bases, such distantly annotated training data usually suffer from a high false negative rate. To this end, we formulate the Distantly Supervised NER (DS-NER) problem via Multi-class Positive and Unlabeled (MPU) learning and propose a theoretically and practically novel CONFidence-based MPU (Conf-MPU) approach. To handle the incomplete annotations, Conf-MPU consists of two steps. First, a confidence score is estimated for each token of being an entity token. Then, the proposed Conf-MPU risk estimation is applied to train a multi-class classifier for the NER task. Thorough experiments on two benchmark datasets labeled by various external knowledge demonstrate the superiority of the proposed Conf-MPU over existing DS-NER…
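The false negative problem described in the abstract can be illustrated with a minimal sketch. This is not the authors' code or the Conf-MPU estimator: the function names, the toy sentence, the gold labels, and the single-token exact-match lookup are all assumptions made for illustration (real distant labeling typically matches multi-token spans). It shows only how an incomplete dictionary leaves true entity tokens unlabeled.

```python
from typing import Dict, List


def distant_label(tokens: List[str], dictionary: Dict[str, str]) -> List[str]:
    """Label each token by exact (lowercased) dictionary lookup.

    Tokens missing from the dictionary get 'O', i.e. they are unlabeled,
    not verified non-entities.
    """
    return [dictionary.get(tok.lower(), "O") for tok in tokens]


def false_negative_rate(gold: List[str], distant: List[str]) -> float:
    """Fraction of gold entity tokens that the distant labels mark as 'O'."""
    entity_positions = [i for i, g in enumerate(gold) if g != "O"]
    if not entity_positions:
        return 0.0
    missed = sum(1 for i in entity_positions if distant[i] == "O")
    return missed / len(entity_positions)


# Hypothetical toy example (not taken from the paper's datasets).
tokens = ["Aspirin", "and", "ibuprofen", "reduce", "fever"]
gold = ["Chemical", "O", "Chemical", "O", "Disease"]

# Incomplete dictionary: it lacks "ibuprofen" and "fever".
dictionary = {"aspirin": "Chemical"}

distant = distant_label(tokens, dictionary)
fnr = false_negative_rate(gold, distant)

print(distant)
print(f"false negative rate: {fnr:.2f}")
```

In this toy case two of the three gold entity tokens end up in the unlabeled set, which is why treating every unlabeled token as a negative would mislead a classifier.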
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text and Document Classification Technologies
