Denoising Distantly Supervised Named Entity Recognition via a Hypergeometric Probabilistic Model
Wenkai Zhang, Hongyu Lin, Xianpei Han, Le Sun, Huidan Liu, Zhicheng Wei, Nicholas Jing Yuan

TL;DR
This paper introduces Hypergeometric Learning (HGL), a denoising method for distantly supervised NER that explicitly models the noise distribution in each training batch, improving model performance in high-noise settings.
Contribution
It proposes a hypergeometric distribution-based denoising algorithm that considers both the noise distribution and instance-level confidence, making distantly supervised NER more robust.
Findings
HGL effectively denoises weakly-labeled data
Significant performance improvements on NER tasks
Robustness in high noise rate settings
Abstract
Denoising is an essential step for distant supervision based named entity recognition. Previous denoising methods are mostly based on instance-level confidence statistics, which ignore the variety of the underlying noise distribution across datasets and entity types. This makes them difficult to adapt to high noise rate settings. In this paper, we propose Hypergeometric Learning (HGL), a denoising algorithm for distantly supervised NER that takes both noise distribution and instance-level confidence into consideration. Specifically, during neural network training, we naturally model the noise samples in each batch as following a hypergeometric distribution parameterized by the noise rate. Then each instance in the batch is regarded as either a correct or a noisy one according to its label confidence derived from the previous training step, as well as the noise distribution in this…
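The batch-level idea in the abstract can be illustrated with a minimal sketch. The abstract is truncated here, so the way HGL actually combines the two signals is not shown; the code below is an assumption, not the authors' algorithm. It draws the number of noisy instances in a batch from a hypergeometric distribution (by sampling a batch without replacement from a dataset with a known noise rate), then flags the lowest-confidence instances as noisy. The function names `sample_noise_count` and `split_batch`, and the hard lowest-confidence cutoff, are hypothetical.

```python
import random


def sample_noise_count(dataset_size, noise_rate, batch_size, rng):
    """Draw the number of noisy instances in one batch.

    Assumption: the dataset holds round(dataset_size * noise_rate) noisy
    instances. Drawing batch_size items without replacement makes the count
    of noisy items in the batch hypergeometrically distributed.
    """
    num_noisy = round(dataset_size * noise_rate)
    drawn = rng.sample(range(dataset_size), batch_size)
    # Indices below num_noisy stand in for the noisy instances.
    return sum(1 for i in drawn if i < num_noisy)


def split_batch(confidences, noise_count):
    """Flag the noise_count lowest-confidence instances as noisy.

    Returns a list of booleans aligned with confidences (True = noisy).
    This hard cutoff is an illustrative choice, not the paper's rule.
    """
    order = sorted(range(len(confidences)), key=lambda i: confidences[i])
    noisy = set(order[:noise_count])
    return [i in noisy for i in range(len(confidences))]


if __name__ == "__main__":
    rng = random.Random(0)
    k = sample_noise_count(dataset_size=1000, noise_rate=0.3,
                           batch_size=32, rng=rng)
    # Hypothetical label confidences from a previous training step.
    confidences = [rng.random() for _ in range(32)]
    flags = split_batch(confidences, k)
    print(k, sum(flags))
```

Because the noise count is resampled for every batch, the number of instances treated as noisy varies from batch to batch around `noise_rate * batch_size`, rather than being fixed by a single global confidence threshold.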
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Data Quality and Management
