Noisy-Labeled NER with Confidence Estimation
Kun Liu, Yao Fu, Chuanqi Tan, Mosha Chen, Ningyu Zhang, Songfang Huang, Sheng Gao

TL;DR
This paper addresses the challenge of noisy labels in named entity recognition by developing confidence estimation strategies, a label marginalization technique, and a calibration method, improving performance in noisy and distant labeling scenarios.
Contribution
The paper introduces confidence estimation strategies, partial marginalization of low-confidence labels with a CRF, and a calibration method based on the structure of entity labels, improving robustness and accuracy under noisy real-world annotation.
Findings
Effective confidence estimation improves NER performance.
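Partially marginalizing out low-confidence labels with a CRF reduces the impact of label noise.
The method is effective in general noisy settings across four languages and in distantly labeled settings.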
Abstract
Recent studies in deep learning have shown significant progress in named entity recognition (NER). Most existing works assume clean data annotation, yet a fundamental challenge in real-world scenarios is the large amount of noise from a variety of sources (e.g., pseudo, weak, or distant annotations). This work studies NER under a noisy labeled setting with calibrated confidence estimation. Based on empirical observations of different training dynamics of noisy and clean labels, we propose strategies for estimating confidence scores based on local and global independence assumptions. We partially marginalize out labels of low confidence with a CRF model. We further propose a calibration method for confidence scores based on the structure of entity labels. We integrate our approach into a self-training framework for boosting performance. Experiments in general noisy settings with four…
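The partial marginalization step mentioned in the abstract can be illustrated with a generic partial-CRF loss. This is a minimal sketch and not the authors' implementation: it assumes a linear-chain CRF with per-token emission scores and a single transition matrix, and it assumes that a token counts as "low confidence" when its score falls below a fixed threshold. The function names, the threshold, and the NumPy formulation are all hypothetical choices made for illustration. Confident tokens are pinned to their observed label, while low-confidence tokens may take any label, so the loss sums over every label sequence consistent with the confident positions.

```python
import numpy as np

NEG = -1e30  # stands in for log(0); avoids NaNs that -inf could produce


def logsumexp(x, axis):
    """Numerically stable log(sum(exp(x))) along one axis."""
    m = np.max(x, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(x - m), axis=axis))


def constrained_log_partition(emissions, transitions, allowed):
    """Forward algorithm restricted to label sequences permitted by `allowed`.

    emissions:   (T, K) per-token label scores
    transitions: (K, K) score of moving from label i to label j
    allowed:     (T, K) boolean mask of permitted labels per token
    """
    alpha = np.where(allowed[0], emissions[0], NEG)
    for t in range(1, len(emissions)):
        scores = alpha[:, None] + transitions          # (K_prev, K_cur)
        alpha = logsumexp(scores, axis=0) + emissions[t]
        alpha = np.where(allowed[t], alpha, NEG)
    return logsumexp(alpha, axis=0)


def partial_marginal_nll(emissions, transitions, labels, confidence, threshold=0.5):
    """Negative log-likelihood that marginalizes out low-confidence labels.

    Assumption (hypothetical): a token is trusted when confidence >= threshold.
    """
    T, K = emissions.shape
    full = np.ones((T, K), dtype=bool)
    allowed = full.copy()
    idx = np.flatnonzero(confidence >= threshold)  # trusted positions
    allowed[idx] = False
    allowed[idx, labels[idx]] = True               # pin trusted tokens to their label
    log_numerator = constrained_log_partition(emissions, transitions, allowed)
    log_z = constrained_log_partition(emissions, transitions, full)
    return log_z - log_numerator


rng = np.random.default_rng(0)
emissions = rng.normal(size=(4, 3))
transitions = rng.normal(size=(3, 3))
labels = np.array([0, 2, 1, 1])
confidence = np.array([0.9, 0.2, 0.8, 0.1])  # made-up scores for illustration
loss = partial_marginal_nll(emissions, transitions, labels, confidence)
```

When every token is trusted, this loss reduces to the standard CRF negative log-likelihood of the observed sequence; when no token is trusted, it is zero because nothing is constrained. How the confidence scores themselves are estimated and calibrated is outside this sketch.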
Taxonomy
Topics: Topic Modeling · Data Quality and Management · Natural Language Processing Techniques
Methods: Conditional Random Field
