CrossWeigh: Training Named Entity Tagger from Imperfect Annotations

Zihan Wang; Jingbo Shang; Liyuan Liu; Lihao Lu; Jiacheng Liu; Jiawei Han

arXiv:1909.01441·cs.CL·September 5, 2019·6 cites

TL;DR

This paper identifies label mistakes in NER datasets, corrects them for a cleaner test set, and introduces CrossWeigh, a framework that improves NER training by handling label noise through data partitioning and weighted training.

Contribution

It provides a corrected NER test set for more accurate evaluation and proposes CrossWeigh, a novel method for training NER models robust to label mistakes.

Findings

01  Label mistakes were identified and corrected in about 5.38% of the CoNLL03 test sentences.

02  CrossWeigh improves NER model performance across multiple datasets.

03  Re-evaluation with corrected labels yields more accurate model assessments.

Abstract

Everyone makes mistakes. So do human annotators when curating labels for named entity recognition (NER). Such label mistakes might hurt model training and interfere with model comparison. In this study, we dive deep into one of the widely adopted NER benchmark datasets, CoNLL03 NER. We are able to identify label mistakes in about 5.38% of the test sentences, which is a significant ratio considering that the state-of-the-art test F1 score is already around 93%. Therefore, we manually correct these label mistakes and form a cleaner test set. Our re-evaluation of popular models on this corrected test set leads to more accurate assessments than those on the original test set. More importantly, we propose a simple yet effective framework, CrossWeigh, to handle label mistakes during NER model training. Specifically, it partitions the training data into several folds and trains independent NER…
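The abstract is cut off here, so what happens after the per-fold models are trained is not stated on this page. As a rough illustration of "data partitioning and weighted training", the sketch below assumes that each sentence is checked by a model that did not see it during training, and that sentences whose labels disagree with that model's predictions receive a smaller training weight. The function names, the `epsilon` and `rounds` parameters, and the toy majority-vote tagger are all hypothetical and are not taken from the paper or its repository.

```python
import random
from collections import Counter, defaultdict

def partition_folds(n_items, k, seed):
    """Shuffle indices 0..n_items-1 and split them into k roughly equal folds."""
    idx = list(range(n_items))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def train_majority_tagger(sentences, labels):
    """Toy stand-in for an NER model (assumption): tag each token with its
    most frequent training tag, or "O" if the token was never seen."""
    counts = defaultdict(Counter)
    for tokens, tags in zip(sentences, labels):
        for token, tag in zip(tokens, tags):
            counts[token][tag] += 1

    def predict(tokens):
        return [counts[t].most_common(1)[0][0] if t in counts else "O"
                for t in tokens]
    return predict

def estimate_weights(sentences, labels, k=3, rounds=2, epsilon=0.7):
    """Assumed weighting rule: weight = epsilon ** (number of rounds in which
    a held-out model disagreed with the sentence's annotated tags)."""
    disagreements = [0] * len(sentences)
    for r in range(rounds):
        for held_out in partition_folds(len(sentences), k, seed=r):
            held = set(held_out)
            train_idx = [i for i in range(len(sentences)) if i not in held]
            predict = train_majority_tagger(
                [sentences[i] for i in train_idx],
                [labels[i] for i in train_idx],
            )
            for i in held_out:
                if predict(sentences[i]) != labels[i]:
                    disagreements[i] += 1
    return [epsilon ** c for c in disagreements]

# Five consistent annotations and one that misses the location entity.
sentences = [["Visit", "Paris"]] * 6
labels = [["O", "B-LOC"]] * 5 + [["O", "O"]]
weights = estimate_weights(sentences, labels)
print(weights)
```

In this toy data the last sentence is the only one whose annotation conflicts with the rest, so it is the only one that ends up down-weighted. A weighted training step would then scale each sentence's loss by its weight; that step is omitted here because it depends on the NER model being used.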

Code & Models

Repositories

ZihanWangKi/CrossWeigh (official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text and Document Classification Technologies