CoNLL#: Fine-grained Error Analysis and a Corrected Test Set for CoNLL-03 English
Andrew Rueda, Elena Álvarez Mellado, Constantine Lignos

TL;DR
This paper analyzes where current NER models still fail on CoNLL-03 English, introduces a fine-grained categorization of their errors, and presents a corrected test set that makes evaluation more interpretable and helps guide future research.
Contribution
It provides a comprehensive error analysis of top NER models and introduces CoNLL#, a revised test set with systematic corrections for more accurate evaluation.
Findings
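The original CoNLL-03 English test set contains systematic annotation errors.
New document-level annotations on the test set allow model errors to be attributed more precisely.
CoNLL#, a corrected version of the test set, supports lower-noise, more interpretable benchmarking.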
Abstract
Modern named entity recognition systems have steadily improved performance in the age of larger and more powerful neural models. However, over the past several years, the state-of-the-art has seemingly hit another plateau on the benchmark CoNLL-03 English dataset. In this paper, we perform a deep dive into the test outputs of the highest-performing NER models, conducting a fine-grained evaluation of their performance by introducing new document-level annotations on the test set. We go beyond F1 scores by categorizing errors in order to interpret the true state of the art for NER and guide future work. We review previous attempts at correcting the various flaws of the test set and introduce CoNLL#, a new corrected version of the test set that addresses its systematic and most prevalent errors, allowing for low-noise, interpretable error analysis.
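As a rough illustration of what "going beyond F1 scores by categorizing errors" can look like in practice, the sketch below compares gold and predicted BIO tag sequences and sorts each gold entity into a coarse error bucket. The bucket names (`type_error`, `boundary_error`, and so on) and the helper functions `bio_to_spans` and `categorize` are hypothetical choices made for this example; they are not the paper's taxonomy or code.

```python
from collections import Counter


def bio_to_spans(tags):
    """Convert a BIO tag sequence into (start, end, type) spans, end exclusive."""
    spans = []
    start, etype = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes a trailing span
        # The open span continues only on an "I-" tag of the same type.
        continues = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not continues:
            spans.append((start, i, etype))
            start, etype = None, None
        # "B-" always opens a span; a stray "I-" is treated as an opening tag.
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, etype = i, tag[2:]
    return spans


def categorize(gold_tags, pred_tags):
    """Count coarse, illustrative error categories for one tagged sequence."""
    gold = bio_to_spans(gold_tags)
    pred = bio_to_spans(pred_tags)
    counts = Counter()
    matched_pred = set()
    for g in gold:
        gs, ge, gt = g
        # Predicted spans sharing at least one token with the gold span.
        overlaps = [p for p in pred if p[0] < ge and gs < p[1]]
        if not overlaps:
            counts["missed"] += 1
            continue
        matched_pred.update(overlaps)
        if g in overlaps:
            counts["correct"] += 1
        elif any((p[0], p[1]) == (gs, ge) for p in overlaps):
            counts["type_error"] += 1  # right boundaries, wrong label
        elif any(p[2] == gt for p in overlaps):
            counts["boundary_error"] += 1  # right label, wrong boundaries
        else:
            counts["type_and_boundary_error"] += 1
    # Predictions that touch no gold entity at all.
    counts["spurious"] += sum(1 for p in pred if p not in matched_pred)
    return counts


# Toy example (invented tags, not taken from the dataset).
gold = ["B-PER", "I-PER", "O", "B-LOC", "O", "B-ORG", "I-ORG", "O"]
pred = ["B-PER", "I-PER", "O", "B-ORG", "O", "B-ORG", "O", "B-MISC"]
result = categorize(gold, pred)
```

A breakdown like this separates mistakes that a single F1 number merges together, which is the kind of view that becomes more trustworthy once test-set annotation noise has been reduced.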
Taxonomy
Topics: Natural Language Processing Techniques · Text Readability and Simplification
Methods: Sparse Evolutionary Training
