On the Evaluation and Comparison of Taggers: The Effect of Noise in Testing Corpora
L. Padró & L. Màrquez (Universitat Politècnica de Catalunya)

TL;DR
This paper investigates how noise in test corpora affects the evaluation of POS taggers, emphasizing the need for more rigorous testing methods to ensure accurate comparisons.
Contribution
It highlights the impact of corpus noise on tagger evaluation and advocates for improved experimental designs for reliable performance assessment.
Findings
Noise in test corpora distorts tagger performance measures
Current evaluation practices may lead to invalid comparisons
Rigorous testing protocols are necessary for accurate evaluation
Abstract
This paper addresses the issue of POS tagger evaluation. Such evaluation is usually performed by comparing the tagger output with a reference test corpus, which is assumed to be error-free. Currently used corpora contain noise, which causes the measured performance to be a distorted estimate of the real value. We analyze to what extent this distortion may invalidate the comparison between taggers or the measurement of the improvement given by a new system. The main conclusion is that a more rigorous experimental design for testing is needed to reliably evaluate and compare tagger accuracies.
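To make the abstract's point concrete, here is a minimal sketch of how reference noise widens the range of plausible true accuracies. It is not the authors' analysis; it assumes only that a known fraction `noise_rate` of test tokens carry a wrong reference tag, and that a tagger agreeing with a wrong reference tag is itself wrong. Under those assumptions the true accuracy can differ from the observed one by at most `noise_rate` in either direction. The function names and the numbers in the usage example are hypothetical.

```python
from typing import Tuple


def true_accuracy_bounds(observed_accuracy: float, noise_rate: float) -> Tuple[float, float]:
    """Bound the true accuracy given accuracy measured on a noisy reference.

    Assumption (illustrative, not from the paper): a fraction `noise_rate`
    of reference tags is wrong. In the worst case every noisy token was
    counted as a hit although the tagger was wrong; in the best case every
    noisy token was counted as a miss although the tagger was right.
    """
    if not 0.0 <= observed_accuracy <= 1.0:
        raise ValueError("observed_accuracy must be in [0, 1]")
    if not 0.0 <= noise_rate <= 1.0:
        raise ValueError("noise_rate must be in [0, 1]")
    lower = max(0.0, observed_accuracy - noise_rate)
    upper = min(1.0, observed_accuracy + noise_rate)
    return lower, upper


def intervals_overlap(a: Tuple[float, float], b: Tuple[float, float]) -> bool:
    """Return True when two closed intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


if __name__ == "__main__":
    # Hypothetical figures chosen only for illustration.
    noise = 0.03
    tagger_a = true_accuracy_bounds(0.970, noise)
    tagger_b = true_accuracy_bounds(0.975, noise)
    print("Tagger A true accuracy lies in", tagger_a)
    print("Tagger B true accuracy lies in", tagger_b)
    print("Intervals overlap:", intervals_overlap(tagger_a, tagger_b))
```

When the two intervals overlap, the observed difference alone cannot establish which tagger is actually better on this test corpus. These bounds are deliberately pessimistic; a tighter estimate would need additional assumptions about how tagger errors and reference errors coincide.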
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Algorithms and Data Compression
