On the Evaluation and Comparison of Taggers: The Effect of Noise in Testing Corpora

L. Padro & L. Marquez (Universitat Politecnica de Catalunya)

arXiv:cs/9809112 · cs.CL · May 23, 2007 · 5 cites

TL;DR

This paper investigates how noise in test corpora affects the evaluation of POS taggers, emphasizing the need for more rigorous testing methods to ensure accurate comparisons.

Contribution

It highlights the impact of corpus noise on tagger evaluation and advocates for improved experimental designs for reliable performance assessment.

Findings

1. Noise in test corpora distorts tagger performance measures.
2. Current evaluation practices may lead to invalid comparisons.
3. Rigorous testing protocols are necessary for accurate evaluation.

Abstract

This paper addresses the issue of POS tagger evaluation. Such evaluation is usually performed by comparing the tagger output with a reference test corpus, which is assumed to be error-free. Currently used corpora contain noise, which causes the measured performance to be a distortion of the real value. We analyze to what extent this distortion may invalidate the comparison between taggers or the measure of the improvement given by a new system. The main conclusion is that a more rigorous experimental design for testing is needed to reliably evaluate and compare tagger accuracies.
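
To make the distortion concrete, the following is a minimal sketch and not the authors' own analysis. It assumes only that the reference corpus disagrees with the true tagging on a known fraction of tokens. Because a token on which the tagger is truly wrong must differ either from the reference or be a reference error, the true error rate is bounded by the sum and the absolute difference of the observed error rate and the reference error rate. The function names and the example figures (97.0%, 96.5%, 3% noise) are hypothetical and chosen only for illustration.

```python
def true_accuracy_bounds(observed_accuracy, reference_error_rate):
    """Bound the accuracy a tagger would obtain on an error-free corpus.

    observed_accuracy: fraction of tokens where the tagger agrees with
        the (noisy) reference corpus.
    reference_error_rate: fraction of tokens where the reference corpus
        itself is wrong (an assumed, externally estimated value).
    Returns a (low, high) pair of bounds on the true accuracy.
    """
    if not (0.0 <= observed_accuracy <= 1.0):
        raise ValueError("observed_accuracy must lie in [0, 1]")
    if not (0.0 <= reference_error_rate <= 1.0):
        raise ValueError("reference_error_rate must lie in [0, 1]")

    observed_error = 1.0 - observed_accuracy
    # Worst case: every reference error hides an additional tagger error.
    max_true_error = min(1.0, observed_error + reference_error_rate)
    # Best case: observed disagreements and reference errors coincide
    # as much as possible.
    min_true_error = abs(observed_error - reference_error_rate)
    return 1.0 - max_true_error, 1.0 - min_true_error


def intervals_overlap(a, b):
    """Return True if two (low, high) intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Hypothetical figures: two taggers scored on the same noisy test corpus.
bounds_a = true_accuracy_bounds(0.970, 0.03)
bounds_b = true_accuracy_bounds(0.965, 0.03)
comparison_inconclusive = intervals_overlap(bounds_a, bounds_b)
```

With these illustrative numbers the two intervals overlap, so under this worst-case bound alone the observed half-point difference does not establish which tagger is better. The bound is deliberately conservative; it makes no assumption about how tagger errors and corpus errors are correlated.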

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Algorithms and Data Compression