Comparing a Linguistic and a Stochastic Tagger

Christer Samuelsson (Lucent Technologies); Atro Voutilainen (University of Helsinki)

arXiv:cmp-lg/9706005 · cmp-lg · February 3, 2008 · 27 cites

Open Access

TL;DR

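This paper compares EngCG-2, a constraint-based morphological tagger, with a statistical tagger on a common disambiguation task. At the same level of remaining ambiguity, the statistical tagger's error rate is an order of magnitude higher. The paper also discusses priming effects and disagreement between human annotators, both of which can compromise evaluation results.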

Contribution

It provides a direct, double-blind comparison between a linguistic (constraint-based) tagger and a statistical tagger on a common disambiguation task with a common tag set.

Findings

01

The rule-based tagger's error rate is an order of magnitude lower than the statistical tagger's.

02

Priming effects and disagreement between human annotators can compromise evaluation results.

03

Statistical tagger performs worse at the same ambiguity level.

Abstract

Concerning different approaches to automatic PoS tagging: EngCG-2, a constraint-based morphological tagger, is compared in a double-blind test with a state-of-the-art statistical tagger on a common disambiguation task using a common tag set. The experiments show that for the same amount of remaining ambiguity, the error rate of the statistical tagger is one order of magnitude greater than that of the rule-based one. The two related issues of priming effects compromising the results and disagreement between human annotators are also addressed.
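
The comparison "for the same amount of remaining ambiguity" presupposes that a tagger may leave more than one tag on a word. A minimal sketch of how the two quantities could be computed is given below. It assumes that remaining ambiguity is measured as the mean number of tags per word and that a word counts as an error when the correct tag is not among its output tags; the function name `evaluate`, the toy sentence, and the tag labels are hypothetical and not taken from the paper.

```python
from typing import List, Set, Tuple


def evaluate(outputs: List[Set[str]], gold: List[str]) -> Tuple[float, float]:
    """Return (error_rate, mean_tags_per_word) for one tagger.

    Assumed definitions (not confirmed by the abstract):
    - a word is an error if its gold tag is absent from the output tag set;
    - remaining ambiguity is the average size of the output tag sets.
    """
    if len(outputs) != len(gold):
        raise ValueError("outputs and gold must have the same length")
    if not gold:
        raise ValueError("cannot evaluate an empty corpus")

    n = len(gold)
    errors = sum(1 for tags, correct in zip(outputs, gold) if correct not in tags)
    total_tags = sum(len(tags) for tags in outputs)
    return errors / n, total_tags / n


# Toy data with hypothetical tag labels: four words, one left ambiguous,
# one tagged incorrectly.
gold = ["DET", "N", "V", "PREP"]
outputs = [{"DET"}, {"N", "V"}, {"V"}, {"ADV"}]

error_rate, ambiguity = evaluate(outputs, gold)
print(f"error rate: {error_rate:.2%}, tags per word: {ambiguity:.2f}")
```

Under these assumptions, two taggers are comparable only when their tags-per-word figures match, since a tagger can always lower its error rate by leaving more tags on each word.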

Taxonomy

Topics: Natural Language Processing Techniques · Algorithms and Data Compression · Topic Modeling