Comparing a Linguistic and a Stochastic Tagger
Christer Samuelsson (Lucent Technologies), Atro Voutilainen (University of Helsinki)

TL;DR
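This paper compares EngCG-2, a constraint-based morphological tagger, with a state-of-the-art statistical tagger on a common disambiguation task. At the same level of remaining ambiguity, the rule-based tagger's error rate is an order of magnitude lower. The paper also discusses two issues that affect evaluation: priming effects and disagreement between human annotators.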
Contribution
It provides a direct, double-blind comparison between a linguistic rule-based tagger and a statistical tagger on a common disambiguation task with a common tag set.
Findings
At the same level of remaining ambiguity, the rule-based tagger's error rate is an order of magnitude lower than the statistical tagger's.
Priming effects and disagreement between human annotators can compromise evaluation results.
Abstract
Concerning different approaches to automatic PoS tagging: EngCG-2, a constraint-based morphological tagger, is compared in a double-blind test with a state-of-the-art statistical tagger on a common disambiguation task using a common tag set. The experiments show that for the same amount of remaining ambiguity, the error rate of the statistical tagger is one order of magnitude greater than that of the rule-based one. The two related issues of priming effects compromising the results and disagreement between human annotators are also addressed.
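To make "the same amount of remaining ambiguity" concrete, here is a minimal sketch of how the two quantities might be computed when a tagger may leave more than one tag on a word. It assumes that ambiguity is the mean number of tags retained per word and that a word counts as an error when its correct tag is not among the retained tags. These definitions, the function name `ambiguity_and_error_rate`, and the toy tags are illustrative assumptions, not taken from the paper's evaluation code.

```python
from typing import List, Set, Tuple


def ambiguity_and_error_rate(
    predicted: List[Set[str]], gold: List[str]
) -> Tuple[float, float]:
    """Return (mean tags per word, fraction of words missing the gold tag).

    Assumed definitions for illustration only:
      ambiguity  = total retained tags / number of words
      error rate = words whose gold tag was discarded / number of words
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("need at least one word")

    n_words = len(gold)
    total_tags = sum(len(tags) for tags in predicted)
    errors = sum(1 for tags, g in zip(predicted, gold) if g not in tags)
    return total_tags / n_words, errors / n_words


# Hypothetical toy data: four words, one gold tag each.
gold = ["DET", "N", "V", "ADV"]

# Two hypothetical taggers that both leave five tags on four words.
tagger_a = [{"DET"}, {"N", "V"}, {"V"}, {"ADV"}]
tagger_b = [{"DET"}, {"N"}, {"N", "V"}, {"ADJ"}]

print(ambiguity_and_error_rate(tagger_a, gold))
print(ambiguity_and_error_rate(tagger_b, gold))
```

In this toy setup both taggers retain 1.25 tags per word, yet only the second discards a correct tag. Comparing error rates at matched ambiguity in this way avoids rewarding a tagger simply for leaving more tags in place.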
Taxonomy
Topics: Natural Language Processing Techniques · Algorithms and Data Compression · Topic Modeling
