Tagging the Teleman Corpus
Thorsten Brants, Christer Samuelsson (Universität des Saarlandes, Computational Linguistics, Saarbrücken, Germany)

TL;DR
This paper compares the difficulty of tagging the Swedish Teleman corpus with that of tagging the English Susanne corpus, using an HMM-based tagger and a novel reductionistic statistical tagger. It finds that Teleman is the harder task and that the two taggers perform comparably.
Contribution
It introduces a novel reductionistic statistical tagger and compares its performance with that of an HMM-based tagger on two corpora in different languages, Swedish and English.
Findings
Tagging the Teleman corpus is more difficult than tagging the Susanne corpus.
Both taggers perform similarly across corpora.
The novel tagger is competitive with HMM-based methods.
Abstract
Experiments were carried out comparing the Swedish Teleman and the English Susanne corpora using an HMM-based and a novel reductionistic statistical part-of-speech tagger. They indicate that tagging the Teleman corpus is the more difficult task, and that the performance of the two different taggers is comparable.
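For readers unfamiliar with HMM-based tagging, the following is a minimal sketch of Viterbi decoding over a bigram hidden Markov model. It is not the authors' implementation. The function name `viterbi`, the three-tag toy tagset, and all probabilities are invented for illustration and have nothing to do with the Teleman or Susanne tagsets.

```python
import math

def viterbi(words, tags, start_p, trans_p, emit_p, floor=1e-12):
    """Return the most probable tag sequence under a bigram HMM.

    Unseen events get the small probability `floor` (an illustrative
    stand-in for proper smoothing).
    """
    def lp(p):
        return math.log(max(p, floor))

    # best[i][t] = (log-probability of the best path ending in tag t
    #               at position i, previous tag on that path)
    best = [{t: (lp(start_p.get(t, 0)) + lp(emit_p[t].get(words[0], 0)), None)
             for t in tags}]
    for w in words[1:]:
        prev = best[-1]
        col = {}
        for t in tags:
            score, back = max((prev[s][0] + lp(trans_p[s].get(t, 0)), s)
                              for s in tags)
            col[t] = (score + lp(emit_p[t].get(w, 0)), back)
        best.append(col)

    # Follow the back-pointers from the best final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for col in reversed(best[1:]):
        tag = col[tag][1]
        path.append(tag)
    return path[::-1]

# Toy model (hypothetical numbers, chosen only to make "can" ambiguous).
TAGS = ["DET", "NOUN", "VERB"]
START_P = {"DET": 0.8, "NOUN": 0.1, "VERB": 0.1}
TRANS_P = {
    "DET": {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
    "NOUN": {"DET": 0.1, "NOUN": 0.2, "VERB": 0.7},
    "VERB": {"DET": 0.5, "NOUN": 0.4, "VERB": 0.1},
}
EMIT_P = {
    "DET": {"the": 0.9},
    "NOUN": {"can": 0.3, "leaks": 0.2},
    "VERB": {"can": 0.5, "leaks": 0.3},
}

result = viterbi(["the", "can", "leaks"], TAGS, START_P, TRANS_P, EMIT_P)
print(result)
```

In this toy model "can" is more likely to be emitted by a verb than by a noun, yet the decoder tags it as a noun because a determiner is far more likely to be followed by a noun. Contextual disambiguation of this kind is what a statistical tagger has to perform.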
Taxonomy
Topics: Natural Language Processing Techniques · Authorship Attribution and Profiling · Language, Linguistics, Cultural Analysis
