Tagging the Teleman Corpus

Thorsten Brants, Christer Samuelsson (Universität des Saarlandes, Computational Linguistics, Saarbrücken, Germany)

arXiv:cmp-lg/9505026·cmp-lg·February 3, 2008·3 cites

TL;DR

This paper compares part-of-speech tagging of the Swedish Teleman corpus and the English Susanne corpus, using an HMM-based tagger and a novel reductionistic statistical tagger. Teleman proves the harder corpus to tag, and the two taggers perform comparably.

Contribution

It introduces a novel reductionistic statistical part-of-speech tagger and compares its performance with that of an HMM-based tagger on corpora in two languages, Swedish and English.

Findings

01

Tagging the Teleman corpus is more difficult than tagging the Susanne corpus.

02

The two taggers perform comparably to each other.

03

The novel reductionistic tagger is competitive with the HMM-based tagger.

Abstract

Experiments were carried out comparing the Swedish Teleman and the English Susanne corpora using an HMM-based and a novel reductionistic statistical part-of-speech tagger. They indicate that tagging the Teleman corpus is the more difficult task, and that the performance of the two different taggers is comparable.
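
For readers unfamiliar with the term, the following is a minimal sketch of what an HMM-based part-of-speech tagger generally does: it picks the tag sequence that maximizes the product of tag-transition and word-emission probabilities, found with Viterbi decoding. It is not the authors' implementation. The function name `viterbi`, the three-tag tagset, the smoothing floor, and all probabilities are invented for illustration; a real tagger would estimate them from a tagged corpus such as Teleman or Susanne. The reductionistic tagger is not sketched because the abstract gives no detail on how it works.

```python
import math

def viterbi(words, tags, start_p, trans_p, emit_p, floor=1e-6):
    """Most probable tag sequence for `words` under a bigram HMM.

    `floor` is a crude stand-in for smoothing (an assumption of this sketch):
    any missing or zero probability is replaced by it before taking logs.
    """
    if not words:
        return []

    def lp(p):
        return math.log(p if p > 0 else floor)

    # best[i][t] = (log-prob of the best path ending in tag t at word i, previous tag)
    best = [{t: (lp(start_p.get(t, 0)) + lp(emit_p[t].get(words[0], 0)), None)
             for t in tags}]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            # Best predecessor tag for t, scored by path so far plus transition.
            score, prev = max(
                (best[i - 1][p][0] + lp(trans_p[p].get(t, 0)), p) for p in tags
            )
            column[t] = (score + lp(emit_p[t].get(words[i], 0)), prev)
        best.append(column)

    # Follow back-pointers from the best final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]

# Toy model; every number here is made up for the example.
tags = ["DET", "NOUN", "VERB"]
start_p = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}
trans_p = {
    "DET": {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
    "NOUN": {"DET": 0.1, "NOUN": 0.3, "VERB": 0.6},
    "VERB": {"DET": 0.5, "NOUN": 0.4, "VERB": 0.1},
}
emit_p = {
    "DET": {"the": 0.9},
    "NOUN": {"dog": 0.5, "barks": 0.1},
    "VERB": {"dog": 0.05, "barks": 0.6},
}

tagged = viterbi(["the", "dog", "barks"], tags, start_p, trans_p, emit_p)
print(tagged)
```

Working in log space avoids numeric underflow on long sentences, which is why the sketch sums logarithms instead of multiplying probabilities.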

Taxonomy

Topics: Natural Language Processing Techniques · Authorship Attribution and Profiling · Language, Linguistics, Cultural Analysis