TnT - A Statistical Part-of-Speech Tagger

Thorsten Brants (Saarland University, Germany)

arXiv:cs/0003055·cs.CL·May 23, 2007·326 cites

Open Access

TL;DR

TnT is an efficient statistical part-of-speech tagger based on Markov models. The paper argues that it performs at least as well as other current approaches, including Maximum Entropy taggers, and presents evaluations on two corpora.

Contribution

This paper introduces TnT, a Markov model-based POS tagger, and challenges claims in the literature that such taggers are outperformed by other approaches such as Maximum Entropy.

Findings

01

TnT performs at least as well as Maximum Entropy taggers.

02

A recent comparison cited by the authors found that TnT performs significantly better on the tested corpora.

03

The paper describes the techniques TnT uses for smoothing and for handling unknown words.

Abstract

Trigrams'n'Tags (TnT) is an efficient statistical part-of-speech tagger. Contrary to claims found elsewhere in the literature, we argue that a tagger based on Markov models performs at least as well as other current approaches, including the Maximum Entropy framework. A recent comparison has even shown that TnT performs significantly better for the tested corpora. We describe the basic model of TnT, the techniques used for smoothing and for handling unknown words. Furthermore, we present evaluations on two corpora.
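
To make the idea of a smoothed trigram tag model concrete, here is a minimal sketch in Python. The abstract does not spell out the smoothing scheme, so this uses linear interpolation of unigram, bigram, and trigram estimates, a common choice for trigram taggers. The function names, the toy tag sequences, and the fixed interpolation weights are assumptions made for illustration and are not taken from this page.

```python
from collections import Counter


def train_counts(tag_sequences):
    """Collect tag n-gram counts and history counts from tagged sentences."""
    uni, bi, tri = Counter(), Counter(), Counter()
    bi_ctx, tri_ctx = Counter(), Counter()  # how often each history is followed by a tag
    for tags in tag_sequences:
        for i, t in enumerate(tags):
            uni[t] += 1
            if i >= 1:
                bi[(tags[i - 1], t)] += 1
                bi_ctx[tags[i - 1]] += 1
            if i >= 2:
                tri[(tags[i - 2], tags[i - 1], t)] += 1
                tri_ctx[(tags[i - 2], tags[i - 1])] += 1
    return uni, bi, tri, bi_ctx, tri_ctx


def interpolated_prob(t1, t2, t3, counts, lambdas=(0.2, 0.3, 0.5)):
    """P(t3 | t1, t2) as a weighted mix of unigram, bigram, and trigram estimates.

    The weights are hypothetical fixed values that sum to 1; a real tagger
    would estimate them from data.
    """
    uni, bi, tri, bi_ctx, tri_ctx = counts
    l1, l2, l3 = lambdas
    total = sum(uni.values())
    p_uni = uni[t3] / total if total else 0.0
    p_bi = bi[(t2, t3)] / bi_ctx[t2] if bi_ctx[t2] else 0.0
    p_tri = tri[(t1, t2, t3)] / tri_ctx[(t1, t2)] if tri_ctx[(t1, t2)] else 0.0
    return l1 * p_uni + l2 * p_bi + l3 * p_tri


# Toy tag sequences (hypothetical data, not from either evaluation corpus).
toy_corpus = [
    ["DT", "NN", "VB"],
    ["DT", "JJ", "NN", "VB"],
    ["DT", "NN", "VB", "DT", "NN"],
]
toy_counts = train_counts(toy_corpus)
print(interpolated_prob("DT", "NN", "VB", toy_counts))
```

Because the lower-order estimates are always mixed in, a tag that never followed a given two-tag history still receives a non-zero probability as long as it occurs somewhere in the training data.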

Taxonomy

Topics: Natural Language Processing Techniques · Algorithms and Data Compression · Topic Modeling