Some Advances in Transformation-Based Part of Speech Tagging

Eric Brill (MIT)

arXiv:cmp-lg/9406010 · cmp-lg · February 3, 2008 · 279 cites

Open Access

TL;DR

This paper extends a trainable rule-based part of speech tagger in three ways: a method for expressing lexical relations, a rule-based approach to tagging unknown words, and a k-best extension that assigns multiple tags in some cases of uncertainty. The tagger is offered as an alternative to stochastic methods.

Contribution

It introduces methods for expressing lexical relations, tagging unknown words, and assigning multiple tags to a word within a rule-based POS tagger, while capturing linguistic information in a small number of simple non-stochastic rules.

Findings

1. A method for expressing lexical relations in tagging that stochastic taggers do not capture
2. A rule-based approach to tagging unknown words
3. A k-best extension that can assign multiple tags to a word in some cases of uncertainty

Abstract

Most recent research in trainable part of speech taggers has explored stochastic tagging. While these taggers obtain high accuracy, linguistic information is captured indirectly, typically in tens of thousands of lexical and contextual probabilities. In [Brill92], a trainable rule-based tagger was described that obtained performance comparable to that of stochastic taggers, but captured relevant linguistic information in a small number of simple non-stochastic rules. In this paper, we describe a number of extensions to this rule-based tagger. First, we describe a method for expressing lexical relations in tagging that are not captured by stochastic taggers. Next, we show a rule-based approach to tagging unknown words. Finally, we show how the tagger can be extended into a k-best tagger, where multiple tags can be assigned to words in some cases of uncertainty.
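To make the idea of a small set of simple non-stochastic rules concrete, here is a minimal sketch of how a transformation-based tagger might apply its rules. Everything named in it is an assumption for illustration: the `Rule` shape (a single "previous tag" context), the toy `LEXICON`, the default tag `NN` for unseen words, and the one example rule are not taken from the paper, and the rule-learning step is left out entirely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule shape: change from_tag to to_tag when the
    preceding word carries prev_tag."""
    from_tag: str
    to_tag: str
    prev_tag: str


def initial_tags(words, lexicon, default="NN"):
    # Start state: each known word gets its listed tag; words missing
    # from the lexicon get a default tag (an assumption of this sketch).
    return [lexicon.get(w.lower(), default) for w in words]


def apply_rules(tags, rules):
    # Rules run in order; each one sweeps the sentence left to right,
    # and a change takes effect immediately within that sweep.
    tags = list(tags)
    for rule in rules:
        for i in range(1, len(tags)):
            if tags[i] == rule.from_tag and tags[i - 1] == rule.prev_tag:
                tags[i] = rule.to_tag
    return tags


# Toy data, invented for this example only.
LEXICON = {"we": "PRP", "want": "VBP", "to": "TO", "race": "NN", "the": "DT"}
RULES = [Rule(from_tag="NN", to_tag="VB", prev_tag="TO")]


def tag(words):
    return apply_rules(initial_tags(words, LEXICON), RULES)


if __name__ == "__main__":
    print(tag(["We", "want", "to", "race"]))
    print(tag(["the", "race"]))
```

In this sketch, "race" starts as a noun in both sentences and is retagged as a verb only where it follows "to". The whole contextual model is one readable rule rather than a table of probabilities, which is the contrast the abstract draws with stochastic taggers.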

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems