Exploring the Statistical Derivation of Transformational Rule Sequences for Part-of-Speech Tagging

Lance A. Ramshaw (Univ. of Pennsylvania; Bowdoin College) and Mitchell P. Marcus (Univ. of Pennsylvania)

arXiv:cmp-lg/9406011 · cmp-lg · February 3, 2008 · 22 cites

TL;DR

This paper analyzes Brill's corpus-based transformational rule learning method for part-of-speech tagging, highlighting its resistance to overtraining and providing insights into its statistical derivation and implementation.

Contribution

The paper analyzes Brill's approach as a variation on decision tree methods, and describes a fast implementation and a dependency-recording mechanism.

Findings

01 Resistant to overtraining in POS tagging tasks
02 Effective for English and ancient Greek corpora
03 Provides a fast, incremental learning algorithm

Abstract

Eric Brill has recently proposed a simple and powerful corpus-based language modeling approach that can be applied to various tasks including part-of-speech tagging and building phrase structure trees. The method learns a series of symbolic transformational rules, which can then be applied in sequence to a test corpus to produce predictions. The learning process only requires counting matches for a given set of rule templates, allowing the method to survey a very large space of possible contextual factors. This paper analyses Brill's approach as an interesting variation on existing decision tree methods, based on experiments involving part-of-speech tagging for both English and ancient Greek corpora. In particular, the analysis throws light on why the new mechanism seems surprisingly resistant to overtraining. A fast, incremental implementation and a mechanism for recording the…
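The learning loop described in the abstract (count matches for rule templates, keep the best rule, apply it, repeat) can be illustrated with a minimal Python sketch. This is not the authors' implementation or Brill's actual template set. It assumes a single hypothetical template, "change tag A to B when the previous tag is Z", and the names `apply_rule` and `learn_rules`, the scoring by net error reduction, and the toy tag sequences are all illustrative assumptions.

```python
from collections import Counter

def apply_rule(tags, rule):
    """Apply one rule (from_tag, to_tag, prev_tag) to a list of tags."""
    from_tag, to_tag, prev_tag = rule
    out = list(tags)
    for i in range(1, len(tags)):
        # Assumption: context is read from the input tags, so changes made
        # earlier in the same pass do not trigger further changes.
        if tags[i] == from_tag and tags[i - 1] == prev_tag:
            out[i] = to_tag
    return out

def learn_rules(initial, gold, max_rules=10):
    """Greedily learn an ordered rule list by counting template matches."""
    current = list(initial)
    rules = []
    for _ in range(max_rules):
        good = Counter()  # sites where a candidate rule would fix an error
        bad = Counter()   # sites where the same rule would break a correct tag
        for i in range(1, len(current)):
            if current[i] != gold[i]:
                good[(current[i], gold[i], current[i - 1])] += 1
        for i in range(1, len(current)):
            if current[i] == gold[i]:
                for rule in good:
                    if current[i] == rule[0] and current[i - 1] == rule[2]:
                        bad[rule] += 1
        if not good:
            break  # no remaining errors that this template can address
        best = max(good, key=lambda r: good[r] - bad[r])
        if good[best] - bad[best] <= 0:
            break  # no candidate rule gives a net improvement
        rules.append(best)
        current = apply_rule(current, best)
    return rules, current

# Toy data (hypothetical): every word starts with a default tag guess.
initial = ["DT", "VB", "VBD", "TO", "VB", "DT", "VB"]
gold = ["DT", "NN", "VBD", "TO", "VB", "DT", "NN"]
rules, final = learn_rules(initial, gold)
print(rules)
print(final == gold)
```

Because each learned rule is applied to the whole corpus before the next one is scored, later rules see the output of earlier ones, which is what makes the result an ordered sequence rather than an unordered rule set. Tagging a test corpus then amounts to calling `apply_rule` once per learned rule, in order.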

Taxonomy

Topics: Natural Language Processing Techniques · Speech and dialogue systems · Speech Recognition and Synthesis