Ending-based Strategies for Part-of-speech Tagging

Greg Adams; Beth Millar; Eric Neufeld; Tim Philip

arXiv:1302.6777·cs.CL·February 28, 2013

TL;DR

This paper explores ending-based strategies for part-of-speech tagging, showing that using word endings can perform nearly as well as full-word methods and can even outperform them under certain conditions.

Contribution

It introduces a novel approach that prioritizes word-ending statistics over whole-word data, revealing unexpected performance patterns and achieving high accuracy.

Findings

01

Ending-based tagger performed nearly as well as word-based taggers.

02

Performance generally improved as whole-word statistics were added back for more of the most frequent words, but declined after a point.

03

Achieved a 97.5% tagging accuracy using ending strategies.

Abstract

Probabilistic approaches to part-of-speech tagging rely primarily on whole-word statistics about word/tag combinations as well as contextual information. But experience shows about 4 per cent of tokens encountered in test sets are unknown even when the training set is as large as a million words. Unseen words are tagged using secondary strategies that exploit word features such as endings, capitalizations and punctuation marks. In this work, word-ending statistics are primary and whole-word statistics are secondary. First, a tagger was trained and tested on word endings only. Subsequent experiments added back whole-word statistics for the words occurring most frequently in the training set. As the number of such words grew larger, performance was expected to improve, in the limit performing the same as word-based taggers. Surprisingly, the ending-based tagger initially performed nearly as well as the word-based…
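The lexical side of the idea in the abstract (ending statistics first, whole-word statistics only for the most frequent training words) can be sketched roughly as follows. This is a minimal illustration, not the authors' tagger: it leaves out the contextual model entirely, and the function names, the lowercasing, the default ending length of 2, and the fallback tag `"NN"` are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def train(tagged_tokens, ending_len=2, top_n=0):
    """Collect tag counts per word ending, plus whole-word tag counts
    for the top_n most frequent words (hypothetical helper)."""
    ending_counts = defaultdict(Counter)
    word_counts = defaultdict(Counter)
    word_freq = Counter()
    for word, tag in tagged_tokens:
        w = word.lower()  # assumption: case is ignored
        ending_counts[w[-ending_len:]][tag] += 1
        word_counts[w][tag] += 1
        word_freq[w] += 1
    # Keep whole-word statistics only for the top_n most frequent words.
    frequent = {w for w, _ in word_freq.most_common(top_n)}
    return {
        "ending_len": ending_len,
        "endings": dict(ending_counts),
        "words": {w: c for w, c in word_counts.items() if w in frequent},
    }


def tag_word(word, model, default="NN"):
    """Pick the most frequent tag: whole-word counts if the word was kept,
    otherwise ending counts, otherwise an assumed default tag."""
    w = word.lower()
    if w in model["words"]:
        return model["words"][w].most_common(1)[0][0]
    counts = model["endings"].get(w[-model["ending_len"]:])
    if counts:
        return counts.most_common(1)[0][0]
    return default


# Toy corpus, invented for illustration only.
corpus = [
    ("red", "JJ"), ("red", "JJ"),
    ("walked", "VBD"), ("jumped", "VBD"), ("talked", "VBD"),
    ("quickly", "RB"),
]

endings_only = train(corpus, top_n=0)
with_top_word = train(corpus, top_n=1)
```

On this toy corpus the endings-only model tags "red" by its "ed" ending, where the past-tense verbs dominate, while adding back whole-word statistics for the single most frequent word lets "red" keep its own tag. Unseen words such as "climbed" are handled the same way in both models, since only their ending is consulted.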

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems