Ending-based Strategies for Part-of-speech Tagging
Greg Adams, Beth Millar, Eric Neufeld, Tim Philip

TL;DR
This paper explores ending-based strategies for part-of-speech tagging, showing that using word endings can perform nearly as well as full-word methods and can even outperform them under certain conditions.
Contribution
It introduces a novel approach that prioritizes word-ending statistics over whole-word data, revealing unexpected performance patterns and achieving high accuracy.
Findings
The ending-based tagger performed nearly as well as word-based taggers.
Performance improved as whole-word statistics were added back for more of the most frequent words, but declined after a point.
Achieved a 97.5% tagging accuracy using ending strategies.
Abstract
Probabilistic approaches to part-of-speech tagging rely primarily on whole-word statistics about word/tag combinations as well as contextual information. But experience shows about 4 per cent of tokens encountered in test sets are unknown even when the training set is as large as a million words. Unseen words are tagged using secondary strategies that exploit word features such as endings, capitalizations and punctuation marks. In this work, word-ending statistics are primary and whole-word statistics are secondary. First, a tagger was trained and tested on word endings only. Subsequent experiments added back whole-word statistics for the words occurring most frequently in the training set. As the number of these words grew larger, performance was expected to improve, in the limit performing the same as word-based taggers. Surprisingly, the ending-based tagger initially performed nearly as well as the word-based…
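The idea of treating endings as primary and whole words as secondary can be illustrated with a minimal sketch. This is not the authors' tagger: it ignores contextual information and simply picks the most frequent tag for each key. The function names `train` and `tag`, the parameter `n_whole_words`, and the fixed ending length of three characters are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def train(tagged_tokens, n_whole_words=0, ending_len=3):
    """Collect tag counts per word ending, plus whole-word counts
    for the n_whole_words most frequent words (hypothetical parameter)."""
    tagged_tokens = list(tagged_tokens)
    word_freq = Counter(word.lower() for word, _ in tagged_tokens)
    # Whole-word statistics are kept only for the most frequent words.
    frequent = {word for word, _ in word_freq.most_common(n_whole_words)}

    word_tags = defaultdict(Counter)    # whole word -> tag counts
    ending_tags = defaultdict(Counter)  # word ending -> tag counts
    all_tags = Counter()                # overall tag counts (fallback)
    for word, tag_label in tagged_tokens:
        w = word.lower()
        all_tags[tag_label] += 1
        ending_tags[w[-ending_len:]][tag_label] += 1
        if w in frequent:
            word_tags[w][tag_label] += 1
    return word_tags, ending_tags, all_tags, ending_len


def tag(words, model):
    """Tag each word by whole-word statistics if available,
    otherwise by its ending, otherwise by the overall most common tag."""
    word_tags, ending_tags, all_tags, ending_len = model
    default = all_tags.most_common(1)[0][0]
    result = []
    for word in words:
        w = word.lower()
        ending = w[-ending_len:]
        if w in word_tags:
            result.append(word_tags[w].most_common(1)[0][0])
        elif ending in ending_tags:
            result.append(ending_tags[ending].most_common(1)[0][0])
        else:
            result.append(default)
    return result
```

Calling `train(corpus, n_whole_words=0)` gives an endings-only tagger in the spirit of the first experiment, and raising `n_whole_words` adds back whole-word statistics for progressively more of the frequent words.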
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems
