Unsupervised Learning of Word-Category Guessing Rules
Andrei Mikheev (HCRC, Edinburgh University)

TL;DR
This paper introduces a fully unsupervised statistical method for learning rules that guess the possible parts of speech of unknown words, using only a lexicon and a raw corpus, to improve POS tagging of words missing from the lexicon.
Contribution
It presents a novel unsupervised approach to induce morphological and ending-guessing rules from raw text and lexicons, advancing POS tagging techniques.
Findings
Rule-sets learned from Brown Corpus data achieved highly competitive performance
Induced three complementary types of word-guessing rules: prefix morphological, suffix morphological, and ending-guessing
Compared the induced rule-sets against the state of the art in unknown-word tagging
Abstract
Words unknown to the lexicon present a substantial problem for part-of-speech tagging. In this paper we present a technique for fully unsupervised statistical acquisition of rules that guess possible parts of speech for unknown words. Three complementary sets of word-guessing rules are induced from the lexicon and a raw corpus: prefix morphological rules, suffix morphological rules, and ending-guessing rules. Learning was performed on Brown Corpus data; the resulting rule-sets achieved highly competitive performance and were compared with the state of the art.
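To make the idea of an ending-guessing rule concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's procedure: the abstract does not describe the statistical scoring, so a plain frequency and majority-share threshold is swapped in, and the function names, thresholds, and toy lexicon are all hypothetical. Each rule maps a word ending to a set of possible tags (a POS-class) observed for lexicon words with that ending.

```python
from collections import Counter, defaultdict

def induce_ending_rules(lexicon, max_len=3, min_count=2, min_share=0.8):
    """Induce ending -> POS-class rules from a lexicon.

    lexicon maps each word to the set of tags it can take.
    NOTE: the thresholds are illustrative assumptions, not the
    paper's statistical scoring.
    """
    stats = defaultdict(Counter)
    for word, tags in lexicon.items():
        for n in range(1, max_len + 1):
            # Require at least two characters before the ending.
            if len(word) > n + 1:
                stats[word[-n:]][frozenset(tags)] += 1

    rules = {}
    for ending, counts in stats.items():
        total = sum(counts.values())
        pos_class, hits = counts.most_common(1)[0]
        # Keep the rule only if the ending is frequent and consistent enough.
        if total >= min_count and hits / total >= min_share:
            rules[ending] = pos_class
    return rules

def guess(word, rules, max_len=3):
    """Return the POS-class of the longest matching ending, or None."""
    for n in range(min(max_len, len(word) - 1), 0, -1):
        pos_class = rules.get(word[-n:])
        if pos_class is not None:
            return pos_class
    return None

# Toy lexicon with illustrative tags (hypothetical data).
toy_lexicon = {
    "walking": {"VBG", "NN"},
    "talking": {"VBG", "NN"},
    "running": {"VBG", "NN"},
    "king": {"NN"},
    "quickly": {"RB"},
    "slowly": {"RB"},
}

rules = induce_ending_rules(toy_lexicon)
print(guess("jumping", rules))  # POS-class learned for the ending "ing"
print(guess("sadly", rules))    # POS-class learned for the ending "ly"
```

In this toy run the ending "ng" yields no rule, because "king" makes its tag distribution too inconsistent for the assumed 0.8 share threshold, while the longer ending "ing" stays consistent. Preferring the longest matching ending at guess time is likewise a design choice of this sketch.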
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Algorithms and Data Compression
