Unsupervised Learning of Word-Category Guessing Rules

Andrei Mikheev (HCRC; Edinburgh University)

arXiv:cmp-lg/9604022 · cmp-lg · February 3, 2008 · 6 cites

TL;DR

This paper introduces an unsupervised method for automatically learning rules that guess the possible parts of speech of unknown words, improving POS tagging without requiring a hand-tagged training corpus.

Contribution

It presents a novel unsupervised approach that induces morphological (prefix and suffix) rules and ending-guessing rules from a lexicon and raw text, addressing the unknown-word problem in POS tagging.
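To make the idea of ending-guessing rules concrete, here is a minimal Python sketch. It assumes a toy lexicon that maps each word to the set of tags it can take, learns which tag set each word ending predicts, and keeps only endings whose prediction is reliable enough. The lexicon, the thresholds and the names `induce_ending_rules` and `guess_by_ending` are hypothetical; a plain precision cut-off stands in for the paper's statistical scoring, and the raw-corpus statistics are left out.

```python
from collections import Counter, defaultdict

# Hypothetical toy lexicon: word -> set of tags the word can take.
# Illustrative data only; a real run would use a full lexicon.
LEXICON = {
    "walking": frozenset({"NN", "VBG"}),
    "talking": frozenset({"NN", "VBG"}),
    "meeting": frozenset({"NN", "VBG"}),
    "thing": frozenset({"NN"}),
    "quickly": frozenset({"RB"}),
    "slowly": frozenset({"RB"}),
    "happily": frozenset({"RB"}),
    "family": frozenset({"NN"}),
}


def induce_ending_rules(lexicon, max_len=4, min_stem=3, min_count=2, min_precision=0.6):
    """Learn ending -> tag-set rules from a lexicon.

    All thresholds are hypothetical; a raw precision cut-off stands in
    for the paper's statistical scoring of rules.
    """
    stats = defaultdict(Counter)  # ending -> Counter of tag sets seen with it
    for word, tags in lexicon.items():
        for n in range(1, max_len + 1):
            if len(word) - n >= min_stem:  # leave a plausible stem in front
                stats[word[-n:]][tags] += 1

    rules = {}
    for ending, counts in stats.items():
        total = sum(counts.values())
        best_tags, best_count = counts.most_common(1)[0]
        precision = best_count / total
        if total >= min_count and precision >= min_precision:
            rules[ending] = (best_tags, precision)
    return rules


def guess_by_ending(word, rules):
    """Return the tag set of the longest matching ending, or None."""
    for n in range(len(word), 0, -1):  # try longer (more specific) endings first
        rule = rules.get(word[-n:])
        if rule is not None:
            return rule[0]
    return None


rules = induce_ending_rules(LEXICON)
print(sorted(rules))                                  # -> ['g', 'ing', 'king', 'ly', 'ng', 'y']
print(sorted(guess_by_ending("jumping", rules)))      # -> ['NN', 'VBG']
```

Trying the longest matching ending first is a design choice of this sketch: an ending such as "-ing" is more specific than "-g", so it should win when several rules match. The ending "-ily" is dropped here because, in the toy data, it predicts adverb and noun equally often.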

Findings

01 Achieved competitive performance on the Brown Corpus
02 Induced three types of word-guessing rules: prefix morphological, suffix morphological, and ending-guessing rules
03 Outperformed some existing methods in unknown-word tagging

Abstract

Words unknown to the lexicon present a substantial problem for part-of-speech tagging. In this paper we present a technique for fully unsupervised statistical acquisition of rules that guess possible parts of speech for unknown words. Three complementary sets of word-guessing rules are induced from the lexicon and a raw corpus: prefix morphological rules, suffix morphological rules and ending-guessing rules. Learning was performed on Brown Corpus data, and the resulting rule-sets, which show highly competitive performance, were compared with the state of the art.
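The prefix and suffix morphological rules can be sketched in a similar way. In this hypothetical Python illustration, a rule is recorded whenever stripping a prefix or suffix from one lexicon word leaves another lexicon word (for example, "booked" minus "-ed" gives "book"); the rule maps the affix plus the base word's tag set to the derived word's tag set. An unknown word is then tagged if removing an affix yields a known base for which a rule exists. The toy lexicon, the thresholds and the function names are assumptions, and the paper's actual statistical estimation is not reproduced.

```python
from collections import Counter, defaultdict

# Hypothetical toy lexicon: word -> set of tags the word can take. Illustrative data only.
LEXICON = {
    "book": frozenset({"NN", "VB"}),
    "booked": frozenset({"JJ", "VBD", "VBN"}),
    "cook": frozenset({"NN", "VB"}),
    "cooked": frozenset({"JJ", "VBD", "VBN"}),
    "look": frozenset({"NN", "VB"}),
    "looked": frozenset({"JJ", "VBD", "VBN"}),
    "kick": frozenset({"NN", "VB"}),
    "happy": frozenset({"JJ"}),
    "unhappy": frozenset({"JJ"}),
    "able": frozenset({"JJ"}),
    "unable": frozenset({"JJ"}),
    "kind": frozenset({"JJ", "NN"}),
    "unkind": frozenset({"JJ"}),
    "usual": frozenset({"JJ"}),
}


def split(word, n, side):
    """Split off an affix of length n from the end ("suffix") or start ("prefix")."""
    if side == "suffix":
        return word[-n:], word[:-n]
    return word[:n], word[n:]


def induce_morph_rules(lexicon, side="suffix", max_affix=3, min_base=2,
                       min_count=2, min_precision=0.6):
    """Learn rules (affix, tag set of base) -> tag set of derived word.

    Evidence is counted whenever stripping the affix from a lexicon word
    leaves another lexicon word. Thresholds are hypothetical.
    """
    stats = defaultdict(Counter)
    for word, derived_tags in lexicon.items():
        for n in range(1, max_affix + 1):
            if len(word) - n < min_base:
                continue
            affix, base = split(word, n, side)
            if base in lexicon:
                stats[(affix, lexicon[base])][derived_tags] += 1

    rules = {}
    for key, counts in stats.items():
        total = sum(counts.values())
        best_tags, best_count = counts.most_common(1)[0]
        if total >= min_count and best_count / total >= min_precision:
            rules[key] = best_tags
    return rules


def guess_morph(word, lexicon, rules, side="suffix", max_affix=3, min_base=2):
    """Guess an unknown word's tag set from a known base plus a learned affix rule."""
    for n in range(max_affix, 0, -1):  # prefer longer affixes
        if len(word) - n < min_base:
            continue
        affix, base = split(word, n, side)
        if base in lexicon and (affix, lexicon[base]) in rules:
            return rules[(affix, lexicon[base])]
    return None


suffix_rules = induce_morph_rules(LEXICON, side="suffix")
prefix_rules = induce_morph_rules(LEXICON, side="prefix")
print(sorted(guess_morph("kicked", LEXICON, suffix_rules)))             # -> ['JJ', 'VBD', 'VBN']
print(sorted(guess_morph("unusual", LEXICON, prefix_rules, "prefix")))  # -> ['JJ']
```

Because each rule is keyed on the base word's tag set as well as the affix, "un-" attached to an adjective and "un-" attached to a word that is both adjective and noun count as different rules; with this toy data only the former is frequent enough to be kept.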

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Algorithms and Data Compression