Using Multiple Sources of Information for Constraint-Based Morphological Disambiguation

Gokhan Tur

arXiv:cmp-lg/9607030 · cmp-lg · February 3, 2008 · 3 cites

TL;DR

This thesis introduces a constraint-based morphological disambiguation system for morphologically complex languages like Turkish, combining hand-crafted rules, learned constraints, and corpus statistics to achieve high recall and precision with low residual ambiguity.

Contribution

It presents a novel multi-source approach that integrates rule-based, learned, and statistical information for morphological disambiguation in agglutinative languages.

Findings

01 Achieved 96-97% recall and 93-94% precision in disambiguation

02 Reduced unknown words to below 1% with secondary processing

03 Attained low ambiguity of about 1.02 to 1.03 parses per token
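
The three figures above are related, and a small sketch may help show how. The definitions used here are an assumption, since this page does not spell them out: recall is taken as the share of tokens whose correct parse survives disambiguation, precision as the share of surviving parses that are correct, and ambiguity as the average number of surviving parses per token. The function name `evaluate` and the toy labels are hypothetical.

```python
from typing import List, Set, Tuple


def evaluate(remaining: List[Set[str]], gold: List[str]) -> Tuple[float, float, float]:
    """Return (recall, precision, ambiguity) under the assumed definitions.

    remaining: for each token, the set of parses left after disambiguation.
    gold:      for each token, the single correct parse.
    """
    if len(remaining) != len(gold):
        raise ValueError("remaining and gold must have the same length")
    if not gold:
        raise ValueError("at least one token is required")

    tokens = len(gold)
    total_parses = sum(len(parses) for parses in remaining)
    # A token counts as correct when its gold parse is still among the survivors.
    correct = sum(1 for parses, g in zip(remaining, gold) if g in parses)

    recall = correct / tokens
    precision = correct / total_parses
    ambiguity = total_parses / tokens
    return recall, precision, ambiguity


# Toy data with made-up parse labels, not drawn from the thesis.
remaining = [{"a"}, {"b", "c"}, {"d"}, {"e"}]
gold = ["a", "b", "d", "x"]
recall, precision, ambiguity = evaluate(remaining, gold)
```

Under these assumed definitions, precision equals recall divided by ambiguity, so driving ambiguity toward one parse per token is what brings precision close to recall.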

Abstract

This thesis presents a constraint-based morphological disambiguation approach that is applicable to languages with complex morphology, specifically agglutinative languages with productive inflectional and derivational morphological phenomena. For morphologically complex languages like Turkish, automatic morphological disambiguation involves selecting, for each token, the morphological parse(s) with the right set of inflectional and derivational markers. Our system combines corpus-independent hand-crafted constraint rules, constraint rules that are learned via unsupervised learning from a training corpus, and additional statistical information obtained from the corpus to be morphologically disambiguated. The hand-crafted rules are linguistically motivated and tuned to improve precision without sacrificing recall. In certain respects, our approach has been motivated by Brill's recent work, but…
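
As a rough illustration of what constraint-based disambiguation can look like, the sketch below applies context rules that delete candidate parses from a token while never removing a token's last remaining parse, which is one simple way to favor recall. It is a generic sketch and not the thesis's implementation: the tag names (`DET`, `NOUN`, `VERB`), the rule `drop_verb_after_determiner`, and the function `apply_rules` are all hypothetical. Rules of this shape could in principle come from either a hand-written or a learned rule set.

```python
from typing import Callable, List, Set

# One set of candidate parse labels per token in a sentence.
Parses = List[Set[str]]

# A rule looks at position i and returns the parse labels it wants deleted there.
Rule = Callable[[Parses, int], Set[str]]


def drop_verb_after_determiner(sentence: Parses, i: int) -> Set[str]:
    """Hypothetical rule: discard a verb reading right after an unambiguous determiner."""
    if i > 0 and sentence[i - 1] == {"DET"}:
        return {"VERB"}
    return set()


def apply_rules(sentence: Parses, rules: List[Rule]) -> Parses:
    """Apply deletion rules repeatedly until no rule changes anything."""
    result = [set(parses) for parses in sentence]  # work on a copy
    changed = True
    while changed:
        changed = False
        for rule in rules:
            for i in range(len(result)):
                doomed = rule(result, i) & result[i]
                # Skip the deletion if it would leave the token with no parse.
                if doomed and doomed != result[i]:
                    result[i] -= doomed
                    changed = True
    return result


sentence = [{"DET"}, {"NOUN", "VERB"}, {"VERB"}]
disambiguated = apply_rules(sentence, [drop_verb_after_determiner])
```

The loop terminates because every change strictly shrinks some token's parse set, and no set is ever emptied.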

Taxonomy

Topics: Natural Language Processing Techniques · Second Language Acquisition and Learning