Tagging and Morphological Disambiguation of Turkish Text
Kemal Oflazer (Bilkent University, Ankara, Turkey), Ilker Kuruoz (Bilkent University, Ankara, Turkey)

TL;DR
This paper presents a Turkish POS tagger built on a full-scale morphological model. It achieves 98-99% tagging accuracy, and tagging text before parsing roughly halves parsing ambiguity and speeds parsing up by about 2.5 times. The approach may also apply to other languages.
Contribution
It introduces a novel Turkish POS tagger based on a detailed morphological specification and disambiguation approach, improving tagging accuracy and parsing efficiency.
Findings
Achieves 98-99% tagging accuracy
Reduces parsing ambiguity by 50%
Speeds up parsing by 2.5 times
Abstract
Automatic text tagging is an important component in higher-level analysis of text corpora, and its output can be used in many natural language processing applications. In languages with agglutinative morphology, such as Turkish or Finnish, morphological disambiguation is a crucial step in tagging, because the structures of many lexical forms are morphologically ambiguous. This paper describes a POS tagger for Turkish text based on a full-scale two-level specification of Turkish morphology with a lexicon of about 24,000 root words. This is augmented with a multi-word and idiomatic construct recognizer and, most importantly, a morphological disambiguator based on local neighborhood constraints, heuristics, and a limited amount of statistical information. The tagger also has functionality for statistics compilation and fine tuning of the morphological analyzer, such as logging…
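To make the idea of disambiguation by local neighborhood constraints concrete, here is a minimal Python sketch. It is not the paper's implementation: the tag names (`Noun+Gen`, `Noun+P2sg`, `Noun+P3sg`), the single rule, and the function names are all hypothetical and chosen only for illustration. The toy input models the phrase "evin kapısı" ("the door of the house"), where "evin" can be read as genitive ("of the house") or as second-person possessive ("your house").

```python
from typing import Callable, List, Set

# One set of candidate morphological tags per token (toy tag names, assumed).
Candidates = List[Set[str]]
# A rule returns True to keep `tag` at position i, given all current candidates.
Rule = Callable[[int, str, Candidates], bool]


def prefer_genitive_before_possessed(i: int, tag: str, cands: Candidates) -> bool:
    """Hypothetical local constraint: if the next token is unambiguously a
    third-person possessed noun, reject the second-person possessive reading
    of the current token (leaving the genitive reading in place)."""
    if i + 1 < len(cands) and cands[i + 1] == {"Noun+P3sg"}:
        return tag != "Noun+P2sg"
    return True


def disambiguate(cands: Candidates, rules: List[Rule]) -> Candidates:
    """Apply the rules repeatedly until no candidate is removed.
    The last remaining candidate of a token is never removed."""
    cands = [set(c) for c in cands]  # work on a copy
    changed = True
    while changed:
        changed = False
        for i in range(len(cands)):
            for tag in sorted(cands[i]):  # sorted() copies, so discarding is safe
                if len(cands[i]) == 1:
                    break
                if not all(rule(i, tag, cands) for rule in rules):
                    cands[i].discard(tag)
                    changed = True
    return cands


# Toy input for "evin kapısı": the first token is ambiguous, the second is not.
sentence = [{"Noun+Gen", "Noun+P2sg"}, {"Noun+P3sg"}]
result = disambiguate(sentence, [prefer_genitive_before_possessed])
print(result)
```

In this sketch each rule only inspects the immediate neighbor, and the loop stops once a full pass removes nothing. A fuller system along the lines the abstract describes would presumably combine many such constraints with heuristics and statistics, but those details are not given here.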
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text and Document Classification Technologies
