Creating a morphological and syntactic tagged corpus for the Uzbek language

Maksud Sharipov; Jamolbek Mattiev; Jasur Sobirov; Rustam Baltayev

arXiv:2210.15234·cs.CL·October 28, 2022·6 cites

TL;DR

This paper presents the development of a new POS and syntactic tagset for Uzbek, along with a web-based annotation tool, to create a tagged corpus for NLP applications in this low-resource language.

Contribution

It introduces a novel tagset and a web-based annotation platform specifically designed for Uzbek language processing.

Findings

01

First stage of the Uzbek tagged corpus completed

02

Web-based annotation tool successfully implemented

03

Enhanced resources for Uzbek NLP development

Abstract

Creating tagged corpora is becoming one of the most important tasks in Natural Language Processing (NLP). For Uzbek, a low-resource language, there are not enough tagged corpora to build machine learning models. In this paper, we try to fill that gap by developing a novel Part-of-Speech (POS) and syntactic tagset for creating a morphologically and syntactically tagged corpus of the Uzbek language. This work also includes a detailed description and presentation of a web-based application for performing the tagging. Using the developed annotation tool and software, we share our experience and results from the first stage of the tagged corpus creation.

Taxonomy

Topics: Natural Language Processing Techniques