Part of Speech Tagging (POST) of a Low-resource Language using another Language (Developing a POS-Tagged Lexicon for Kurdish (Sorani) using a Tagged Persian (Farsi) Corpus)
Hossein Hassani

TL;DR
This paper proposes using a Persian (Farsi) tagged corpus to develop a POS-tagged lexicon for Kurdish (Sorani), addressing the lack of annotated resources for Kurdish by leveraging a closely related language.
Contribution
It introduces a novel approach of utilizing a resource from a related language to enrich Kurdish POS-tagging resources, facilitating automated tagging and lexicon development.
Findings
Partial dataset of POS-tagged Kurdish lexicon available for non-commercial use
Approach shows potential for developing Kurdish POS-tagged corpora
Resource can aid in automated Kurdish corpus annotation
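To make the last finding concrete, here is a minimal sketch of how a POS-tagged lexicon could drive automated annotation of a raw corpus. It is not the paper's implementation: the function name `tag_with_lexicon`, the `UNK` fallback tag, and the placeholder tokens and tag names are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def tag_with_lexicon(
    tokens: List[str],
    lexicon: Dict[str, str],
    unknown_tag: str = "UNK",
) -> List[Tuple[str, str]]:
    """Assign each token the tag stored in the lexicon, or a fallback tag."""
    return [(token, lexicon.get(token, unknown_tag)) for token in tokens]


# Hypothetical toy lexicon: placeholder tokens and tag names, not real data.
toy_lexicon = {"word_a": "N", "word_b": "V"}

tagged = tag_with_lexicon(["word_a", "word_x", "word_b"], toy_lexicon)
```

A pure lookup like this ignores context, so ambiguous words would need a further disambiguation step that the sketch does not attempt.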
Abstract
Tagged corpora play a crucial role in a wide range of Natural Language Processing (NLP) tasks. Part of Speech Tagging (POST) is essential in developing tagged corpora. Manual tagging is time-consuming, labor-intensive, and costly, so automating it could make it more affordable. The Kurdish language currently lacks publicly available tagged corpora of adequate size. Tagging the publicly available Kurdish corpora would make those resources more useful than raw or segmented corpora alone. Developing POS-tagged lexicons can assist in this task. We use a tagged Persian (Farsi) corpus, the Bijankhan corpus, to develop a POS-tagged lexicon for Kurdish, since Persian is closely related to Kurdish. This paper presents the approach of leveraging the resources of a closely related language to enrich those of Kurdish. A partial dataset of the results is publicly available for non-commercial use…
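The abstract does not spell out how tags move from the Persian corpus to the Kurdish lexicon, so the following is only a sketch under stated assumptions. It assumes the tagged corpus is available as (word, tag) pairs and that some Persian-to-Kurdish word mapping exists. The function names, the `word_map` dictionary, and the placeholder tokens are hypothetical and do not come from the paper.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def build_source_lexicon(tagged_tokens: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Map each source-language word to its most frequent tag in the corpus."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for word, tag in tagged_tokens:
        counts[word][tag] += 1
    return {word: tag_counts.most_common(1)[0][0] for word, tag_counts in counts.items()}


def project_lexicon(source_lexicon: Dict[str, str], word_map: Dict[str, str]) -> Dict[str, str]:
    """Carry tags over to target-language words that have a mapping entry."""
    return {
        word_map[word]: tag
        for word, tag in source_lexicon.items()
        if word in word_map
    }


# Placeholder data standing in for tagged Persian tokens (assumption).
persian_tagged = [("fa_1", "N"), ("fa_1", "N"), ("fa_1", "ADJ"), ("fa_2", "V")]
# Hypothetical Persian-to-Kurdish mapping; how to obtain it is left open.
word_map = {"fa_1": "ku_1"}

persian_lexicon = build_source_lexicon(persian_tagged)
kurdish_lexicon = project_lexicon(persian_lexicon, word_map)
```

Keeping only the most frequent tag per word is a simplification; a lexicon could instead store every observed tag with its count.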
Taxonomy
Topics: Natural Language Processing Techniques · Linguistics and Cultural Studies
