A Survey of Resources and Methods for Natural Language Processing of Serbian Language

Ulfeta A. Marovac; Aldina R. Avdić; Nikola Lj. Milošević

arXiv:2304.05468·cs.CL·April 13, 2023·1 cites

TL;DR

This survey reviews the resources and methods developed for Serbian natural language processing over the past three decades, highlighting the challenges posed by the language's rich inflection and low resource availability.

Contribution

It provides a comprehensive overview of existing initiatives, resources, and methods for Serbian NLP, emphasizing the progress and current state of the field.

Findings

01. Multiple corpora and annotated datasets have been developed for Serbian NLP.

02. Various methods and models have been applied to tasks like classification and named entity recognition.

03. Resources remain limited, posing ongoing challenges for Serbian language processing.

Abstract

Serbian is a Slavic language spoken by over 12 million people and well understood by over 15 million. In natural language processing, it can be considered a low-resource language. It is also highly inflected. The combination of rich inflection and scarce language resources makes natural language processing of Serbian challenging. Nevertheless, over the past three decades, a number of initiatives have developed resources and methods for natural language processing of Serbian, ranging from corpora of free text drawn from books and the internet, through annotated corpora for classification and named entity recognition, to various methods and models that perform these tasks. In this paper, we review these initiatives, resources, and methods, along with their availability.

Taxonomy

Topics: Natural Language Processing Techniques