A Survey of Resources and Methods for Natural Language Processing of Serbian Language
Ulfeta A. Marovac, Aldina R. Avdić, Nikola Lj. Milošević

TL;DR
This survey reviews three decades of work on resources and methods for Serbian natural language processing, highlighting the challenges posed by the language's rich inflection and limited resources.
Contribution
It provides a comprehensive overview of existing initiatives, resources, and methods for Serbian NLP, emphasizing the progress and current state of the field.
Findings
Multiple corpora and annotated datasets have been developed for Serbian NLP.
Various methods and models have been applied to tasks like classification and named entity recognition.
Resources remain limited, posing ongoing challenges for Serbian language processing.
Abstract
Serbian is a Slavic language spoken by over 12 million people and well understood by over 15 million. In natural language processing, it is considered a low-resource language. It is also highly inflected. The combination of many inflected word forms and scarce language resources makes natural language processing of Serbian challenging. Nevertheless, over the past three decades, a number of initiatives have developed resources and methods for Serbian natural language processing, ranging from corpora of free text drawn from books and the internet, through annotated corpora for classification and named entity recognition, to various methods and models that perform these tasks. In this paper, we review these initiatives, resources, and methods, and their availability.
Taxonomy
Topics: Natural Language Processing Techniques
