Survey on Publicly Available Sinhala Natural Language Processing Tools and Research
Nisansa de Silva

TL;DR
This paper provides a comprehensive survey of publicly available Sinhala natural language processing tools and research, highlighting the current state, challenges, and the need for better coordination among researchers.
Contribution
It offers the first extensive literature review of Sinhala NLP tools and research, aiming to facilitate better resource sharing and collaboration among researchers.
Findings
Identifies key Sinhala NLP tools and research efforts
Highlights the resource scarcity and research gaps
Proposes keeping the survey updated as new developments appear
Abstract
Sinhala is the native language of the Sinhalese people who make up the largest ethnic group of Sri Lanka. The language belongs to the globe-spanning language tree, Indo-European. However, due to poverty in both linguistic and economic capital, Sinhala, in the perspective of Natural Language Processing tools and research, remains a resource-poor language which has neither the economic drive its cousin English has nor the sheer push of the law of numbers a language such as Chinese has. A number of research groups from Sri Lanka have noticed this dearth and the resultant dire need for proper tools and research for Sinhala natural language processing. However, due to various reasons, these attempts seem to lack coordination and awareness of each other. The objective of this paper is to fill that gap of a comprehensive literature survey of the publicly available Sinhala natural language…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications
