Survey on Publicly Available Sinhala Natural Language Processing Tools and Research

Nisansa de Silva

arXiv:1906.02358 · cs.CL · January 13, 2026 · 23 cites



TL;DR

This paper provides a comprehensive survey of publicly available Sinhala natural language processing tools and research, highlighting the current state, challenges, and the need for better coordination among researchers.

Contribution

It offers the first extensive literature review of Sinhala NLP tools and research, aiming to facilitate better resource sharing and collaboration among researchers.

Findings

1. Identifies key Sinhala NLP tools and research efforts
2. Highlights the resource scarcity and research gaps
3. Proposes ongoing updates to the survey for future developments

Abstract

Sinhala is the native language of the Sinhalese people who make up the largest ethnic group of Sri Lanka. The language belongs to the globe-spanning language tree, Indo-European. However, due to poverty in both linguistic and economic capital, Sinhala, in the perspective of Natural Language Processing tools and research, remains a resource-poor language which has neither the economic drive its cousin English has nor the sheer push of the law of numbers a language such as Chinese has. A number of research groups from Sri Lanka have noticed this dearth and the resultant dire need for proper tools and research for Sinhala natural language processing. However, due to various reasons, these attempts seem to lack coordination and awareness of each other. The objective of this paper is to fill that gap of a comprehensive literature survey of the publicly available Sinhala natural language…


Code & Models

Repositories

lknlp/lknlp.github.io (Official)


Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications