Language Resources and Technologies for Non-Scheduled and Endangered Indian Languages
Ritesh Kumar, Bornini Lahiri

TL;DR
This paper surveys the current state of language resources and technologies for India's non-scheduled and endangered languages, highlighting the scarcity of resources despite some recent efforts and support.
Contribution
It provides a comprehensive overview of available resources and technological developments for India's lesser-known and endangered languages, emphasizing gaps and recent progress.
Findings
Limited resources for non-scheduled languages
Individual efforts have produced some resources and technologies for languages across the country
Some financial support has recently become available for endangered languages
Abstract
In this paper, we survey the language resources and technologies available for the non-scheduled and endangered languages of India. Estimates of the number of languages in India vary across sources, but it can be assumed that more than 1,000 languages are currently spoken in the country. However, barring some of the 22 languages included in the 8th Schedule of the Indian Constitution (called the scheduled languages), there is hardly any substantial resource or technology available for the rest of the languages. Nonetheless, there have been some individual attempts at developing resources and technologies for different languages across the country. Of late, some financial support has also become available for the endangered languages. In this paper, we give a summary of the resources and technologies for those Indian…
Taxonomy
Topics: Natural Language Processing Techniques · Multilingual Education and Policy · South Asian Studies and Conflicts
