Swa-bhasha Resource Hub: Romanized Sinhala to Sinhala Transliteration Systems and Data Resources
Deshan Sumanathilaka, Sameera Perera, Sachithya Dharmasiri, Maneesha Athukorala, Anuja Dilrukshi Herath, Rukshan Dias, Pasindu Gamage, Ruvan Weerasinghe, Y.H.P.P. Priyadarshana

TL;DR
The paper introduces the Swa-bhasha Resource Hub, a comprehensive collection of data and tools for Romanized Sinhala to Sinhala transliteration, supporting NLP research and applications.
Contribution
It provides publicly accessible datasets and algorithms for Sinhala transliteration, along with a comparative analysis of existing transliteration applications.
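To make the transliteration task concrete, here is a minimal rule-based sketch that uses greedy longest-match conversion from Romanized Sinhala to Sinhala script. It is not one of the paper's algorithms. The romanization table (`CONSONANTS`, `VOWELS`) is a small hypothetical fragment chosen for illustration, and the function names are likewise assumptions. The sketch shows the core mechanics: a consonant takes a dependent vowel sign, the inherent "a" adds no sign, a consonant with no following vowel gets the al-lakuna (virama), and a vowel with no preceding consonant is written as an independent letter.

```python
"""Greedy longest-match Romanized Sinhala -> Sinhala (illustrative sketch only)."""

HAL = "\u0dca"  # al-lakuna (virama): marks a consonant with no following vowel

# Hypothetical romanization fragment; real schemes are larger and ambiguous.
CONSONANTS = {
    "k": "\u0d9a", "g": "\u0d9c", "ch": "\u0da0", "j": "\u0da2",
    "t": "\u0da7", "th": "\u0dad", "d": "\u0daf", "n": "\u0db1",
    "p": "\u0db4", "b": "\u0db6", "m": "\u0db8", "y": "\u0dba",
    "r": "\u0dbb", "l": "\u0dbd", "w": "\u0dc0", "v": "\u0dc0",
    "sh": "\u0dc1", "s": "\u0dc3", "h": "\u0dc4",
}

# romanized vowel -> (independent letter, dependent vowel sign)
VOWELS = {
    "a": ("\u0d85", ""),  # inherent vowel: no sign after a consonant
    "aa": ("\u0d86", "\u0dcf"), "ae": ("\u0d87", "\u0dd0"),
    "i": ("\u0d89", "\u0dd2"), "ii": ("\u0d8a", "\u0dd3"),
    "u": ("\u0d8b", "\u0dd4"), "uu": ("\u0d8c", "\u0dd6"),
    "e": ("\u0d91", "\u0dd9"), "ee": ("\u0d92", "\u0dda"),
    "o": ("\u0d94", "\u0ddc"), "oo": ("\u0d95", "\u0ddd"),
}

MAX_KEY = max(len(k) for k in (*CONSONANTS, *VOWELS))


def longest_match(text, i, table):
    """Return the longest key of `table` starting at text[i], or None."""
    for size in range(MAX_KEY, 0, -1):
        chunk = text[i:i + size]
        if len(chunk) == size and chunk in table:
            return chunk
    return None


def transliterate(text):
    text = text.lower()
    out = []
    i = 0
    while i < len(text):
        cons = longest_match(text, i, CONSONANTS)
        if cons:
            out.append(CONSONANTS[cons])
            i += len(cons)
            vowel = longest_match(text, i, VOWELS)
            if vowel:
                out.append(VOWELS[vowel][1])  # vowel sign (empty for "a")
                i += len(vowel)
            else:
                out.append(HAL)  # bare consonant
            continue
        vowel = longest_match(text, i, VOWELS)
        if vowel:
            out.append(VOWELS[vowel][0])  # standalone (e.g. word-initial) vowel
            i += len(vowel)
        else:
            out.append(text[i])  # spaces, digits, punctuation pass through
            i += 1
    return "".join(out)


if __name__ == "__main__":
    print(transliterate("kohomada"))  # කොහොමද
```

Because each Romanized key maps to exactly one Sinhala letter, this baseline cannot resolve ambiguous spellings (for example, whether "d" should be ද or ඩ). That limitation motivates learning the mapping from parallel data instead.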
Findings
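The resources have supported Sinhala NLP research, particularly the training of transliteration models and the development of Romanized Sinhala applications.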
Open datasets and tools are publicly available.
Comparative analysis of transliteration applications included.
Abstract
The Swa-bhasha Resource Hub provides a comprehensive collection of data resources and algorithms developed for Romanized Sinhala to Sinhala transliteration between 2020 and 2025. These resources have played a significant role in advancing research in Sinhala Natural Language Processing (NLP), particularly in training transliteration models and developing applications involving Romanized Sinhala. The currently available open datasets and their corresponding tools are made publicly accessible through this hub. This paper presents a detailed overview of the resources contributed by the authors and includes a comparative analysis of existing transliteration applications in the domain.
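One common way to compare transliteration systems is to score their outputs against reference Sinhala text with character error rate (CER): the total Levenshtein edit distance divided by the total number of reference characters. The abstract does not state which metric the comparative analysis uses, so treat this as an illustrative assumption; the function names `levenshtein` and `cer` are hypothetical.

```python
def levenshtein(a, b):
    """Edit distance between two strings, computed per Unicode code point."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(hypotheses, references):
    """Corpus-level CER: total edits divided by total reference length."""
    if len(hypotheses) != len(references):
        raise ValueError("hypotheses and references must be aligned")
    edits = sum(levenshtein(h, r) for h, r in zip(hypotheses, references))
    total = sum(len(r) for r in references)
    return edits / total if total else 0.0
```

Because Sinhala vowel signs and the al-lakuna are separate code points, a wrong vowel sign counts as one edit here; scoring over grapheme clusters instead would give different numbers, so the unit should be fixed before comparing systems.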
Code & Models
- 🤗 deshanksuman/mbart_50_SinhalaTransliteration (model, 4 downloads)
- 🤗 deshanksuman/romanized-sinhala-tokenizer (model)
- 🤗 deshanksuman/swabhashambart50SinhalaTransliteration (model, 7 downloads, 1 like)
- 🤗 savinugunarathna/ByT5-Small-fine-tuned (model, 4 downloads)
- 🤗 savinugunarathna/singlish-to-sinhala-mt5-small (model, 6 downloads)
- 🤗 savinugunarathna/ByT5-Small-fine-tuned2 (model, 11 downloads)
- 🤗 savinugunarathna/Gemma3-Singlish-Sinhala-Merged (model, 17 downloads)
- 🤗 savinugunarathna/Gemma3-Singlish-Sinhala-CodeMix (model, 35 downloads)
