Sinhala Transliteration: A Comparative Analysis Between Rule-based and Seq2Seq Approaches

Yomal De Mel; Kasun Wickramasinghe; Nisansa de Silva; Surangika Ranathunga

arXiv:2501.00529·cs.CL·March 5, 2025



TL;DR

This paper compares a rule-based baseline with a Transformer-based sequence-to-sequence model for Romanized Sinhala transliteration, and reports that the Transformer captures more of the ad-hoc patterns found in Romanized text than the rule-based method does.

Contribution

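The paper proposes two methods for Romanized Sinhala transliteration: a rule-based baseline, and a Transformer-based Encoder-Decoder model that treats transliteration as a sequence-to-sequence task akin to Neural Machine Translation (NMT). The authors report that the Transformer-based model captures many ad-hoc Romanization patterns that the rule-based method does not.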
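To make the sequence-to-sequence framing concrete, here is a minimal sketch of how Romanized/Sinhala word pairs could be turned into integer sequences for an encoder-decoder model. Everything in it is an assumption made for illustration: character-level tokens, the special tokens `<pad>`, `<bos>`, `<eos>`, and the helper names `build_vocab`, `encode`, and `make_example` are hypothetical and are not taken from the paper or its repository.

```python
# Hypothetical data-preparation sketch for a character-level seq2seq setup.
# Assumption: one token per character; the paper may tokenize differently.
PAD, BOS, EOS = "<pad>", "<bos>", "<eos>"


def build_vocab(strings):
    """Map the special tokens and every character seen in `strings` to an id."""
    chars = sorted({ch for s in strings for ch in s})
    return {tok: i for i, tok in enumerate([PAD, BOS, EOS] + chars)}


def encode(text, vocab, max_len):
    """Wrap `text` in BOS/EOS, convert to ids, and right-pad to `max_len`."""
    ids = [vocab[BOS]] + [vocab[ch] for ch in text] + [vocab[EOS]]
    if len(ids) > max_len:
        raise ValueError("sequence longer than max_len")
    return ids + [vocab[PAD]] * (max_len - len(ids))


def make_example(src, tgt, src_vocab, tgt_vocab, max_len):
    """Build (encoder input, decoder input, labels) for teacher forcing.

    The decoder input is the target shifted right by one position, so that
    at each step the model is trained to predict the next target character.
    """
    src_ids = encode(src, src_vocab, max_len)
    tgt_ids = encode(tgt, tgt_vocab, max_len)
    return src_ids, tgt_ids[:-1], tgt_ids[1:]


# Toy pair: Romanized "mama" and its Sinhala spelling (two U+0DB8 letters).
pairs = [("mama", "\u0DB8\u0DB8")]
src_vocab = build_vocab(src for src, _ in pairs)
tgt_vocab = build_vocab(tgt for _, tgt in pairs)
example = make_example(*pairs[0], src_vocab, tgt_vocab, max_len=8)
```

The three lists in `example` are what a Transformer encoder-decoder would typically consume during training: the source ids go to the encoder, the shifted target ids go to the decoder, and the labels are compared against the decoder's predictions.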

Findings

01 Transformer method captures more complex patterns

02 Transformer outperforms rule-based approach

03 Code available on GitHub

Abstract

Due to reasons of convenience and lack of tech literacy, transliteration (i.e., Romanizing native scripts instead of using localization tools) is eminently prevalent in the context of low-resource languages such as Sinhala, which have their own writing script. In this study, our focus is on Romanized Sinhala transliteration. We propose two methods to address this problem: Our baseline is a rule-based method, which is then compared against our second method, where we approach the transliteration problem as a sequence-to-sequence task akin to the established Neural Machine Translation (NMT) task. For the latter, we propose a Transformer-based Encoder-Decoder solution. We observed that the Transformer-based method could capture many ad-hoc patterns within the Romanized scripts that the rule-based method could not. The code base associated with this paper is available on GitHub -…
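As a rough illustration of what a rule-based transliterator of this kind can look like, the sketch below applies greedy longest-match rules from Roman letters to Sinhala characters. The rule tables are a tiny hypothetical subset chosen for the example and are not the rule set used in the paper; the function names `_longest_match` and `transliterate` are likewise assumptions.

```python
# Toy rule tables (illustrative only; NOT the paper's rules).
# Roman consonant -> Sinhala consonant letter (carries an inherent "a").
CONSONANTS = {
    "th": "\u0DAD",  # ත
    "t": "\u0DA7",   # ට
    "k": "\u0D9A",   # ක
    "m": "\u0DB8",   # ම
    "n": "\u0DB1",   # න
    "l": "\u0DBD",   # ල
    "s": "\u0DC3",   # ස
}
# Roman vowel -> (independent vowel letter, dependent vowel sign).
VOWELS = {
    "aa": ("\u0D86", "\u0DCF"),  # ආ / ා
    "a": ("\u0D85", ""),         # අ / inherent vowel, no sign needed
    "i": ("\u0D89", "\u0DD2"),   # ඉ / ි
    "u": ("\u0D8B", "\u0DD4"),   # උ / ු
}
HAL = "\u0DCA"  # hal kirima: marks a consonant with no following vowel


def _longest_match(text, pos, table):
    """Return the longest key of `table` that starts at `pos`, or None."""
    for key in sorted(table, key=len, reverse=True):
        if text.startswith(key, pos):
            return key
    return None


def transliterate(roman):
    """Convert Romanized text to Sinhala using the toy rules above."""
    roman = roman.lower()
    out, i = [], 0
    while i < len(roman):
        cons = _longest_match(roman, i, CONSONANTS)
        if cons:
            out.append(CONSONANTS[cons])
            i += len(cons)
            vowel = _longest_match(roman, i, VOWELS)
            if vowel:
                out.append(VOWELS[vowel][1])  # attach dependent vowel sign
                i += len(vowel)
            else:
                out.append(HAL)               # bare consonant
            continue
        vowel = _longest_match(roman, i, VOWELS)
        if vowel:
            out.append(VOWELS[vowel][0])      # standalone vowel letter
            i += len(vowel)
            continue
        out.append(roman[i])                  # no rule: copy unchanged
        i += 1
    return "".join(out)


print(transliterate("thaaththaa"))
```

Fixed rules like these work only when writers follow one consistent Romanization. Input that departs from the table (for example, dropping a doubled vowel) is mapped literally, which is the kind of ad-hoc variation the abstract says the Transformer-based method handles better.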


Code & Models

Repositories

kasunw22/sinhala-transliterator (Official)


Taxonomy

Topics: Natural Language Processing Techniques · Handwritten Text Recognition Techniques

Methods: Focus · Balanced Selection