Romanized to Native Malayalam Script Transliteration Using an Encoder-Decoder Framework

Bajiyo Baiju; Kavya Manohar; Leena G Pillai; Elizabeth Sherly

arXiv:2412.09957·cs.CL·December 16, 2024

TL;DR

This paper introduces an attention-based Bi-LSTM encoder-decoder model that converts romanized Malayalam to native script. Trained on 4.3 million transliteration pairs, it reaches a character error rate (CER) of 7.4% on general typing patterns and 22.7% on ad hoc typing patterns.

Contribution

The work presents a novel reverse transliteration model for Malayalam using a large dataset and an attention-based Bi-LSTM framework, addressing challenges in diverse typing patterns.

Findings

01

Achieved 7.4% CER on general typing patterns

02

Achieved 22.7% CER on ad hoc typing patterns, where most vowel indicators are missing

03

Demonstrated effectiveness of the model on shared-task datasets

Abstract

In this work, we present the development of a reverse transliteration model that converts romanized Malayalam to native script, using an encoder-decoder framework built on an attention-based bidirectional Long Short-Term Memory (Bi-LSTM) architecture. To train the model, we used a curated, combined collection of 4.3 million transliteration pairs derived from two publicly available Indic language transliteration datasets, Dakshina and Aksharantar. We evaluated the model on two test datasets provided by the IndoNLP-2025-Shared-Task, containing (1) general typing patterns and (2) ad hoc typing patterns, respectively. On Test Set-1, we obtained a character error rate (CER) of 7.4%. On Test Set-2, with ad hoc typing patterns where most vowel indicators are missing, our model gave a CER of 22.7%.
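
The CER figures above are conventionally computed as the Levenshtein edit distance between the model output and the reference, divided by the number of reference characters. The following is a minimal sketch of that metric, not the authors' evaluation script. It assumes that a "character" is a Unicode code point and that errors are aggregated over the whole test set; the abstract does not state either choice. The function names `edit_distance` and `cer` are illustrative.

```python
from typing import Sequence


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in code points."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty string
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete r from the reference
                curr[j - 1] + 1,     # insert h into the reference
                prev[j - 1] + cost,  # substitute r with h (or match)
            ))
        prev = curr
    return prev[-1]


def cer(references: Sequence[str], hypotheses: Sequence[str]) -> float:
    """Corpus-level CER: total edits divided by total reference characters."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    total_edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    return total_edits / total_chars


if __name__ == "__main__":
    # Hypothetical example: the hypothesis drops one vowel sign from the reference.
    reference = ["മലയാളം"]
    hypothesis = ["മലയളം"]
    print(f"CER = {cer(reference, hypothesis):.3f}")
```

Because Malayalam vowel signs are separate code points, a dropped vowel indicator counts as one deletion under this definition. A grapheme-cluster-based definition would give different numbers, so the unit should match whatever the shared task specifies.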

Code & Models

Repositories

vrclc-duk/ml-en-transliteration
none · Official

Models

Hugging Face: vrclc/transliteration
model · 36 dl
