Reconstructing Syllable Sequences in Abugida Scripts with Incomplete Inputs
Ye Kyaw Thu, Thazin Myint Oo

TL;DR
This study uses Transformer models to reconstruct complete syllable sequences in Abugida scripts from incomplete inputs. It finds that consonant information is critical for accurate prediction and reports robust performance on partial and masked inputs across six languages.
Contribution
The paper introduces a Transformer-based approach for reconstructing syllable sequences in Abugida languages from several kinds of incomplete input: consonant sequences, vowel sequences, partial syllables, and masked syllables.
Findings
Consonant sequences are crucial for accurate syllable prediction.
The model performs well with partial and masked syllable inputs.
Vowel sequences are more challenging to predict accurately.
Abstract
This paper explores syllable sequence prediction in Abugida languages using Transformer-based models, focusing on six languages: Bengali, Hindi, Khmer, Lao, Myanmar, and Thai, from the Asian Language Treebank (ALT) dataset. We investigate the reconstruction of complete syllable sequences from various incomplete input types, including consonant sequences, vowel sequences, partial syllables (with random character deletions), and masked syllables (with fixed syllable deletions). Our experiments reveal that consonant sequences play a critical role in accurate syllable prediction, achieving high BLEU scores, while vowel sequences present a significantly greater challenge. The model demonstrates robust performance across tasks, particularly in handling partial and masked syllable reconstruction, with strong results for tasks involving consonant information and syllable masking. This study…
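The four incomplete input types named in the abstract can be illustrated with a minimal sketch. This is not the authors' preprocessing code: the function names, the `drop_prob` and `every` parameters, and the `<mask>` token are hypothetical, and the consonant/vowel split relies on Unicode general categories as a rough proxy (combining marks treated as dependent signs). That proxy is reasonable for Devanagari but not for every script; Thai, for example, encodes some vowels as letters and would need explicit character tables.

```python
import random
import unicodedata


def is_dependent_sign(ch):
    # Assumption: combining marks (categories Mn, Mc, Me) approximate
    # dependent vowel signs and other diacritics.
    return unicodedata.category(ch).startswith("M")


def consonant_input(syllables):
    # Keep only base letters; drop dependent signs.
    return ["".join(c for c in s if not is_dependent_sign(c)) for s in syllables]


def vowel_input(syllables):
    # Keep only dependent signs; drop base letters.
    return ["".join(c for c in s if is_dependent_sign(c)) for s in syllables]


def partial_input(syllables, drop_prob=0.3, seed=0):
    # Delete each character independently with probability drop_prob.
    rng = random.Random(seed)
    return ["".join(c for c in s if rng.random() >= drop_prob) for s in syllables]


def masked_input(syllables, every=3, mask="<mask>"):
    # Replace every `every`-th syllable with a mask token (one possible
    # reading of "fixed syllable deletions").
    return [mask if (i + 1) % every == 0 else s for i, s in enumerate(syllables)]


# Hindi "kitaab" (book) split into three syllables: ki, taa, b.
target = ["\u0915\u093f", "\u0924\u093e", "\u092c"]
print(consonant_input(target))
print(vowel_input(target))
print(partial_input(target))
print(masked_input(target))
```

In each case the model's source side would be the reduced sequence and the target side the original `target` list, so the same sequence-to-sequence setup covers all four tasks.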
Taxonomy
Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis · Authorship Attribution and Profiling
