Does Transliteration Help Multilingual Language Modeling?
Ibraheem Muhammad Moosa, Mahmud Elahi Akhter, Ashfia Binte Habib

TL;DR
This paper investigates whether transliteration improves multilingual language models' performance, especially for low-resource languages with diverse scripts, by empirically evaluating on Indic languages and analyzing representation similarity.
Contribution
It provides the first empirical evidence that transliteration benefits low-resource languages in multilingual models without harming high-resource ones.
Findings
Transliteration improves performance for low-resource languages.
Transliteration increases cross-lingual representation similarity.
The effect of transliteration is statistically significant.
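The abstract names the Mann-Whitney U test as the significance check. As a rough illustration of what that test computes, here is a minimal pure-Python sketch using the normal approximation with tie and continuity corrections. The score lists are made-up placeholders, not results from the paper, and the function names are hypothetical; in practice a library routine such as `scipy.stats.mannwhitneyu` would normally be used instead.

```python
import math
from collections import Counter


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(sample_a, sample_b):
    """Two-sided Mann-Whitney U test via the normal approximation.

    Returns (U statistic of sample_a, approximate p-value).
    """
    n1, n2 = len(sample_a), len(sample_b)
    pooled = list(sample_a) + list(sample_b)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    n = n1 + n2
    # Tie correction shrinks the variance when values repeat.
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u1, 1.0  # all values identical: no evidence of a difference
    # Continuity correction of 0.5, clamped so z is never negative.
    z = max(abs(u1 - mean_u) - 0.5, 0.0) / math.sqrt(var_u)
    p_value = math.erfc(z / math.sqrt(2))
    return u1, p_value


# Hypothetical per-run scores for two model variants (illustrative only).
scores_original_script = [61.2, 62.0, 60.5, 61.8, 60.9]
scores_transliterated = [63.1, 64.0, 62.7, 63.5, 62.9]
u_stat, p_value = mann_whitney_u(scores_original_script, scores_transliterated)
```

The normal approximation is only rough for samples this small; an exact test is preferable when the number of runs per condition is low.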
Abstract
Script diversity presents a challenge to Multilingual Language Models (MLLM) by reducing lexical overlap among closely related languages. Therefore, transliterating closely related languages that use different writing scripts to a common script may improve the downstream task performance of MLLMs. We empirically measure the effect of transliteration on MLLMs in this context. We specifically focus on the Indic languages, which have the highest script diversity in the world, and we evaluate our models on the IndicGLUE benchmark. We perform the Mann-Whitney U test to rigorously verify whether the effect of transliteration is significant or not. We find that transliteration benefits the low-resource languages without negatively affecting the comparatively high-resource languages. We also measure the cross-lingual representation similarity of the models using centered kernel alignment on…
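The abstract mentions centered kernel alignment (CKA) as the representation-similarity measure. The sketch below shows the linear-kernel variant in pure Python; the choice of a linear kernel is an assumption, since the truncated abstract does not say which kernel was used. Each input is a list of rows, one row per example (for instance, the same sentences encoded in two languages), and the function names and toy matrices are hypothetical.

```python
import math


def center_columns(matrix):
    """Subtract each column's mean so every feature has zero mean."""
    n = len(matrix)
    means = [sum(col) / n for col in zip(*matrix)]
    return [[v - mu for v, mu in zip(row, means)] for row in matrix]


def cross_sq_norm(x, y):
    """Squared Frobenius norm of y^T x for column-centered x and y."""
    total = 0.0
    for y_col in zip(*y):
        for x_col in zip(*x):
            dot = sum(a * b for a, b in zip(y_col, x_col))
            total += dot * dot
    return total


def linear_cka(x, y):
    """Linear CKA between two representation matrices with matching rows."""
    if len(x) != len(y):
        raise ValueError("x and y must contain the same number of examples")
    x, y = center_columns(x), center_columns(y)
    denom = math.sqrt(cross_sq_norm(x, x) * cross_sq_norm(y, y))
    if denom == 0:
        raise ValueError("a representation matrix has no variance")
    return cross_sq_norm(x, y) / denom


# Toy representations: 4 examples with 2 features each (illustrative only).
reps_a = [[1.0, 2.0], [3.0, 1.0], [0.0, 4.0], [2.0, 2.0]]
# A 90-degree rotation of reps_a; linear CKA is invariant to such rotations.
reps_rotated = [[-b, a] for a, b in reps_a]
similarity = linear_cka(reps_a, reps_rotated)
```

Linear CKA ranges from 0 to 1 and is unchanged by orthogonal transformations and isotropic scaling of either input, which is why it is convenient for comparing hidden representations whose coordinate axes are not aligned.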
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Adam · Residual Connection · WordPiece · LAMB · Dense Connections · Softmax
