Connecting the Persian-speaking World through Transliteration
Rayyan Merchant, Akhilesh Kakolu Ramarao, Kevin Tang

TL;DR
This paper introduces a transformer-based approach for transliterating between the Tajik Cyrillic and Perso-Arabic scripts, addressing a key barrier that keeps Tajik speakers from accessing most Persian content on the Internet.
Contribution
It presents a novel grapheme-to-phoneme (G2P) transliteration model and benchmark scores, highlighting the task's complexity and offering insights into script differences for future research.
Findings
Achieved chrF++ scores of 58.70 (Farsi to Tajik) and 74.20 (Tajik to Farsi).
Demonstrated the non-trivial difficulty of Tajik-Farsi transliteration.
Provided an overview of script differences and challenges.
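For readers unfamiliar with the metric, chrF++ compares a hypothesis with a reference using character n-gram precision and recall (orders 1 to 6) together with word n-gram precision and recall (orders 1 to 2), combined into an F-score that weights recall more heavily (β = 2). The sketch below is a simplified, illustrative reimplementation assuming those standard defaults; the function names `chrf_pp`, `char_ngrams`, and `word_ngrams` are hypothetical. It omits details of reference implementations such as sacreBLEU (for example, punctuation handling for word n-grams and corpus-level aggregation), so its numbers should not be compared directly with the scores reported above.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Character n-grams; whitespace is removed first, as in standard chrF."""
    s = "".join(text.split())
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def word_ngrams(text: str, n: int) -> Counter:
    """Word n-grams over whitespace tokens (punctuation is not split off here)."""
    words = text.split()
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))


def chrf_pp(hyp: str, ref: str, char_order: int = 6,
            word_order: int = 2, beta: float = 2.0) -> float:
    """Simplified sentence-level chrF++ on a 0-100 scale (illustrative only)."""
    extractors = [(char_ngrams, n) for n in range(1, char_order + 1)]
    extractors += [(word_ngrams, n) for n in range(1, word_order + 1)]

    precisions, recalls = [], []
    for extract, n in extractors:
        h, r = extract(hyp, n), extract(ref, n)
        if not h or not r:
            # Simplification: skip orders where either side has no n-grams.
            continue
        matches = sum((h & r).values())  # clipped n-gram matches
        precisions.append(matches / sum(h.values()))
        recalls.append(matches / sum(r.values()))

    if not precisions:
        return 0.0
    # Average precision and recall over all n-gram orders, then combine.
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    b2 = beta ** 2
    return 100 * (1 + b2) * p * r / (b2 * p + r)


print(round(chrf_pp("салом дунё", "салом дунё"), 2))  # → 100.0
```

Identical strings score 100, strings sharing no n-grams score 0, and partial overlap falls in between. Averaging precision and recall across orders before computing the F-score follows the original chrF formulation; because β = 2, missing reference material (low recall) is penalized more than extra output.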
Abstract
Despite speaking mutually intelligible varieties of the same language, speakers of Tajik Persian, which is written in a modified Cyrillic alphabet, cannot read Iranian and Afghan texts written in the Perso-Arabic script. As the vast majority of Persian text on the Internet is written in Perso-Arabic, monolingual Tajik speakers are unable to interface with the Internet in any meaningful way. Due to the overwhelming similarity between the formal registers of these dialects and the scarcity of Tajik-Farsi parallel data, machine transliteration has been proposed as a more practical and appropriate solution than machine translation. This paper presents a transformer-based G2P approach to Tajik-Farsi transliteration, achieving chrF++ scores of 58.70 (Farsi to Tajik) and 74.20 (Tajik to Farsi) on novel digraphic datasets, setting a comparable baseline metric for future work. Our results also demonstrate…
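One way to see why the task is non-trivial is to try a naive letter-by-letter mapping. The toy sketch below uses a small, hand-picked set of correspondences (an assumption for illustration, not the paper's data or model), and the helper names `farsi_candidates` and `naive_tajik` are hypothetical. Several Perso-Arabic consonants collapse into a single Tajik Cyrillic letter, so Tajik to Farsi must choose among candidates; conversely, Perso-Arabic usually leaves short vowels unwritten, so a naive Farsi to Tajik mapping cannot recover them.

```python
from itertools import product

# Toy correspondences, hand-picked for illustration only (an assumption,
# not the paper's data). Several Perso-Arabic consonants merge into a
# single Tajik Cyrillic letter, so Tajik -> Farsi is one-to-many.
TAJIK_TO_FARSI = {
    "с": ["س", "ص", "ث"],
    "з": ["ز", "ذ", "ض", "ظ"],
    "т": ["ت", "ط"],
    "ҳ": ["ه", "ح"],
    "б": ["ب"], "к": ["ک"], "л": ["ل"], "м": ["м".replace("м", "م")],
    "о": ["ا"],
    # Short vowels are usually unwritten in Perso-Arabic. Simplification:
    # always drop them, ignoring word-initial and word-final spellings.
    "а": [""], "и": [""],
}

# Naive reverse map: it has no way to restore the dropped short vowels.
FARSI_TO_TAJIK = {
    "س": "с", "ص": "с", "ث": "с",
    "ت": "т", "ط": "т",
    "ب": "б", "ک": "к", "ل": "л", "م": "м",
    "ا": "о",  # simplification: ا does not always correspond to о
}


def farsi_candidates(tajik_word: str) -> list[str]:
    """Every Perso-Arabic spelling the toy table allows for a Tajik word."""
    options = [TAJIK_TO_FARSI[ch] for ch in tajik_word]
    return ["".join(letters) for letters in product(*options)]


def naive_tajik(farsi_word: str) -> str:
    """Letter-by-letter Farsi -> Tajik mapping with no vowel restoration."""
    return "".join(FARSI_TO_TAJIK[ch] for ch in farsi_word)


print(farsi_candidates("салом"))  # three candidates; only سلام is standard
print(naive_tajik("سلام"))        # "слом": the short vowel а is lost
```

For example, "салом" yields the standard spelling سلام alongside the non-standard صلام and ثلام, while سلام maps naively to "слом" rather than "салом". A learned model has to resolve both kinds of ambiguity from context; the vowel-restoration problem is plausibly one contributor to Farsi to Tajik being the lower-scoring direction.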
Taxonomy
Topics: Natural Language Processing Techniques · Algorithms and Data Compression · Language and cultural evolution
