Scripts Through Time: A Survey of the Evolving Role of Transliteration in NLP
Thanmay Jayakumar, Deepon Halder, Raj Dabre

TL;DR
This survey examines how transliteration techniques have evolved and how they are applied in cross-lingual NLP to overcome script barriers, improving transfer learning and inference efficiency.
Contribution
It offers a taxonomy of motivations for using transliteration in language models, analyzes approaches to incorporating transliterated input, and provides practical recommendations for applying transliteration in modern NLP models.
Findings
Transliteration improves lexical overlap in cross-lingual NLP.
Different approaches have trade-offs in effectiveness and complexity.
Benefits of transliteration include better handling of code-mixed text and improved inference efficiency.
Abstract
Cross-lingual transfer in NLP is often hindered by the "script barrier", where differences in writing systems inhibit transfer learning between languages. Transliteration, the process of converting text from one script to another, has emerged as a powerful technique to bridge this gap by increasing lexical overlap. This paper provides a comprehensive survey of the application of transliteration in cross-lingual NLP. We present a taxonomy of key motivations for using transliterations in language models, and provide an overview of different approaches to incorporating transliterations as input. We analyze the evolution and effectiveness of these methods, discuss the critical trade-offs involved, and contextualize the need for them in modern LLMs. The review explores various settings that show how transliteration is beneficial, including handling code-mixed text, leveraging language family relatedness, and…
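To make the abstract's central claim concrete (transliteration increases lexical overlap, especially between related languages), the following is a minimal Python sketch, not a method taken from the survey. It assumes the fact that Unicode lays out the major Indic script blocks (Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, Malayalam) in parallel 128-code-point blocks derived from ISCII, so related scripts can be mapped onto Devanagari by swapping the block base. The helper names `to_devanagari` and `type_overlap` are hypothetical, and Jaccard overlap of word types is used here only as one simple proxy for lexical overlap.

```python
# Minimal sketch (not the survey's method): map text in related Indic scripts
# onto Devanagari, then compare word-type overlap before and after.

DEVANAGARI_START = 0x0900
# Assumption: Bengali (U+0980) through Malayalam (U+0D00..U+0D7F) occupy
# 128-code-point blocks laid out in parallel with Devanagari (inherited from ISCII).
# Positions unassigned in a source block still get shifted, so a real system
# would need exception tables for them.
OTHER_INDIC_START = 0x0980
OTHER_INDIC_END = 0x0D80  # exclusive


def to_devanagari(text: str) -> str:
    """Shift each character from a parallel Indic block onto Devanagari."""
    out = []
    for ch in text:
        cp = ord(ch)
        if OTHER_INDIC_START <= cp < OTHER_INDIC_END:
            # Keep the offset within the 128-code-point block; replace the block base.
            cp = DEVANAGARI_START + (cp & 0x7F)
        out.append(chr(cp))
    return "".join(out)


def type_overlap(a: str, b: str) -> float:
    """Jaccard overlap of the whitespace-separated word types in a and b."""
    types_a, types_b = set(a.split()), set(b.split())
    union = types_a | types_b
    return len(types_a & types_b) / len(union) if union else 0.0


# "namaste" written in Devanagari and in Gujarati (escapes avoid glyph ambiguity).
hindi_word = "\u0928\u092e\u0938\u094d\u0924\u0947"
gujarati_word = "\u0aa8\u0aae\u0ab8\u0acd\u0aa4\u0ac7"

# Two-word sentences that share "namaste"; the second words differ.
hindi_sent = hindi_word + " " + "\u092e\u093f\u0924\u094d\u0930"
gujarati_sent = gujarati_word + " " + "\u0aa6\u0acb\u0ab8\u0acd\u0aa4"

# Different scripts: no shared word types at all.
print(type_overlap(hindi_sent, gujarati_sent))  # 0.0
# After mapping Gujarati onto Devanagari, "namaste" matches (1 shared of 3 types).
print(round(type_overlap(hindi_sent, to_devanagari(gujarati_sent)), 3))  # 0.333
```

The block-offset shortcut only applies to scripts that share this parallel Unicode layout; text in scripts outside it (for example Perso-Arabic) would need a rule-based or learned transliterator instead.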
