Beyond Specialization: Benchmarking LLMs for Transliteration of Indian Languages

Gulfarogh Azam; Mohd Sadique; Saif Ali; Mohammad Nadeem; Erik Cambria; Shahab Saquib Sohail; Mohammad Sultan Alam

arXiv:2505.19851·cs.CL·May 27, 2025



TL;DR

This study evaluates how well large language models (LLMs) transliterate Indian languages, showing that general-purpose LLMs outperform specialized models in many cases and that minimal fine-tuning can improve them further.

Contribution

It systematically benchmarks prominent LLMs against a specialized transliteration model across multiple Indian languages, highlighting their strengths and robustness.

Findings

1. GPT models generally outperform specialized models like IndicXlit.
2. Fine-tuning GPT-4o enhances language-specific performance.
3. LLMs show robustness under noisy conditions.

Abstract

Transliteration, the process of mapping text from one script to another, plays a crucial role in multilingual natural language processing, especially in linguistically diverse contexts such as India. While specialized models like IndicXlit have driven significant advances, recent developments in large language models (LLMs) suggest that general-purpose models may excel at this task without explicit task-specific training. The current work systematically evaluates prominent LLMs, including GPT-4o, GPT-4.5, GPT-4.1, Gemma-3-27B-it, and Mistral-Large, against IndicXlit, a state-of-the-art transliteration model, across ten major Indian languages. Experiments used standard benchmarks, including the Dakshina and Aksharantar datasets, with performance assessed via Top-1 Accuracy and Character Error Rate (CER). Our findings reveal that while GPT family models generally…
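The two metrics named above can be sketched as follows. This is a minimal illustration under common textbook definitions, not the paper's evaluation code: it assumes a single reference per word (the benchmarks may list several valid outputs), treats "Top-1" as an exact match between the model's first output and the reference, and computes CER at the corpus level over Unicode code points (total edit distance divided by total reference length). Whether the paper averages CER per word or counts grapheme clusters instead of code points is not stated here, and the function names are hypothetical.

```python
from typing import Sequence


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def top1_accuracy(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Fraction of words whose first prediction exactly matches the (single, assumed) reference."""
    if len(predictions) != len(references) or not references:
        raise ValueError("need non-empty, aligned predictions and references")
    hits = sum(p == r for p, r in zip(predictions, references))
    return hits / len(references)


def character_error_rate(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Corpus-level CER: summed edit distance over summed reference length, in Unicode code points."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must be aligned")
    total = sum(len(r) for r in references)
    if total == 0:
        raise ValueError("references must contain at least one character")
    edits = sum(levenshtein(p, r) for p, r in zip(predictions, references))
    return edits / total


# Toy Devanagari example: one exact match, one word with a single wrong character.
preds = ["नमस्ते", "भारथ"]
refs = ["नमस्ते", "भारत"]
print(top1_accuracy(preds, refs))         # 0.5
print(character_error_rate(preds, refs))  # 0.1 (1 edit over 6 + 4 code points)
```

Top-1 Accuracy is all-or-nothing per word, whereas CER gives partial credit for near misses, which is why the two can rank systems differently.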


Taxonomy

Topics: Natural Language Processing Techniques · Translation Studies and Practices · Library Science and Information Systems

Methods: Attention Is All You Need · Position-Wise Feed-Forward Layer · Absolute Position Encodings · Label Smoothing · Cosine Annealing · Linear Layer · Layer Normalization · Byte Pair Encoding · Residual Connection