From Phonemes to Meaning: Evaluating Large Language Models on Tamil
Jeyarajalingam Varsha, Menan Velayuthan, Sumirtha Karunakaran, Rasan Nivethiga, Kengatharaiyer Sarveswaran

TL;DR
This paper introduces ILAKKANAM, a Tamil-specific benchmark to evaluate large language models' linguistic competence, revealing performance gaps especially in complex linguistic tasks and highlighting differences between open-source and proprietary models.
Contribution
The paper presents ILAKKANAM, the first curated Tamil linguistic evaluation benchmark, and provides a comprehensive analysis of LLMs' performance on Tamil, emphasizing the need for language-specific evaluation.
Findings
Gemini 2.5 outperforms other models overall
Open-source models lag behind proprietary models
Performance declines with increasing linguistic complexity
Abstract
Large Language Models (LLMs) have shown strong generalization across tasks in high-resource languages; however, their linguistic competence in low-resource and morphologically rich languages such as Tamil remains largely unexplored. Existing multilingual benchmarks often rely on translated English datasets, failing to capture the linguistic and cultural nuances of the target language. To address this gap, we introduce ILAKKANAM, the first Tamil-specific linguistic evaluation benchmark manually curated using 820 questions from Sri Lankan school-level Tamil subject examination papers. Each question is annotated by trained linguists under five linguistic categories and a factual knowledge category, spanning Grades 1–13 to ensure broad linguistic coverage. We evaluate both closed-source and open-source LLMs using a standardized evaluation framework. Our results show that Gemini 2.5…
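As a rough illustration of how results on a benchmark annotated by category and grade might be aggregated, the following minimal sketch computes accuracy per group. It is not the authors' evaluation framework: the record fields (`category`, `grade`, `gold`, `predicted`), the placeholder category names, and exact-match scoring are all assumptions made for the example, and the toy records are not data from the paper.

```python
from collections import defaultdict


def accuracy_by_group(records, key):
    """Return {group: fraction of correct answers} for records grouped by `key`."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for rec in records:
        group = rec[key]
        totals[group] += 1
        # Assumption: an answer counts as correct on exact match with the gold label.
        correct[group] += int(rec["predicted"] == rec["gold"])
    return {group: correct[group] / totals[group] for group in totals}


# Hypothetical toy records; category names are placeholders, not the paper's labels.
records = [
    {"category": "category_A", "grade": 3, "gold": "A", "predicted": "A"},
    {"category": "category_A", "grade": 9, "gold": "B", "predicted": "C"},
    {"category": "category_B", "grade": 9, "gold": "D", "predicted": "D"},
    {"category": "category_B", "grade": 12, "gold": "A", "predicted": "A"},
]

by_category = accuracy_by_group(records, "category")
by_grade = accuracy_by_group(records, "grade")
print(by_category)
print(by_grade)
```

Grouping by grade as well as by category is one way to check a trend such as accuracy falling as linguistic complexity rises, under the assumption that grade level is a usable proxy for complexity.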
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
