From Phonemes to Meaning: Evaluating Large Language Models on Tamil

Jeyarajalingam Varsha; Menan Velayuthan; Sumirtha Karunakaran; Rasan Nivethiga; Kengatharaiyer Sarveswaran

arXiv:2511.12387·cs.CL·November 18, 2025

TL;DR

This paper introduces ILAKKANAM, a Tamil-specific benchmark to evaluate large language models' linguistic competence, revealing performance gaps especially in complex linguistic tasks and highlighting differences between open-source and proprietary models.

Contribution

The paper presents ILAKKANAM, the first curated Tamil linguistic evaluation benchmark, and provides a comprehensive analysis of LLMs' performance on Tamil, emphasizing the need for language-specific evaluation.

Findings

01. Gemini 2.5 outperforms other models overall
02. Open-source models lag behind proprietary models
03. Performance declines with increasing linguistic complexity

Abstract

Large Language Models (LLMs) have shown strong generalization across tasks in high-resource languages; however, their linguistic competence in low-resource and morphologically rich languages such as Tamil remains largely unexplored. Existing multilingual benchmarks often rely on translated English datasets, failing to capture the linguistic and cultural nuances of the target language. To address this gap, we introduce ILAKKANAM, the first Tamil-specific linguistic evaluation benchmark manually curated using 820 questions from Sri Lankan school-level Tamil subject examination papers. Each question is annotated by trained linguists under five linguistic categories and a factual knowledge category, spanning Grades 1–13 to ensure broad linguistic coverage. We evaluate both closed-source and open-source LLMs using a standardized evaluation framework. Our results show that Gemini 2.5…
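
The abstract describes questions annotated by category and scored under a standardized framework. As a rough illustration only, per-category accuracy for such a benchmark could be computed as in the sketch below. The record layout, the function name `accuracy_by_category`, the exact-match scoring rule, and the category labels are all assumptions made for this example; they are not taken from the paper, and the toy records are not benchmark data.

```python
from collections import defaultdict


def accuracy_by_category(records):
    """Return {category: accuracy} for (category, gold, predicted) triples.

    Exact string match after trimming whitespace is an assumed scoring
    rule, not the paper's documented method.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for category, gold, predicted in records:
        total[category] += 1
        if predicted.strip() == gold.strip():
            correct[category] += 1
    return {category: correct[category] / total[category] for category in total}


# Hypothetical toy records with placeholder category labels.
sample = [
    ("category_A", "1", "1"),
    ("category_A", "3", "2"),
    ("factual", "4", "4"),
]
scores = accuracy_by_category(sample)
print(scores)
```

Reporting accuracy per category rather than one pooled number is what makes a finding such as "performance declines with increasing linguistic complexity" visible in the first place.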

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification