MILU: A Multi-task Indic Language Understanding Benchmark

Sshubam Verma; Mohammed Safi Ur Rahman Khan; Vishwajeet Kumar; Rudra Murthy; Jaydeep Sen

arXiv:2411.02538·cs.CL·February 5, 2025

TL;DR

MILU is a comprehensive benchmark designed to evaluate large language models on 11 Indic languages across diverse domains, highlighting current models' struggles with culturally specific knowledge and low-resource languages.

Contribution

Introduces MILU, the first extensive multi-task benchmark for Indic languages, covering 8 domains and 41 subjects, to assess LLMs' cultural and linguistic understanding.

Findings

01. Open multilingual models outperform language-specific fine-tuned models.

02. Current LLMs struggle with culturally relevant areas like Arts and Humanities.

03. Models perform better in high-resource languages than in low-resource ones.

Abstract

Evaluating Large Language Models (LLMs) in low-resource and linguistically diverse languages remains a significant challenge in NLP, particularly for languages using non-Latin scripts, such as those spoken in India. Existing benchmarks predominantly focus on English, leaving substantial gaps in assessing LLM capabilities in these languages. We introduce MILU, a Multi-task Indic Language Understanding Benchmark, a comprehensive evaluation benchmark designed to address this gap. MILU spans 8 domains and 41 subjects across 11 Indic languages, reflecting both general and culturally specific knowledge. With an India-centric design, MILU incorporates material from regional and state-level examinations, covering topics such as local history, arts, festivals, and laws, alongside standard subjects like science and mathematics. We evaluate over 42 LLMs and find that current LLMs struggle with MILU, with…

Code & Models

Repositories

AI4Bharat/MILU (Official)

Videos

MILU: A Multi-task Indic Language Understanding Benchmark

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems