MILU: A Multi-task Indic Language Understanding Benchmark
Sshubam Verma, Mohammed Safi Ur Rahman Khan, Vishwajeet Kumar, Rudra Murthy, Jaydeep Sen

TL;DR
MILU is a comprehensive benchmark designed to evaluate large language models on 11 Indic languages across diverse domains, highlighting current models' struggles with culturally specific knowledge and low-resource languages.
Contribution
Introduces MILU, the first extensive multi-task benchmark for Indic languages, covering 8 domains and 41 subjects, to assess LLMs' cultural and linguistic understanding.
Findings
Open multilingual models outperform language-specific fine-tuned models.
Current LLMs struggle with culturally relevant areas like Arts and Humanities.
Models perform better in high-resource languages than in low-resource ones.
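The per-language and per-domain comparisons above rest on grouping multiple-choice results and computing accuracy within each group. The following is a minimal sketch of that aggregation, not the authors' evaluation code. The record fields (`language`, `domain`, `answer`, `prediction`) and the helper `accuracy_by` are hypothetical names, and the records are toy values invented for illustration.

```python
from collections import defaultdict


def accuracy_by(records, key):
    """Return {group: accuracy} for multiple-choice records grouped by `key`."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for r in records:
        total[r[key]] += 1
        # A record counts as correct when the predicted option matches the gold option.
        correct[r[key]] += int(r["prediction"] == r["answer"])
    return {group: correct[group] / total[group] for group in total}


# Toy, made-up records purely for illustration (not MILU data or results).
# The field names are assumptions, not the benchmark's actual schema.
records = [
    {"language": "hi", "domain": "Science", "answer": "A", "prediction": "A"},
    {"language": "hi", "domain": "Arts and Humanities", "answer": "B", "prediction": "C"},
    {"language": "ml", "domain": "Science", "answer": "D", "prediction": "D"},
    {"language": "ml", "domain": "Arts and Humanities", "answer": "A", "prediction": "B"},
]

by_language = accuracy_by(records, "language")
by_domain = accuracy_by(records, "domain")
```

Grouping by one key at a time keeps the language comparison and the domain comparison independent, so a weak score in one domain can be read separately from a weak score in one language.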
Abstract
Evaluating Large Language Models (LLMs) in low-resource and linguistically diverse languages remains a significant challenge in NLP, particularly for languages using non-Latin scripts like those spoken in India. Existing benchmarks predominantly focus on English, leaving substantial gaps in assessing LLM capabilities in these languages. We introduce MILU, a Multi-task Indic Language Understanding Benchmark, a comprehensive evaluation benchmark designed to address this gap. MILU spans 8 domains and 41 subjects across 11 Indic languages, reflecting both general and culturally specific knowledge. With an India-centric design, MILU incorporates material from regional and state-level examinations, covering topics such as local history, arts, festivals, and laws, alongside standard subjects like science and mathematics. We evaluate over 42 LLMs and find that current LLMs struggle with MILU, with…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems
