IndicGenBench: A Multilingual Benchmark to Evaluate Generation Capabilities of LLMs on Indic Languages

Harman Singh; Nitish Gupta; Shikhar Bharadwaj; Dinesh Tewari; Partha Talukdar

arXiv:2404.16816·cs.CL·August 9, 2024

TL;DR

IndicGenBench is a comprehensive multilingual benchmark designed to evaluate large language models on generation tasks across 29 Indic languages, highlighting performance gaps and the need for more inclusive models.

Contribution

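The paper introduces IndicGenBench, which the authors describe as the largest benchmark for evaluating LLMs on user-facing generation tasks in Indic languages. It combines diverse tasks with human-curated data and provides multi-way parallel evaluation data for many under-represented Indic languages for the first time.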

Findings

01

PaLM-2 performs best among evaluated models.

02

A significant performance gap exists between Indic languages and English.

03

Further research is needed to build more inclusive multilingual models.

Abstract

As large language models (LLMs) see increasing adoption across the globe, it is imperative for LLMs to be representative of the linguistic diversity of the world. India is a linguistically diverse country of 1.4 billion people. To facilitate research on multilingual LLM evaluation, we release IndicGenBench - the largest benchmark for evaluating LLMs on user-facing generation tasks across a diverse set of 29 Indic languages covering 13 scripts and 4 language families. IndicGenBench is composed of diverse generation tasks like cross-lingual summarization, machine translation, and cross-lingual question answering. IndicGenBench extends existing benchmarks to many Indic languages through human curation, providing multi-way parallel evaluation data for many under-represented Indic languages for the first time. We evaluate a wide range of proprietary and open-source LLMs including GPT-3.5,…

Code & Models

Repositories

google-research-datasets/indic-gen-bench
Official

Taxonomy

Topics: Natural Language Processing Techniques · Translation Studies and Practices

Methods: Attention Is All You Need · Sparse Evolutionary Training · Adafactor · Position-Wise Feed-Forward Layer · SentencePiece · Inverse Square Root Schedule · Absolute Position Encodings · Linear Layer