All Languages Matter: Evaluating LMMs on Culturally Diverse 100 Languages

Ashmal Vayani; Dinura Dissanayake; Hasindri Watawana; Noor Ahsan; Nevasini Sasikumar; Omkar Thawakar; Henok Biadglign Ademtew; Yahya Hmaiti; Amandeep Kumar; Kartik Kuckreja; Mykola Maslych; Wafa Al Ghallabi; Mihail Mihaylov; Chao Qin; Abdelrahman M Shaker; Mike Zhang; Mahardika Krisna Ihsani; Amiel Esplana; Monil Gokani; Shachar Mirkin; Harsh Singh; Ashay Srivastava; Endre Hamerlik; Fathinah Asma Izzati; Fadillah Adamsyah Maani; Sebastian Cavada; Jenny Chim; Rohit Gupta; Sanjay Manjunath; Kamila Zhumakhanova; Feno Heriniaina Rabevohitra; Azril Amirudin; Muhammad Ridzuan; Daniya Kareem; Ketan More; Kunyang Li; Pramesh Shakya; Muhammad Saad; Amirpouya Ghasemaghaei; Amirbek Djanibekov; Dilshod Azizov; Branislava Jankovic; Naman Bhatia; Alvaro Cabrera; Johan Obando-Ceron; Olympiah Otieno; Fabian Farestam; Muztoba Rabbani; Sanoojan Baliah; Santosh Sanjeev; Abduragim Shtanchaev; Maheen Fatima; Thao Nguyen; Amrin Kareem; Toluwani Aremu; Nathan Xavier; Amit Bhatkal; Hawau Toyin; Aman Chadha; Hisham Cholakkal; Rao Muhammad Anwer; Michael Felsberg; Jorma Laaksonen; Thamar Solorio; Monojit Choudhury; Ivan Laptev; Mubarak Shah; Salman Khan; Fahad Khan

arXiv:2411.16508·cs.CV·May 2, 2025

TL;DR

This paper introduces ALM-bench, a comprehensive benchmark for evaluating Large Multimodal Models across 100 languages and diverse cultural contexts, emphasizing inclusivity and cultural understanding.

Contribution

It presents the largest and most diverse evaluation framework for LMMs, focusing on cultural and linguistic diversity, including low-resource languages and various cultural aspects.

Findings

01. LMMs show varying performance across languages and cultures.

02. The benchmark reveals gaps in models' understanding of low-resource languages.

03. Cultural diversity significantly impacts model reasoning abilities.
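
The findings above compare model performance across languages and between high- and low-resource languages. As a minimal sketch of how such a comparison could be computed, the snippet below aggregates accuracy per language and per resource tier. The records, the tier labels, and the helper name `accuracy_by` are all hypothetical placeholders for illustration; they are not ALM-bench data, and this is not the authors' scoring procedure.

```python
from collections import defaultdict

# Hypothetical toy records (NOT ALM-bench results):
# each tuple is (language, assumed resource tier, whether the answer was correct).
records = [
    ("English", "high", True),
    ("English", "high", True),
    ("English", "high", False),
    ("Amharic", "low", True),
    ("Amharic", "low", False),
    ("Amharic", "low", False),
    ("Sinhala", "low", False),
    ("Sinhala", "low", True),
]


def accuracy_by(rows, key_index):
    """Return accuracy grouped by the tuple field at key_index (0 = language, 1 = tier)."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for row in rows:
        key = row[key_index]
        totals[key] += 1
        correct[key] += int(row[2])  # True counts as 1, False as 0
    return {key: correct[key] / totals[key] for key in totals}


per_language = accuracy_by(records, 0)
per_tier = accuracy_by(records, 1)

# Gap between the assumed high- and low-resource tiers on the toy data.
gap = per_tier["high"] - per_tier["low"]

for language, acc in sorted(per_language.items()):
    print(f"{language}: {acc:.2f}")
print(f"high-vs-low gap: {gap:.2f}")
```

Grouping by a tuple index keeps the sketch short; with real evaluation output, named fields (for example a dataclass or a data frame) would likely be easier to read.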

Abstract

Existing Large Multimodal Models (LMMs) generally focus on only a few regions and languages. As LMMs continue to improve, it is increasingly important to ensure they understand cultural contexts, respect local sensitivities, and support low-resource languages, all while effectively integrating corresponding visual cues. In pursuit of culturally diverse global multimodal models, our proposed All Languages Matter Benchmark (ALM-bench) represents the largest and most comprehensive effort to date for evaluating LMMs across 100 languages. ALM-bench challenges existing models by testing their ability to understand and reason about culturally diverse images paired with text in various languages, including many low-resource languages traditionally underrepresented in LMM research. The benchmark offers a robust and nuanced evaluation framework featuring various question formats, including…

Code & Models

Repositories

mbzuai-oryx/ALM-Bench
PyTorch · Official

Datasets

MBZUAI/ALM-Bench
dataset · 270 downloads
