M4U: Evaluating Multilingual Understanding and Reasoning for Large Multimodal Models
Hongyu Wang, Jiayu Xu, Senwei Xie, Ruiping Wang, Jialin Li, Zhaojie Xie, Bin Zhang, Chuyan Xiong, and Xilin Chen

TL;DR
This paper introduces M4U, a comprehensive benchmark for evaluating multilingual multimodal understanding and reasoning in large models, revealing current models' limitations in cross-lingual multimodal reasoning across diverse scientific disciplines.
Contribution
The paper presents M4U, a new challenging benchmark with 10,000 samples across 64 disciplines in six languages, and provides extensive evaluations of state-of-the-art multilingual multimodal models.
Findings
GPT-4o achieves only 47.6% accuracy on M4U.
Leading models show significant language preferences.
Models struggle with cross-lingual multimodal reasoning.
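The findings above rest on per-language accuracy comparisons. As a rough illustration only, and not the authors' evaluation code, the sketch below shows one way such numbers could be computed for multiple-choice predictions. The record fields (`language`, `answer`, `prediction`), the function names, and the toy data are all assumptions made for this example; they are not taken from the M4U release, and the values are not results from the paper.

```python
from collections import defaultdict


def per_language_accuracy(records):
    """Return {language: accuracy} for a list of multiple-choice records.

    Each record is a dict with hypothetical keys:
    'language', 'answer' (gold option) and 'prediction' (model option).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for r in records:
        total[r["language"]] += 1
        if r["prediction"] == r["answer"]:
            correct[r["language"]] += 1
    return {lang: correct[lang] / total[lang] for lang in total}


def language_gap(accuracy):
    """Difference between the best and worst per-language accuracy."""
    return max(accuracy.values()) - min(accuracy.values())


# Toy records, made up for illustration only (not benchmark data).
records = [
    {"language": "en", "answer": "A", "prediction": "A"},
    {"language": "en", "answer": "B", "prediction": "B"},
    {"language": "zh", "answer": "A", "prediction": "B"},
    {"language": "zh", "answer": "C", "prediction": "C"},
    {"language": "de", "answer": "D", "prediction": "D"},
    {"language": "de", "answer": "A", "prediction": "C"},
]

accuracy = per_language_accuracy(records)
gap = language_gap(accuracy)
```

A large gap between the best and worst language would be one simple signal of the language preference described in the findings; averaging the per-language values gives a single overall score.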
Abstract
Multilingual capability is an essential aspect for large multimodal models, since they are usually deployed across various countries and languages. However, most existing benchmarks for multilingual multimodal reasoning struggle to differentiate between models of varying performance; even language models without visual capabilities can easily achieve high scores. This leaves a comprehensive evaluation of leading multilingual multimodal models largely unexplored. In this work, we introduce M4U, a novel and challenging benchmark for assessing the capability of multi-discipline multilingual multimodal understanding and reasoning. M4U contains 10k samples covering 64 disciplines across 16 subfields in Science, Engineering, and Healthcare in six languages. Using M4U, we conduct extensive evaluations of leading Large Multimodal Models (LMMs) and Large Language Models (LLMs) with external…
Taxonomy
Topics: Natural Language Processing Techniques · Speech and dialogue systems · Topic Modeling
