EXAMS-V: A Multi-Discipline Multilingual Multimodal Exam Benchmark for Evaluating Vision Language Models
Rocktim Jyoti Das, Simeon Emilov Hristov, Haonan Li, Dimitar Iliyanov Dimitrov, Ivan Koychev, Preslav Nakov

TL;DR
EXAMS-V is a multilingual, multimodal exam benchmark of 20,932 multiple-choice questions across 20 school disciplines in 11 languages. It is designed to evaluate and challenge the reasoning and perception capabilities of vision-language models.
Contribution
It introduces a novel, diverse dataset of multilingual, multimodal exam questions from multiple countries, emphasizing complex reasoning and cross-modal understanding.
Findings
Current models like GPT-4V and Gemini struggle with the dataset.
The dataset covers multiple disciplines and languages, increasing evaluation complexity.
EXAMS-V sets a new standard for benchmarking vision-language reasoning.
Abstract
We introduce EXAMS-V, a new challenging multi-discipline multimodal multilingual exam benchmark for evaluating vision language models. It consists of 20,932 multiple-choice questions across 20 school disciplines covering natural science, social science, and other miscellaneous studies, e.g., religion, fine arts, and business. EXAMS-V includes a variety of multimodal features such as text, images, tables, figures, diagrams, maps, scientific symbols, and equations. The questions come in 11 languages from 7 language families. Unlike existing benchmarks, EXAMS-V is uniquely curated by gathering school exam questions from various countries with a variety of education systems. This distinctive approach calls for intricate reasoning across diverse languages and relies on region-specific knowledge. Solving the problems in the dataset requires advanced perception and joint reasoning over the…
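Because every EXAMS-V item is a multiple-choice question tagged with a language and a school discipline, results are naturally reported as accuracy broken down along those axes. The following is a minimal sketch of such scoring, assuming exact-match accuracy on the option letter. The `ExamItem` fields, `normalize_choice`, and `accuracy_by` are hypothetical names for illustration only. They are not the dataset's released schema or the authors' evaluation code, and the toy items below are made up.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ExamItem:
    # Hypothetical record layout; field names are illustrative, not the released schema.
    language: str    # language tag of the question, e.g. "Bulgarian"
    subject: str     # school discipline, e.g. "Physics"
    answer_key: str  # gold option label, e.g. "B"
    prediction: str  # raw option label returned by the model under test


def normalize_choice(raw: str) -> str:
    """Reduce a model reply such as ' (b) ' or 'C.' to a bare upper-case option letter."""
    cleaned = raw.strip().strip("().").upper()
    return cleaned[:1]  # empty string if the model returned nothing usable


def accuracy_by(items, key):
    """Return {group: accuracy} grouped by an ExamItem attribute such as 'language' or 'subject'."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for item in items:
        group = getattr(item, key)
        total[group] += 1
        # Exact match on the option letter (an assumed scoring rule).
        if normalize_choice(item.prediction) == item.answer_key.strip().upper():
            correct[group] += 1
    return {group: correct[group] / total[group] for group in total}


if __name__ == "__main__":
    # Toy items for illustration only; not real EXAMS-V data or model outputs.
    items = [
        ExamItem("Bulgarian", "Physics", "B", "(b)"),
        ExamItem("Bulgarian", "Biology", "A", "C"),
        ExamItem("Croatian", "Physics", "D", "D."),
    ]
    print(accuracy_by(items, "language"))
    print(accuracy_by(items, "subject"))
```

Grouping by an attribute name keeps one function usable for both the per-language and per-discipline views. A real evaluation would also need a policy for replies that contain no option letter. In this sketch, such replies simply count as incorrect.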
Taxonomy
Topics: Language, Metaphor, and Cognition
