MEENA (PersianMMMU): Multimodal-Multilingual Educational Exams for N-level Assessment

Omid Ghahroodi; Arshia Hemmat; Marzia Nouri; Seyed Mohammad Hadi Hosseini; Doratossadat Dastgheib; Mohammad Vali Sanian; Alireza Sahebi; Reihaneh Zohrabi; Mohammad Hossein Rohban; Ehsaneddin Asgari; Mahdieh Soleymani Baghshah

arXiv:2508.17290·cs.AI·August 26, 2025

TL;DR

MEENA (PersianMMMU) is a comprehensive dataset designed to evaluate Persian vision-language models across diverse educational and cultural tasks, addressing the gap in multilingual model assessment.

Contribution

Introduces MEENA, the first Persian VLM benchmark with diverse questions, metadata, and bilingual structure to evaluate cross-linguistic and multimodal understanding.

Findings

01 Assesses model performance across multiple tasks and subjects.

02 Highlights challenges in Persian multimodal understanding.

03 Provides insights into model hallucinations and attention capabilities.

Abstract

Recent advancements in large vision-language models (VLMs) have primarily focused on English, with limited attention given to other languages. To address this gap, we introduce MEENA (also known as PersianMMMU), the first dataset designed to evaluate Persian VLMs across scientific, reasoning, and human-level understanding tasks. Our dataset comprises approximately 7,500 Persian and 3,000 English questions, covering a wide range of topics such as reasoning, mathematics, physics, diagrams, charts, and Persian art and literature. Key features of MEENA include: (1) diverse subject coverage spanning various educational levels, from primary to upper secondary school, (2) rich metadata, including difficulty levels and descriptive answers, (3) original Persian data that preserves cultural nuances, (4) a bilingual structure to assess cross-linguistic performance, and (5) a series of diverse…
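
The abstract describes bilingual multiple-choice questions carrying metadata such as difficulty level, but it does not give the dataset's schema. As a minimal sketch only, assuming hypothetical field names (`language`, `subject`, `difficulty`, `has_image`) and toy items that are not drawn from MEENA, a record type and a per-language accuracy breakdown might look like this:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ExamItem:
    """One multiple-choice question. All field names are hypothetical,
    not the published MEENA schema."""
    question: str
    choices: tuple
    answer_index: int   # index into `choices` of the correct option
    language: str       # "fa" (Persian) or "en" (English)
    subject: str
    difficulty: str
    has_image: bool = False


def accuracy_by(items, predictions, key):
    """Return {group: fraction correct}, grouping items by attribute `key`."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for item, pred in zip(items, predictions):
        group = getattr(item, key)
        total[group] += 1
        if pred == item.answer_index:
            correct[group] += 1
    return {group: correct[group] / total[group] for group in total}


# Toy items for illustration only; they do not come from the dataset.
items = [
    ExamItem("2 + 2 = ?", ("3", "4", "5", "6"), 1, "en", "mathematics", "easy"),
    ExamItem("۲ + ۲ = ؟", ("۳", "۴", "۵", "۶"), 1, "fa", "mathematics", "easy"),
    ExamItem("۳ × ۳ = ؟", ("۶", "۹", "۱۲", "۸"), 1, "fa", "mathematics", "easy"),
]
predictions = [1, 1, 0]  # hypothetical model outputs (chosen option indices)
scores = accuracy_by(items, predictions, "language")
```

Passing `"subject"` or `"difficulty"` as `key` gives the same breakdown along the other metadata axes, which is one way the cross-linguistic comparison mentioned in the abstract could be computed.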
