KMMMU: Evaluation of Massive Multi-discipline Multimodal Understanding in Korean Language and Context

Nahyun Lee; Guijin Son; Hyunwoo Ko; Chanyoung Kim; JunYoung An; Kyubeen Han; Il-Youp Kwak

arXiv:2604.13058 · cs.CL · April 20, 2026

TL;DR

KMMMU is a Korean multimodal benchmark of 3,466 questions drawn from exams natively written in Korean, built to evaluate multimodal understanding in Korean cultural and institutional settings. Even the strongest models tested leave large performance gaps.

Contribution

This paper introduces KMMMU, a native Korean multimodal benchmark that targets information-dense problems shaped by local conventions, official standards, and discipline-specific visual formats.

Findings

01

The strongest open-source model reaches only 42.05% accuracy on the full set.

02

The best proprietary model achieves 52.42% accuracy on the hard subset of 627 questions.

03

Performance varies across disciplines, with some emerging as bottlenecks, and Korean-specific questions show gaps of up to 13.43%.

Abstract

We introduce KMMMU, a native Korean benchmark for evaluating multimodal understanding in Korean cultural and institutional settings. KMMMU contains 3,466 questions from exams natively written in Korean, covering nine disciplines and nine visual modality categories, along with a 300-item Korean-specific subset and a hard subset of 627 questions. Unlike translated or English-centric benchmarks, KMMMU targets information-dense problems shaped by local conventions, official standards, and discipline-specific visual formats. Experiments show that the strongest open-source model reaches only 42.05% accuracy on the full set, while the best proprietary model achieves 52.42% on the hard subset. Performance varies across disciplines, with some disciplines emerging as bottlenecks, and Korean-specific questions showing gaps of up to 13.43%. Error analysis suggests that these failures stem less from…
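The per-discipline and Korean-specific comparisons described in the abstract amount to grouping items by a label and computing accuracy within each group. The following is a minimal sketch of that bookkeeping, not the authors' evaluation code. The field names (`discipline`, `korean_specific`, `answer`, `prediction`) and the helper `accuracy_by_group` are hypothetical, and the records are invented toy values rather than KMMMU data.

```python
from collections import defaultdict


def accuracy_by_group(records, key):
    """Return {group: accuracy in percent} for records grouped by `key`."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for r in records:
        group = r[key]
        total[group] += 1
        # An item counts as correct when the prediction matches the gold answer.
        correct[group] += int(r["prediction"] == r["answer"])
    return {g: 100.0 * correct[g] / total[g] for g in total}


# Toy records, invented for illustration only; these are NOT KMMMU items.
toy_records = [
    {"discipline": "A", "korean_specific": True, "answer": "1", "prediction": "2"},
    {"discipline": "A", "korean_specific": False, "answer": "3", "prediction": "3"},
    {"discipline": "B", "korean_specific": True, "answer": "4", "prediction": "4"},
    {"discipline": "B", "korean_specific": False, "answer": "2", "prediction": "2"},
]

by_discipline = accuracy_by_group(toy_records, "discipline")
by_subset = accuracy_by_group(toy_records, "korean_specific")

# Gap in percentage points between the other items and the Korean-specific ones.
gap = by_subset[False] - by_subset[True]
```

Reporting the gap as a difference of two accuracies gives a value in percentage points; whether the paper's 13.43% figure is defined this way is not stated in the abstract.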

Code & Models

Datasets

HAERAE-HUB/KMMMU
dataset · 380 downloads
