KorMedMCQA-V: A Multimodal Benchmark for Evaluating Vision-Language Models on the Korean Medical Licensing Examination

Byungjin Choi; Seongsu Bae; Sunjun Kweon; Edward Choi

arXiv:2602.13650 · cs.CV · February 17, 2026

TL;DR

KorMedMCQA-V is a multimodal benchmark for evaluating vision-language models on Korean Medical Licensing Examination questions; it exposes performance gaps across model families and image modalities.

Contribution

Introduces KorMedMCQA-V, a multimodal multiple-choice dataset drawn from Korean Medical Licensing Examinations, and benchmarks over 50 vision-language models on it, reporting how performance varies by model family and image modality.

Findings

01. The best proprietary model (Gemini-3.0-Pro) reaches 96.9% accuracy.

02. Reasoning variants of a model outperform its instruction-tuned variants.

03. Multi-image questions significantly degrade model performance.

Abstract

We introduce KorMedMCQA-V, a Korean medical licensing-exam-style multimodal multiple-choice question answering benchmark for evaluating vision-language models (VLMs). The dataset consists of 1,534 questions with 2,043 associated images from Korean Medical Licensing Examinations (2012-2023), with about 30% containing multiple images requiring cross-image evidence integration. Images cover clinical modalities including X-ray, computed tomography (CT), electrocardiography (ECG), ultrasound, endoscopy, and other medical visuals. We benchmark over 50 VLMs across proprietary and open-source categories (spanning general-purpose, medical-specialized, and Korean-specialized families) under a unified zero-shot evaluation protocol. The best proprietary model (Gemini-3.0-Pro) achieves 96.9% accuracy, the best open-source model (Qwen3-VL-32B-Thinking) 83.7%, and the best Korean-specialized model…
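The abstract reports accuracy under a zero-shot multiple-choice protocol and singles out multi-image questions as a distinct subset. The sketch below shows one way such scores could be computed and split by image count. It is an illustration only: the `Record` fields, the function name, and the toy data are hypothetical and do not reflect the released dataset's schema or the authors' evaluation code.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Record:
    # Hypothetical fields; the released dataset's actual schema may differ.
    question_id: str
    num_images: int   # number of images attached to the question
    answer: int       # gold option index
    prediction: int   # option index chosen by the model


def accuracy_by_image_count(records):
    """Return accuracy for single-image, multi-image, and all questions."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for r in records:
        bucket = "multi" if r.num_images > 1 else "single"
        # Count each record once in its bucket and once in the overall tally.
        for key in (bucket, "overall"):
            total[key] += 1
            correct[key] += int(r.prediction == r.answer)
    return {key: correct[key] / total[key] for key in total}


# Toy records invented for illustration; these are not results from the paper.
toy = [
    Record("q1", 1, 3, 3),
    Record("q2", 1, 2, 2),
    Record("q3", 2, 5, 1),
    Record("q4", 3, 4, 4),
]
scores = accuracy_by_image_count(toy)
print(scores)
```

Reporting the single-image and multi-image buckets separately, rather than only the overall figure, is what makes a gap on multi-image questions visible.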

Code & Models

Datasets

seongsubae/KorMedMCQA-V
dataset · 93 downloads

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Topic Modeling