WorldMedQA-V: a multilingual, multimodal medical examination dataset for multimodal language models evaluation
João Matos, Shan Chen, Siena Placino, Yingya Li, Juan Carlos Climent Pardo, Daphna Idan, Takeshi Tohyama, David Restrepo, Luis F. Nakayama, Jose M. M. Pascual-Leone, Guergana Savova, Hugo Aerts, Leo A. Celi, A. Ian Wong, Danielle S. Bitterman, Jack Gallifant

TL;DR
WorldMedQA-V is a multilingual, multimodal dataset for evaluating vision-language models in healthcare. It pairs multiple-choice medical exam questions with medical images from four countries, in both the original languages and English translation, to support fairer and more effective medical AI.
Contribution
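The paper introduces a multilingual, multimodal medical QA dataset with images and clinician-validated English translations from four countries (Brazil, Israel, Japan, and Spain), addressing the gap left by existing benchmarks that are largely text-only and limited to a few languages and countries.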
Findings
Baseline models show varied performance across languages and modalities.
The dataset enables evaluation of models in diverse healthcare settings.
It promotes development of more equitable and effective AI in medicine.
Abstract
Multimodal/vision language models (VLMs) are increasingly being deployed in healthcare settings worldwide, necessitating robust benchmarks to ensure their safety, efficacy, and fairness. Multiple-choice question and answer (QA) datasets derived from national medical examinations have long served as valuable evaluation tools, but existing datasets are largely text-only and available in a limited subset of languages and countries. To address these challenges, we present WorldMedQA-V, an updated multilingual, multimodal benchmarking dataset designed to evaluate VLMs in healthcare. WorldMedQA-V includes 568 labeled multiple-choice QAs paired with 568 medical images from four countries (Brazil, Israel, Japan, and Spain), covering original languages and validated English translations by native clinicians, respectively. Baseline performance for common open- and closed-source models are…
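As a rough illustration of how a benchmark of this shape can be scored, the sketch below groups multiple-choice predictions by country, language version, and whether the image was shown, then reports accuracy per group. The `Item` record, its field names, and the toy data are assumptions made for this example; they are not the dataset's actual schema or results from the paper.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """One hypothetical evaluation record (field names are assumptions)."""
    country: str      # e.g. "Brazil", "Israel", "Japan", "Spain"
    language: str     # "original" or "english"
    with_image: bool  # whether the image was shown to the model
    gold: str         # correct option label, e.g. "A"
    predicted: str    # option label returned by the model


def accuracy_by_group(items):
    """Return {(country, language, with_image): accuracy} for the given items."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for it in items:
        key = (it.country, it.language, it.with_image)
        total[key] += 1
        correct[key] += int(it.predicted == it.gold)
    return {key: correct[key] / total[key] for key in total}


# Made-up toy records, not results from the paper.
toy = [
    Item("Brazil", "original", True, "A", "A"),
    Item("Brazil", "original", True, "B", "C"),
    Item("Brazil", "english", True, "A", "A"),
    Item("Japan", "english", False, "D", "D"),
]

for key, acc in sorted(accuracy_by_group(toy).items()):
    print(key, f"{acc:.2f}")
```

Keeping the grouping key explicit makes it straightforward to compare the same questions across the original-language and English versions, or with and without the image.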
Taxonomy
Topics: Topic Modeling · Artificial Intelligence in Healthcare and Education
