MedBookVQA: A Systematic and Comprehensive Medical Benchmark Derived from Open-Access Book

Sau Lai Yip; Sunan He; Yuxiang Nie; Shu Pui Chan; Yilin Ye; Sum Ying Lam; Hao Chen

arXiv:2506.00855 · cs.AI · June 3, 2025

TL;DR

MedBookVQA introduces a comprehensive multimodal benchmark from open-access medical textbooks, enabling systematic evaluation of medical AI models across diverse clinical tasks and specialties.

Contribution

This work presents a novel pipeline for extracting medical figures and narratives to create a large-scale, hierarchical benchmark for multimodal medical AI evaluation.

Findings

01 Current medical AI models show significant performance gaps across tasks.

02 The benchmark reveals performance disparities among different model categories.

03 MedBookVQA provides detailed performance metrics across medical subdomains.

Abstract

The accelerating development of general medical artificial intelligence (GMAI), powered by multimodal large language models (MLLMs), offers transformative potential for addressing persistent healthcare challenges, including workforce deficits and escalating costs. The parallel development of systematic evaluation benchmarks emerges as a critical imperative to enable performance assessment and provide technological guidance. Meanwhile, as an invaluable knowledge source, the potential of medical textbooks for benchmark development remains underexploited. Here, we present MedBookVQA, a systematic and comprehensive multimodal benchmark derived from open-access medical textbooks. To curate this benchmark, we propose a standardized pipeline for automated extraction of medical figures while contextually aligning them with corresponding medical narratives. Based on this curated data, we…

Code & Models

Repositories

slyipae1/MedBookVQA
PyTorch · Official

Datasets

slyipae1/MedBookVQA
dataset · 17 downloads

Taxonomy

Topics: Health Sciences Research and Education