MedLayBench-V: A Large-Scale Benchmark for Expert-Lay Semantic Alignment in Medical Vision Language Models

Han Jang; Junhyeok Lee; Heeseong Eum; Kyu Sung Choi

arXiv:2604.05738 · cs.CL · April 8, 2026



TL;DR

MedLayBench-V is a large-scale multimodal benchmark designed to improve medical vision-language models' ability to communicate diagnostic findings in lay language, addressing a critical resource gap.

Contribution

We introduce MedLayBench-V, the first large-scale benchmark for expert-lay semantic alignment in medical vision-language models, constructed via a novel Structured Concept-Grounded Refinement (SCGR) pipeline.

Findings

01  The dataset enforces strict semantic equivalence between expert and lay text using UMLS Concept Unique Identifiers (CUIs).

02  It provides a verified foundation for training and evaluating lay-accessible medical models.

03  It addresses the lack of large-scale benchmarks for expert-lay medical image understanding.

Abstract

Medical Vision-Language Models (Med-VLMs) have achieved expert-level proficiency in interpreting diagnostic imaging. However, current models are predominantly trained on professional literature, limiting their ability to communicate findings in the lay register required for patient-centered care. While text-centric research has actively developed resources for simplifying medical jargon, there is a critical absence of large-scale multimodal benchmarks designed to facilitate lay-accessible medical image understanding. To bridge this resource gap, we introduce MedLayBench-V, the first large-scale multimodal benchmark dedicated to expert-lay semantic alignment. Unlike naive simplification approaches that risk hallucination, our dataset is constructed via a Structured Concept-Grounded Refinement (SCGR) pipeline. This method enforces strict semantic equivalence by integrating Unified Medical…
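
The concept-level equivalence mentioned in the abstract can be illustrated with a small sketch. This is not the authors' SCGR pipeline, whose details are not given here; it only shows the general idea of comparing the set of concept identifiers found in an expert caption with the set found in its lay rewrite. The lexicon, the placeholder identifiers (`CUI_EFFUSION` and so on, which are not real UMLS CUIs), and the functions `extract_concepts` and `concept_alignment` are all assumptions made for illustration. A real system would use a UMLS concept linker rather than substring matching.

```python
from typing import Dict, Set

# Hypothetical toy lexicon: surface phrase -> placeholder concept ID.
# These IDs are illustrative stand-ins, NOT real UMLS CUIs.
TOY_LEXICON: Dict[str, str] = {
    "pleural effusion": "CUI_EFFUSION",
    "fluid around the lung": "CUI_EFFUSION",
    "cardiomegaly": "CUI_CARDIOMEGALY",
    "enlarged heart": "CUI_CARDIOMEGALY",
    "pneumothorax": "CUI_PNEUMOTHORAX",
    "collapsed lung": "CUI_PNEUMOTHORAX",
}


def extract_concepts(text: str, lexicon: Dict[str, str] = TOY_LEXICON) -> Set[str]:
    """Return the concept IDs whose surface phrases occur in the text."""
    lowered = text.lower()
    return {cui for phrase, cui in lexicon.items() if phrase in lowered}


def concept_alignment(expert: str, lay: str) -> Dict[str, object]:
    """Compare the concept sets of an expert caption and its lay rewrite."""
    expert_cuis = extract_concepts(expert)
    lay_cuis = extract_concepts(lay)
    return {
        "missing": expert_cuis - lay_cuis,  # concepts dropped by the lay text
        "added": lay_cuis - expert_cuis,    # concepts the lay text introduced
        "equivalent": expert_cuis == lay_cuis,
    }


if __name__ == "__main__":
    expert = "Chest radiograph shows cardiomegaly and a left pleural effusion."
    lay = "The X-ray shows an enlarged heart and fluid around the lung on the left."
    print(concept_alignment(expert, lay))
```

Under this check, a lay rewrite that omits a finding shows up in `missing`, and one that introduces an unsupported finding shows up in `added`, which is one simple way to flag the hallucination risk the abstract attributes to naive simplification.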


Code & Models

Datasets

hanjang/MedLayBench-V
dataset · 208 downloads
