Beyond the Hype: A dispassionate look at vision-language models in medical scenario

Yang Nan; Huichi Zhou; Xiaodan Xing; Guang Yang

arXiv:2408.08704 · cs.CV · June 30, 2025

TL;DR

This paper introduces RadVUQA, a comprehensive benchmark for evaluating vision-language models in medical scenarios, revealing significant gaps in their understanding and reasoning abilities compared to clinicians.

Contribution

The study presents RadVUQA, a novel benchmark that assesses LVLMs across five critical medical understanding dimensions, highlighting their deficiencies and guiding future improvements.

Findings

01. LVLMs show weak multimodal comprehension.

02. LVLMs lack robust quantitative reasoning.

03. Existing LVLMs have significant gaps compared to clinicians.

Abstract

Recent advancements in Large Vision-Language Models (LVLMs) have demonstrated remarkable capabilities across diverse tasks, garnering significant attention in AI communities. However, their performance and reliability in specialized domains such as medicine remain insufficiently assessed. In particular, most assessments over-concentrate on evaluating VLMs based on simple Visual Question Answering (VQA) on multi-modality data, while ignoring the in-depth characteristics of LVLMs. In this study, we introduce RadVUQA, a novel Radiological Visual Understanding and Question Answering benchmark, to comprehensively evaluate existing LVLMs. RadVUQA mainly validates LVLMs across five dimensions: 1) Anatomical understanding, assessing the models' ability to visually identify biological structures; 2) Multimodal comprehension, which involves the capability of interpreting linguistic and visual…

Taxonomy

Topics: Medical and Biological Sciences · Conferences and Exhibitions Management · Empathy and Medical Education

Methods: Softmax · Attention Is All You Need