Beyond the Hype: A dispassionate look at vision-language models in medical scenario
Yang Nan, Huichi Zhou, Xiaodan Xing, Guang Yang

TL;DR
This paper introduces RadVUQA, a comprehensive benchmark for evaluating vision-language models in medical scenarios, revealing significant gaps in their understanding and reasoning abilities compared to clinicians.
Contribution
The study presents RadVUQA, a novel benchmark that assesses large vision-language models (LVLMs) across five dimensions of medical understanding, highlighting their deficiencies and guiding future improvements.
Findings
LVLMs show weak multimodal comprehension.
LVLMs lack robust quantitative reasoning.
Existing LVLMs have significant gaps compared to clinicians.
Abstract
Recent advancements in Large Vision-Language Models (LVLMs) have demonstrated remarkable capabilities across diverse tasks, garnering significant attention in AI communities. However, their performance and reliability in specialized domains such as medicine remain insufficiently assessed. In particular, most assessments over-concentrate on evaluating VLMs based on simple Visual Question Answering (VQA) on multi-modality data, while ignoring the in-depth characteristics of LVLMs. In this study, we introduce RadVUQA, a novel Radiological Visual Understanding and Question Answering benchmark, to comprehensively evaluate existing LVLMs. RadVUQA mainly validates LVLMs across five dimensions: 1) Anatomical understanding, assessing the models' ability to visually identify biological structures; 2) Multimodal comprehension, which involves the capability of interpreting linguistic and visual…
Taxonomy
Topics: Medical and Biological Sciences · Conferences and Exhibitions Management · Empathy and Medical Education
Methods: Softmax · Attention Is All You Need
