Read Like a Radiologist: Efficient Vision-Language Model for 3D Medical Imaging Interpretation
Changsun Lee, Sangjoon Park, Cheong-Il Shin, Woo Hee Choi, Hyun Jeong Park, Jeong Eun Lee, Jong Chul Ye

TL;DR
This paper introduces MS-VLM, a novel vision-language model that mimics radiologists' sequential slice analysis to improve 3D medical image interpretation, overcoming previous limitations of volumetric representation methods.
Contribution
MS-VLM uses self-supervised 2D transformers to learn inter-slice dependencies from slice sequences, which lets it build volumetric representations from scans with any number of slices and from multiple imaging planes.
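The idea of encoding slices one at a time and then mixing information across the slice sequence can be illustrated with a small NumPy sketch. This is not the authors' architecture: the linear projection is a stand-in for the self-supervised 2D transformer, the attention step is a single unparameterized head, and every name here (`encode_slices`, `inter_slice_attention`, `volume_tokens`, `weight`) is hypothetical. It only shows why a slice-sequence design accepts scans of any depth and can slice the same volume along different planes.

```python
import numpy as np


def encode_slices(slices, weight):
    """Embed each 2D slice independently (stand-in for a 2D image encoder)."""
    depth = slices.shape[0]
    flat = slices.reshape(depth, -1)      # (D, H*W)
    return flat @ weight                  # (D, dim)


def inter_slice_attention(tokens):
    """Single-head self-attention over the slice axis (no learned weights)."""
    dim = tokens.shape[1]
    scores = tokens @ tokens.T / np.sqrt(dim)        # (D, D) slice-to-slice scores
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn = attn / attn.sum(axis=1, keepdims=True)    # softmax over slices
    return attn @ tokens                             # (D, dim)


def volume_tokens(volume, weight, axis=0):
    """Slice along `axis`, embed each slice, then mix information across slices."""
    slices = np.moveaxis(volume, axis, 0)
    return inter_slice_attention(encode_slices(slices, weight))


rng = np.random.default_rng(0)
H = W = 8
dim = 16
weight = rng.normal(size=(H * W, dim)) / np.sqrt(H * W)  # assumed toy projection

# The same functions handle scans with different numbers of slices.
short_scan = rng.normal(size=(5, H, W))
long_scan = rng.normal(size=(12, H, W))
short_tokens = volume_tokens(short_scan, weight)
long_tokens = volume_tokens(long_scan, weight)

# A cubic volume can be sliced along each of its three axes (three views),
# and the per-view token sequences concatenated.
cube = rng.normal(size=(8, 8, 8))
multi_view = np.concatenate(
    [volume_tokens(cube, weight, axis=a) for a in range(3)], axis=0
)

print(short_tokens.shape, long_tokens.shape, multi_view.shape)
```

Each scan yields one token per slice, so the sequence length follows the scan depth rather than a fixed sub-volume grid. In a real system the token sequence would then be passed to a language model, a step this sketch leaves out.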
Findings
MS-VLM outperforms existing methods in radiology report generation.
MS-VLM produces more coherent and clinically relevant reports.
The model demonstrates robustness across different 3D imaging modalities.
Abstract
Recent medical vision-language models (VLMs) have shown promise in 2D medical image interpretation. However, extending them to 3D medical imaging has been challenging due to computational complexity and data scarcity. Although a few recent VLMs specialized for 3D medical imaging have emerged, all are limited to learning a volumetric representation of a 3D medical image as a set of sub-volumetric features. Such a process introduces overly correlated representations along the z-axis that neglect slice-specific clinical details, particularly for 3D medical images where adjacent slices have low redundancy. To address this limitation, we introduce MS-VLM, which mimics radiologists' workflow in 3D medical image interpretation. Specifically, radiologists analyze 3D medical images by examining individual slices sequentially and synthesizing information across slices and views. Likewise, MS-VLM…
Taxonomy
Topics: Medical Imaging and Analysis · Multimodal Machine Learning Applications · Radiomics and Machine Learning in Medical Imaging
Methods: Sparse Evolutionary Training
