Read Like a Radiologist: Efficient Vision-Language Model for 3D Medical Imaging Interpretation

Changsun Lee; Sangjoon Park; Cheong-Il Shin; Woo Hee Choi; Hyun Jeong Park; Jeong Eun Lee; Jong Chul Ye

arXiv:2412.13558·eess.IV·December 19, 2024·2 cites

Open Access

TL;DR

This paper introduces MS-VLM, a novel vision-language model that mimics radiologists' sequential slice analysis to improve 3D medical image interpretation, overcoming previous limitations of volumetric representation methods.

Contribution

MS-VLM leverages self-supervised 2D transformers to learn inter-slice dependencies from slice sequences, enabling flexible volumetric representations for volumes with any number of slices and for multiple imaging planes.
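
To make the slice-sequence idea concrete, here is a minimal sketch, not the authors' implementation. It assumes each slice is first turned into a feature vector by some per-slice encoder (a plain linear projection here), and that a single self-attention layer stands in for the transformer that mixes information across slices. Every name (`encode_slices`, `inter_slice_attention`, `volume_tokens`, `make_params`) and every dimension is hypothetical.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def encode_slices(volume, proj):
    """Hypothetical per-slice encoder: flatten each 2D slice, project to D dims."""
    flat = [[px for line in sl for px in line] for sl in volume]  # (S, H*W)
    return matmul(flat, proj)                                     # (S, D)


def inter_slice_attention(feats, wq, wk, wv):
    """Single-head self-attention over the slice axis, with a residual add."""
    q, k, v = matmul(feats, wq), matmul(feats, wk), matmul(feats, wv)
    scale = math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)                                       # (S, S)
    weights = [softmax([s / scale for s in row]) for row in scores]
    mixed = matmul(weights, v)                                    # (S, D)
    return [[f + m for f, m in zip(fr, mr)] for fr, mr in zip(feats, mixed)]


def make_params(h, w, d, rng):
    """Random weights for illustration only; a real model would learn these."""
    def rand(rows, cols):
        return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]
    return {"proj": rand(h * w, d), "wq": rand(d, d), "wk": rand(d, d), "wv": rand(d, d)}


def volume_tokens(volume, params):
    """One token per slice, each informed by every other slice in the volume."""
    feats = encode_slices(volume, params["proj"])
    return inter_slice_attention(feats, params["wq"], params["wk"], params["wv"])


if __name__ == "__main__":
    rng = random.Random(0)
    H, W, D = 2, 2, 4
    params = make_params(H, W, D, rng)
    # The same parameters handle volumes with different slice counts.
    for n_slices in (3, 7):
        vol = [[[rng.random() for _ in range(W)] for _ in range(H)] for _ in range(n_slices)]
        tokens = volume_tokens(vol, params)
        print(n_slices, "slices ->", len(tokens), "tokens of size", len(tokens[0]))
```

Because attention operates over however many slice features it receives, the number of output tokens follows the number of input slices without any change to the weights, which is the flexibility the contribution statement describes.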

Findings

01. MS-VLM outperforms existing methods in radiology report generation.

02. MS-VLM produces more coherent and clinically relevant reports.

03. The model demonstrates robustness across different 3D imaging modalities.

Abstract

Recent medical vision-language models (VLMs) have shown promise in 2D medical image interpretation. However, extending them to 3D medical imaging has been challenging due to computational complexity and data scarcity. Although a few recent VLMs designed for 3D medical imaging have emerged, all are limited to learning the volumetric representation of a 3D medical image as a set of sub-volumetric features. Such a process introduces overly correlated representations along the z-axis that neglect slice-specific clinical details, particularly for 3D medical images where adjacent slices have low redundancy. To address this limitation, we introduce MS-VLM, which mimics radiologists' workflow in 3D medical image interpretation. Specifically, radiologists analyze 3D medical images by examining individual slices sequentially and synthesizing information across slices and views. Likewise, MS-VLM…

Taxonomy

Topics: Medical Imaging and Analysis · Multimodal Machine Learning Applications · Radiomics and Machine Learning in Medical Imaging

Methods: Sparse Evolutionary Training