Med3DInsight: Enhancing 3D Medical Image Understanding with 2D Multi-Modal Large Language Models
Qiuhui Chen, Huping Ye, Yi Hong

TL;DR
Med3DInsight introduces a novel pre-training framework that bridges 3D medical image encoders with 2D multi-modal large language models (MLLMs) through a Plane-Slice-Aware Transformer, improving performance on 3D medical image segmentation and classification tasks.
Contribution
The paper presents Med3DInsight, a new pre-training framework that bridges 3D medical image encoders with 2D MLLMs via a Plane-Slice-Aware Transformer, enhancing 3D medical image understanding.
Findings
Achieved state-of-the-art results on segmentation and classification tasks.
Demonstrated significant performance improvements over baseline methods.
Validated effectiveness on three public datasets covering CT and MRI modalities.
Abstract
Understanding 3D medical image volumes is a critical task in the medical domain. However, existing 3D convolution and transformer-based methods have limited semantic understanding of an image volume and also need a large set of volumes for training. Recent advances in multi-modal large language models (MLLMs) provide a new and promising way to understand images with the help of text descriptions. However, most current MLLMs are designed for 2D natural images. To enhance 3D medical image understanding with 2D MLLMs, we propose a novel pre-training framework called Med3DInsight, which marries existing 3D image encoders with 2D MLLMs and bridges them via a designed Plane-Slice-Aware Transformer (PSAT) module. Extensive experiments demonstrate our SOTA performance on two downstream tasks, segmentation and classification, including three public datasets with CT and MRI modalities and…
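The abstract does not describe how PSAT consumes a volume. As a rough illustration only, assume a 3D volume is viewed through 2D slices taken along its three orthogonal planes (axial, coronal, sagittal), with each slice tagged by its plane and index so a downstream module can stay aware of where it came from. The NumPy sketch below shows only that slicing step. The function name `sample_plane_slices`, the axis-to-plane mapping, and the even-spacing scheme are hypothetical choices for illustration, not details taken from the paper.

```python
import numpy as np

# Hypothetical axis convention (not from the paper): the volume is indexed as
# (depth, height, width), with axis 0 = axial, 1 = coronal, 2 = sagittal.
# Real CT/MRI datasets may store axes in a different order.
PLANE_AXES = {"axial": 0, "coronal": 1, "sagittal": 2}


def sample_plane_slices(volume: np.ndarray, num_slices: int = 3):
    """Return (plane, slice_index, 2D slice) tuples, evenly spaced per plane."""
    if volume.ndim != 3:
        raise ValueError(f"expected a 3D volume, got shape {volume.shape}")
    samples = []
    for plane, axis in PLANE_AXES.items():
        depth = volume.shape[axis]
        # Evenly spaced indices from the first to the last slice along this axis.
        count = min(num_slices, depth)
        indices = np.linspace(0, depth - 1, num=count).round().astype(int)
        for idx in indices:
            # np.take with a scalar index drops that axis, leaving a 2D slice.
            slice_2d = np.take(volume, int(idx), axis=axis)
            samples.append((plane, int(idx), slice_2d))
    return samples


if __name__ == "__main__":
    vol = np.random.rand(64, 128, 128)  # toy volume, not real data
    for plane, idx, sl in sample_plane_slices(vol, num_slices=2):
        print(plane, idx, sl.shape)
```

Keeping the plane name and slice index next to each 2D slice is what would let a later module add plane- and position-specific information; how Med3DInsight actually does this is not stated in the abstract.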
Taxonomy
Topics: Radiomics and Machine Learning in Medical Imaging · Topic Modeling · AI in cancer detection
Methods: Attention Is All You Need · Sparse Evolutionary Training · Linear Layer · Dropout · Multi-Head Attention · Layer Normalization · Absolute Position Encodings · Softmax · Dense Connections · Label Smoothing
