VELVET-Med: Vision and Efficient Language Pre-training for Volumetric Imaging Tasks in Medicine
Ziyang Zhang, Yang Yu, Xulei Yang, Si Yong Yeo

TL;DR
VELVET-Med introduces a novel vision-language pre-training framework tailored for volumetric medical imaging, leveraging self-supervised learning and hierarchical contrastive objectives to improve performance on diverse downstream tasks with limited data.
Contribution
It presents a new VLP framework with a specialized language encoder and hierarchical contrastive learning, addressing data scarcity in volumetric medical imaging.
Findings
Achieves state-of-the-art results on multiple downstream tasks.
Effectively learns from only 38,875 scan-report pairs.
Enhances generalization and transferability of medical image encoders.
Abstract
Vision-and-language models (VLMs) have been increasingly explored in the medical domain, particularly following the success of CLIP in the general domain. However, unlike the relatively straightforward pairing of 2D images and text, curating large-scale paired data in the medical field for volumetric modalities such as CT scans remains a challenging and time-intensive process. This difficulty often limits performance on downstream tasks. To address these challenges, we propose a novel vision-language pre-training (VLP) framework, termed VELVET-Med, specifically designed for limited volumetric data such as 3D CT and associated radiology reports. Instead of relying on large-scale data collection, our method focuses on the development of effective pre-training objectives and model architectures. The key contributions are: 1) We incorporate uni-modal self-supervised learning…
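The abstract takes CLIP as its starting point, so a rough sketch of a CLIP-style symmetric contrastive loss may help readers unfamiliar with the idea. This is the generic objective only, not the hierarchical objective proposed in the paper, whose details are not given in this excerpt. The function names, the temperature value of 0.07, and the toy embeddings are all assumptions made for illustration.

```python
import math

def l2_normalize(v):
    # Scale a vector to unit length so dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def log_softmax(row):
    # Numerically stable log-softmax over one row of logits.
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]

def clip_style_loss(scan_embs, report_embs, temperature=0.07):
    # Hypothetical helper: generic CLIP-style symmetric InfoNCE loss.
    # scan_embs[i] and report_embs[i] are assumed to form a matched pair.
    scans = [l2_normalize(v) for v in scan_embs]
    reports = [l2_normalize(v) for v in report_embs]
    n = len(scans)
    # Pairwise cosine similarities, sharpened by the temperature.
    logits = [[sum(a * b for a, b in zip(s, r)) / temperature for r in reports]
              for s in scans]
    # Scan-to-report direction: each scan should pick out its own report.
    loss_s2r = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Report-to-scan direction: same computation on the transposed matrix.
    logits_t = [list(col) for col in zip(*logits)]
    loss_r2s = -sum(log_softmax(logits_t[i])[i] for i in range(n)) / n
    return 0.5 * (loss_s2r + loss_r2s)

# Toy embeddings (made up): matched pairs versus deliberately swapped pairs.
aligned = clip_style_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = clip_style_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

In this sketch the loss is small when each scan embedding is closest to its own report embedding and large when the pairs are swapped, which is the behavior a contrastive pre-training objective of this kind rewards.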
Taxonomy
Topics: Advances in Oncology and Radiotherapy · Radiomics and Machine Learning in Medical Imaging · Radiology practices and education
