VELVET-Med: Vision and Efficient Language Pre-training for Volumetric Imaging Tasks in Medicine

Ziyang Zhang; Yang Yu; Xulei Yang; Si Yong Yeo

arXiv:2508.12108·cs.CV·August 19, 2025

Open Access

TL;DR

VELVET-Med introduces a novel vision-language pre-training framework tailored for volumetric medical imaging, leveraging self-supervised learning and hierarchical contrastive objectives to improve performance on diverse downstream tasks with limited data.

Contribution

The paper presents a new vision-language pre-training (VLP) framework with a specialized language encoder and hierarchical contrastive learning, addressing data scarcity in volumetric medical imaging.

Findings

1. Achieves state-of-the-art results on multiple downstream tasks.
2. Effectively learns from only 38,875 scan-report pairs.
3. Enhances generalization and transferability of medical image encoders.

Abstract

Vision-and-language models (VLMs) have been increasingly explored in the medical domain, particularly following the success of CLIP in the general domain. However, unlike the relatively straightforward pairing of 2D images and text, curating large-scale paired data in the medical field for volumetric modalities such as CT scans remains a challenging and time-intensive process. This difficulty often limits performance on downstream tasks. To address these challenges, we propose a novel vision-language pre-training (VLP) framework, termed VELVET-Med, specifically designed for limited volumetric data such as 3D CT and associated radiology reports. Instead of relying on large-scale data collection, our method focuses on the development of effective pre-training objectives and model architectures. The key contributions are: 1) We incorporate uni-modal self-supervised learning…
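For readers unfamiliar with the CLIP-style objective the abstract refers to, the following is a minimal sketch of a generic symmetric contrastive loss over paired scan and report embeddings. It is not the VELVET-Med objective: the hierarchical variant is not specified in the text available here, and the function name `symmetric_contrastive_loss`, the argument names, and the temperature value 0.07 are all assumptions chosen for illustration.

```python
import numpy as np


def _log_softmax(logits, axis):
    # Numerically stable log-softmax along the given axis.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def symmetric_contrastive_loss(scan_emb, report_emb, temperature=0.07):
    """Generic CLIP-style loss (illustrative, not the paper's objective).

    scan_emb, report_emb: arrays of shape (batch, dim); row i of each
    array is assumed to come from the same scan-report pair.
    """
    # L2-normalize so that the dot product is cosine similarity.
    scan = scan_emb / np.linalg.norm(scan_emb, axis=1, keepdims=True)
    report = report_emb / np.linalg.norm(report_emb, axis=1, keepdims=True)

    # Pairwise similarity matrix, scaled by the temperature.
    logits = scan @ report.T / temperature

    # Matching pairs sit on the diagonal.
    idx = np.arange(logits.shape[0])
    loss_scan_to_report = -_log_softmax(logits, axis=1)[idx, idx].mean()
    loss_report_to_scan = -_log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_scan_to_report + loss_report_to_scan)
```

In this sketch the loss is small when each scan embedding is most similar to its own report embedding and grows when the pairing is disturbed, which is the behavior a contrastive pre-training objective relies on.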

Taxonomy

Topics: Advances in Oncology and Radiotherapy · Radiomics and Machine Learning in Medical Imaging · Radiology practices and education