SigVLP: Sigmoid Volume-Language Pre-Training for Self-Supervised CT-Volume Adaptive Representation Learning

Jiayi Wang; Hadrien Reynaud; Ibrahim Ethem Hamamci; Sezgin Er; Suprosanna Shit; Bjoern Menze; Bernhard Kainz

arXiv:2602.21735·cs.CV·February 26, 2026

TL;DR

SigVLP introduces a novel volume-language pre-training method for medical imaging that uses rotary position embeddings to handle variable-sized CT volumes, enhancing alignment and performance on downstream tasks.

Contribution

The paper proposes a new approach using Rotary Position Embeddings for flexible, size-agnostic volumetric representation learning in medical imaging, enabling better text-volume alignment.

Findings

01 Improved zero-shot classification accuracy

02 Enhanced segmentation and retrieval performance

03 Robustness to variable input sizes

Abstract

Large-scale volumetric medical imaging datasets typically aggregate scans from different vendors and devices, resulting in highly variable resolution, slice thickness, and number of slices per study. Consequently, training representation models usually requires cropping or interpolating along the z-axis to obtain fixed-size blocks, which inevitably causes information loss. We propose a new training approach to overcome this limitation. Instead of using absolute position embeddings, we interpret volumes as sequences of 3D chunks and adopt Rotary Position Embeddings, allowing us to treat the z-axis as an unconstrained temporal dimension. Building on this idea, we introduce a new vision-language model: SigVLP. In SigVLP, we implement Rotary Position Embedding as the positional encoding method, which is applied directly within the attention operation, generating input-conditioned sine and…
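To make the positional-encoding idea in the abstract concrete, here is a minimal NumPy sketch of standard one-dimensional Rotary Position Embeddings applied to a sequence of chunk features, where the position is assumed to be the chunk index along the z-axis. This is not the SigVLP implementation: the function names (`rope_angles`, `apply_rope`), the `base` frequency of 10000, the pairing of adjacent feature dimensions, and the feature sizes are all illustrative assumptions.

```python
import numpy as np


def rope_angles(num_positions, head_dim, base=10000.0):
    """Rotation angles of shape (num_positions, head_dim // 2).

    Assumption: positions are integer chunk indices 0, 1, 2, ... along z.
    """
    if head_dim % 2 != 0:
        raise ValueError("head_dim must be even")
    # One frequency per pair of feature dimensions.
    inv_freq = base ** (-np.arange(0, head_dim, 2) / head_dim)
    positions = np.arange(num_positions)
    return np.outer(positions, inv_freq)


def apply_rope(x, offset=0, base=10000.0):
    """Rotate adjacent feature pairs of x (seq_len, head_dim) by their position.

    `offset` shifts all positions; it is a hypothetical parameter used here
    only to illustrate that attention scores depend on relative position.
    """
    x = np.asarray(x, dtype=np.float64)
    seq_len, head_dim = x.shape
    angles = rope_angles(seq_len + offset, head_dim, base)[offset:]
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    # 2D rotation of each (even, odd) pair by the position-dependent angle.
    out[:, 0::2] = x_even * cos - x_odd * sin
    out[:, 1::2] = x_even * sin + x_odd * cos
    return out


def attention_scores(q, k):
    """Scaled dot-product scores with RoPE applied to queries and keys."""
    return apply_rope(q) @ apply_rope(k).T / np.sqrt(q.shape[-1])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two hypothetical scans with different numbers of z-chunks.
    for num_chunks in (7, 19):
        q = rng.standard_normal((num_chunks, 8))
        k = rng.standard_normal((num_chunks, 8))
        print(num_chunks, attention_scores(q, k).shape)
```

Because the rotation is computed from the position index rather than looked up in a fixed-size table, the same functions accept sequences of any length, and the query-key dot product depends only on the difference between two chunk indices. Under the assumptions above, this is the property that lets the number of chunks along z vary without cropping or interpolation.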

Taxonomy

Topics: COVID-19 diagnosis using AI · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning