Scaling Self-Supervised and Cross-Modal Pretraining for Volumetric CT Transformers

Cris Claessens; Christiaan Viviers; Giacomo D'Amicantonio; Egor Bondarev; Fons van der Sommen

arXiv:2511.17209·cs.CV·March 31, 2026

TL;DR

SPECTRE is a scalable, fully transformer-based foundation model for volumetric CT that leverages self-supervised and vision-language pretraining to learn general-purpose, clinically meaningful representations from openly available datasets.

Contribution

The paper introduces SPECTRE, a novel 3D transformer architecture with joint local and global modeling, trained exclusively on open data, achieving state-of-the-art results in CT representation learning.

Findings

01 SPECTRE outperforms previous models on multiple CT benchmarks.

02 Pretraining with self-distillation and vision-language alignment improves clinical relevance.

03 The model is effective in both zero-shot and fine-tuned scenarios.

Abstract

We introduce SPECTRE, a fully transformer-based foundation model for volumetric computed tomography (CT). Our Self-Supervised & Cross-Modal Pretraining for CT Representation Extraction (SPECTRE) approach utilizes scalable 3D Vision Transformer architectures and modern self-supervised and vision-language pretraining strategies to learn general-purpose CT representations. Volumetric CT poses unique challenges, such as extreme token scaling, geometric anisotropy, and weak or noisy clinical supervision, that make standard transformer and contrastive learning recipes ineffective out of the box. The framework jointly optimizes a local transformer for high-resolution volumetric feature extraction and a global transformer for whole-scan context modeling, making large-scale 3D attention computationally tractable. Notably, SPECTRE is trained exclusively on openly available CT datasets,…
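The abstract's point about token scaling and the local/global split can be made concrete with some rough arithmetic. The sketch below counts pairwise attention interactions for full attention over a whole scan and compares that with a two-stage scheme. The volume size, window size, and patch size are hypothetical values chosen for illustration, not figures from the paper. The rule that the global stage sees exactly one token per window is also a simplifying assumption, not a description of SPECTRE's actual design.

```python
from math import prod

def token_count(volume_shape, patch_shape):
    """Number of non-overlapping patches (tokens) that tile a volume."""
    if any(v % p for v, p in zip(volume_shape, patch_shape)):
        raise ValueError("volume must be divisible by the patch size")
    return prod(v // p for v, p in zip(volume_shape, patch_shape))

def attention_pairs(num_tokens):
    """Pairwise interactions in one dense self-attention layer."""
    return num_tokens ** 2

def local_global_pairs(volume_shape, window_shape, patch_shape):
    """Interactions when attention runs inside each window (local),
    then once across windows (global).

    Assumption for illustration: the global stage receives one
    summary token per window.
    """
    num_windows = token_count(volume_shape, window_shape)
    tokens_per_window = token_count(window_shape, patch_shape)
    local = num_windows * attention_pairs(tokens_per_window)
    global_ = attention_pairs(num_windows)
    return local + global_

# Hypothetical sizes in voxels (x, y, z); z is coarser to mimic anisotropy.
VOLUME = (512, 512, 256)
WINDOW = (128, 128, 64)
PATCH = (16, 16, 8)

full = attention_pairs(token_count(VOLUME, PATCH))
split = local_global_pairs(VOLUME, WINDOW, PATCH)

print(f"tokens in whole scan:   {token_count(VOLUME, PATCH):,}")
print(f"full attention pairs:   {full:,}")
print(f"local + global pairs:   {split:,}")
print(f"reduction factor:       {full / split:.1f}x")
```

Under these assumed sizes, dense attention over the whole scan needs roughly a billion pairwise interactions per layer, while the two-stage variant needs about 64 times fewer. The factor depends entirely on the chosen window and patch sizes.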

Code & Models

Models

cclaess/SPECTRE-Large (Hugging Face model, 94 downloads)
