TAP-CT: 3D Task-Agnostic Pretraining of Computed Tomography Foundation Models

Tim Veenboer; George Yiasemis; Eric Marcus; Vivien Van Veldhuizen; Cees G. M. Snoek; Jonas Teuwen; Kevin B. W. Groot Lipman

arXiv:2512.00872·cs.CV·December 2, 2025

Open Access · 6 Models

TL;DR

This paper introduces TAP-CT, a task-agnostic pretraining framework for 3D CT models using Vision Transformers and DINOv2, enabling scalable self-supervised learning on large volumetric datasets with minimal fine-tuning.

Contribution

It presents a novel adaptation of ViTs and DINOv2 for volumetric CT data, achieving robust, generalizable representations through large-scale self-supervised pretraining.

Findings

01 Pretrained models generalize well across multiple downstream tasks.

02 Large-scale pretraining on 105K CT volumes improves robustness.

03 The approach requires minimal fine-tuning for effective feature extraction.

Abstract

Existing foundation models (FMs) in the medical domain often require extensive fine-tuning or rely on training resource-intensive decoders, while many existing encoders are pretrained with objectives biased toward specific tasks. This illustrates a need for a strong, task-agnostic foundation model that requires minimal fine-tuning beyond feature extraction. In this work, we introduce a suite of task-agnostic pretraining of CT foundation models (TAP-CT): a simple yet effective adaptation of Vision Transformers (ViTs) and DINOv2 for volumetric data, enabling scalable self-supervised pretraining directly on 3D CT volumes. Our approach incorporates targeted modifications to patch embeddings, positional encodings, and volumetric augmentations, making the architecture depth-aware while preserving the simplicity of the underlying architectures. We show that large-scale 3D pretraining on an…
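The abstract mentions adapting the ViT patch embedding to volumetric input. As a rough illustration of that general idea, the sketch below splits a 3D volume into non-overlapping 3D patches and projects each one linearly to a token. It is not the authors' implementation: the function names `patchify_3d` and `embed_patches`, the patch size, the embedding dimension, and the random weights are all assumptions chosen for the example.

```python
import numpy as np


def patchify_3d(volume, patch_size):
    """Split a (D, H, W) volume into flattened, non-overlapping 3D patches.

    Hypothetical helper for illustration only.
    Returns (patches, grid): patches has shape (num_patches, pd * ph * pw),
    grid is the number of patches along each axis.
    """
    d, h, w = volume.shape
    pd, ph, pw = patch_size
    if d % pd or h % ph or w % pw:
        raise ValueError("volume dimensions must be divisible by patch size")
    gd, gh, gw = d // pd, h // ph, w // pw
    # (gd, pd, gh, ph, gw, pw) -> (gd, gh, gw, pd, ph, pw)
    blocks = volume.reshape(gd, pd, gh, ph, gw, pw).transpose(0, 2, 4, 1, 3, 5)
    return blocks.reshape(gd * gh * gw, pd * ph * pw), (gd, gh, gw)


def embed_patches(patches, weight, bias):
    """Linear projection of flattened patches to token embeddings."""
    return patches @ weight + bias


rng = np.random.default_rng(0)
volume = rng.standard_normal((8, 32, 32)).astype(np.float32)  # toy (D, H, W) volume
patch_size = (4, 8, 8)  # assumed patch size (depth, height, width)
embed_dim = 16          # assumed embedding width

patches, grid = patchify_3d(volume, patch_size)
weight = rng.standard_normal((patches.shape[1], embed_dim)).astype(np.float32) * 0.02
bias = np.zeros(embed_dim, dtype=np.float32)
tokens = embed_patches(patches, weight, bias)

print(grid, patches.shape, tokens.shape)
```

Because each patch spans several slices along the depth axis, every token carries through-plane context, which is the sense in which a patch embedding of this kind is depth-aware. In a deep learning framework the same operation is commonly written as a strided 3D convolution.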

Taxonomy

Topics: Advanced Neural Network Applications · Generative Adversarial Networks and Image Synthesis · Medical Imaging and Analysis