Benchmarking transferability of SSL pretraining to same and different modality segmentation tasks

Jue Jiang; Harini Veeraraghavan

arXiv:2605.18491·cs.CV·May 19, 2026

TL;DR

This study benchmarks nine self-supervised learning (SSL) methods for 3D medical image segmentation and reports that the self-distilled masked image transformer (SMIT) leads in accuracy, fine-tuning efficiency, and transferability across modalities and training-set sizes.

Contribution

The paper compares nine SSL methods, all pretrained on the same CT dataset, across nine public medical segmentation tasks, and emphasizes the effectiveness of masked image modeling (MIM) approaches such as SMIT.

Findings

01

SMIT achieved the highest accuracy and the fastest convergence.

02

MIM-based methods outperformed contrastive and rotation-prediction methods.

03

The choice of SSL method affects performance most in few-shot scenarios.

Abstract

Methods: Nine SSL methods spanning four pretext-task families were pretrained from scratch using the same 10,412 3D CT scans (1.89 M 2D axial slices) covering varied disease sites. The pretrained Swin Transformer encoder from each method was integrated into a SwinUNETR-style segmentation network (Swin encoder with a 3D CNN decoder and skip connections) and fine-tuned on nine public segmentation tasks of varying complexity, including large abdominal organs, head-and-neck structures, and tumors from CT and MRI. Performance was assessed using the Dice similarity coefficient (DSC). Fine-tuning convergence speed, transferability across modalities (CT-to-MRI), and feature-reuse patterns between few- and many-shot fine-tuning were further analyzed using centered kernel alignment. Results: Self-distilled masked image transformer (SMIT), which combines masked image modeling (MIM) with local and…
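
The two measures named in the Methods, the Dice similarity coefficient and centered kernel alignment (CKA), can be illustrated with a minimal sketch. This is not the authors' code. It assumes binary masks flattened to 0/1 sequences, treats two empty masks as a perfect match, and uses the linear variant of CKA, since the abstract does not say which variant was used. The function names `dice` and `linear_cka` are hypothetical, and the pure-Python loops are for readability rather than speed.

```python
from math import sqrt


def dice(pred, target):
    """Dice similarity coefficient for two binary masks (flat 0/1 sequences)."""
    if len(pred) != len(target):
        raise ValueError("masks must contain the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


def _center_columns(rows):
    """Subtract each feature's mean over samples (rows are samples)."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]


def _cross_frobenius_sq(a, b):
    """Squared Frobenius norm of a^T b for row-major a (n x p) and b (n x q)."""
    cols_a = list(zip(*a))
    cols_b = list(zip(*b))
    return sum(
        sum(x * y for x, y in zip(ca, cb)) ** 2
        for ca in cols_a
        for cb in cols_b
    )


def linear_cka(x, y):
    """Linear CKA between feature matrices x (n x p) and y (n x q).

    Computes ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F) on column-centered
    features; a value of 1 means the representations match up to rotation
    and isotropic scaling.
    """
    if len(x) != len(y):
        raise ValueError("x and y must describe the same samples")
    xc, yc = _center_columns(x), _center_columns(y)
    norm_x = sqrt(_cross_frobenius_sq(xc, xc))
    norm_y = sqrt(_cross_frobenius_sq(yc, yc))
    if norm_x == 0.0 or norm_y == 0.0:
        raise ValueError("features are constant across samples")
    return _cross_frobenius_sq(xc, yc) / (norm_x * norm_y)


if __name__ == "__main__":
    # Toy masks: one overlapping voxel, 2 predicted and 1 reference voxel.
    print(dice([1, 1, 0, 0], [1, 0, 0, 0]))
    # Toy features: three samples, two features each, compared with themselves.
    feats = [[1.0, 2.0], [3.0, 1.0], [0.0, 5.0]]
    print(linear_cka(feats, feats))
```

In a feature-reuse analysis of the kind described, `x` and `y` would hold activations of the same layer for the same inputs, for example before and after fine-tuning, with one row per sample. How the authors extracted and paired activations is not stated in the text shown here.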
