UNetFormer: A Unified Vision Transformer Model and Pre-Training Framework for 3D Medical Image Segmentation

Ali Hatamizadeh; Ziyue Xu; Dong Yang; Wenqi Li; Holger Roth; Daguang Xu

arXiv:2204.00631·eess.IV·April 6, 2022·29 cites

TL;DR

This paper introduces UNetFormer, a unified 3D vision transformer framework with a novel pre-training method, achieving state-of-the-art results in medical image segmentation tasks across CT and MRI datasets.

Contribution

The work presents a new unified architecture combining Swin Transformer and CNN for 3D medical segmentation, along with a self-supervised pre-training strategy for improved performance.

Findings

01 Achieved state-of-the-art liver and liver tumor segmentation results.

02 Outperformed existing methods on the BraTS 21 brain tumor dataset.

03 Demonstrated effective self-supervised pre-training for 3D medical images.

Abstract

Vision Transformers (ViTs) have recently become popular due to their outstanding modeling capabilities, in particular for capturing long-range information, and their scalability to dataset and model sizes, which has led to state-of-the-art performance in various computer vision and medical image analysis tasks. In this work, we introduce a unified framework consisting of two architectures, dubbed UNetFormer, with a 3D Swin Transformer-based encoder and Convolutional Neural Network (CNN) and transformer-based decoders. In the proposed model, the encoder is linked to the decoder via skip connections at five different resolutions with deep supervision. The design of the proposed architecture allows for meeting a wide range of trade-off requirements between accuracy and computational cost. In addition, we present a methodology for self-supervised pre-training of the encoder backbone via learning to…
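The skip connections at five resolutions and the deep supervision mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the per-level downsampling factor of 2, the 96×96×96 input size, the halving loss weights, and the function names `skip_shapes` and `deep_supervision_loss` are all assumptions chosen for illustration.

```python
# Illustrative sketch only. The downsampling factor, input size, and loss
# weights are assumptions, not values taken from the UNetFormer paper.

def skip_shapes(input_shape, num_levels=5, factor=2):
    """Return the spatial shape of the feature map at each skip level,
    assuming every level downsamples each axis by `factor`."""
    d, h, w = input_shape
    shapes = []
    for level in range(num_levels):
        scale = factor ** level
        shapes.append((d // scale, h // scale, w // scale))
    return shapes


def deep_supervision_loss(per_level_losses, decay=0.5):
    """Combine losses computed on decoder outputs at several resolutions.

    Level 0 is taken to be the full-resolution output. Each coarser level
    is weighted by `decay` relative to the previous one, and the result is
    normalized by the sum of the weights.
    """
    weights = [decay ** i for i in range(len(per_level_losses))]
    total = sum(w * l for w, l in zip(weights, per_level_losses))
    return total / sum(weights)


# Hypothetical input volume and hypothetical per-level loss values.
shapes = skip_shapes((96, 96, 96))
loss = deep_supervision_loss([0.40, 0.50, 0.60, 0.70, 0.80])

for level, shape in enumerate(shapes):
    print(f"skip level {level}: {shape}")
print(f"combined loss: {loss:.4f}")
```

With geometrically decaying weights, the full-resolution prediction dominates the objective while the coarser outputs still contribute a training signal to the deeper decoder stages. Other weighting schemes are equally plausible.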

Code & Models

Repositories

project-monai/research-contributions (PyTorch, official)

Taxonomy

Topics: Radiomics and Machine Learning in Medical Imaging · AI in cancer detection · Advanced Neural Network Applications