UNETR: Transformers for 3D Medical Image Segmentation
Ali Hatamizadeh, Yucheng Tang, Vishwesh Nath, Dong Yang, Andriy Myronenko, Bennett Landman, Holger Roth, Daguang Xu

TL;DR
UNETR uses a transformer-based encoder for 3D medical image segmentation to capture global context and long-range dependencies, and reports state-of-the-art results on the BTCV and MSD benchmarks.
Contribution
The paper presents UNETR, an architecture that uses a transformer as the encoder of a 3D segmentation network, so that global features are learned more effectively than with the purely local convolutions of traditional CNNs.
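To illustrate why a transformer encoder sees global context, the following is a minimal sketch of single-head scaled dot-product self-attention in plain Python. It is not the paper's implementation: the identity query/key/value projections, the function names, and the toy token values are all assumptions made for brevity, whereas a real transformer uses learned projections and multiple heads. The point it shows is that every output token is a weighted mix of all input tokens, so no token is outside another token's receptive field.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head self-attention with identity Q/K/V projections (an assumption).

    tokens: list of equal-length float vectors.
    Returns (outputs, weights), where weights[i][j] is how much token i
    attends to token j.
    """
    dim = len(tokens[0])
    scale = 1.0 / math.sqrt(dim)
    outputs, weights = [], []
    for query in tokens:
        # Score the query against every token in the sequence, near or far.
        scores = [scale * sum(q * k for q, k in zip(query, key)) for key in tokens]
        row = softmax(scores)
        weights.append(row)
        # The output is a weighted average over all tokens.
        outputs.append(
            [sum(w * value[d] for w, value in zip(row, tokens)) for d in range(dim)]
        )
    return outputs, weights


# Toy sequence of four 2-dimensional tokens (illustrative values only).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
outputs, weights = self_attention(tokens)
```

Each row of `weights` sums to 1 and has a strictly positive entry for every token, which is the sense in which attention is global. A convolution, by contrast, mixes only the inputs inside its kernel window.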
Findings
Achieved state-of-the-art performance on the BTCV dataset.
Outperformed existing methods on MSD brain tumor segmentation.
Demonstrated effective global context capture in 3D medical images.
Abstract
Fully Convolutional Neural Networks (FCNNs) with contracting and expanding paths have shown prominence for the majority of medical image segmentation applications over the past decade. In FCNNs, the encoder plays an integral role by learning both global and local features and contextual representations which can be utilized for semantic output prediction by the decoder. Despite their success, the locality of convolutional layers in FCNNs limits the capability of learning long-range spatial dependencies. Inspired by the recent success of transformers for Natural Language Processing (NLP) in long-range sequence learning, we reformulate the task of volumetric (3D) medical image segmentation as a sequence-to-sequence prediction problem. We introduce a novel architecture, dubbed as UNEt TRansformers (UNETR), that utilizes a transformer as the encoder to learn sequence representations of…
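The sequence-to-sequence reformulation can be made concrete with a little arithmetic. The sketch below assumes ViT-style non-overlapping cubic patches, which the truncated abstract above does not spell out. The 96×96×96 volume, the patch size of 16, and the helper names `sequence_shape` and `token_index` are illustrative assumptions, not values or APIs taken from this page.

```python
import math


def sequence_shape(volume_shape, patch_size, channels=1):
    """Return (num_tokens, token_dim) for non-overlapping cubic patches.

    Assumes each spatial dimension is divisible by patch_size.
    """
    if any(size % patch_size != 0 for size in volume_shape):
        raise ValueError("each volume dimension must be divisible by patch_size")
    grid = tuple(size // patch_size for size in volume_shape)
    num_tokens = math.prod(grid)              # one token per patch
    token_dim = patch_size ** 3 * channels    # flattened voxels in one patch
    return num_tokens, token_dim


def token_index(voxel, volume_shape, patch_size):
    """Map a voxel (x, y, z) to its patch's position in a row-major sequence."""
    grid = [size // patch_size for size in volume_shape]
    px, py, pz = (coord // patch_size for coord in voxel)
    return (px * grid[1] + py) * grid[2] + pz


# Hypothetical single-channel 96^3 volume split into 16^3 patches.
num_tokens, token_dim = sequence_shape((96, 96, 96), 16)
first = token_index((0, 0, 0), (96, 96, 96), 16)
last = token_index((95, 95, 95), (96, 96, 96), 16)
```

Under these assumptions the volume becomes a sequence of 216 tokens, each a flattened vector of 4096 voxel values, and the opposite corners of the volume land in tokens 0 and 215. A transformer encoder would then operate on this 1D sequence rather than on the 3D grid.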
Videos
UNETR: Transformers for 3D Medical Image Segmentation (YouTube)
Taxonomy
Topics: Radiomics and Machine Learning in Medical Imaging · COVID-19 diagnosis using AI · Advanced Neural Network Applications
Methods: Attention Is All You Need · Linear Layer · Dense Connections · Max Pooling · 1x1 Convolution · Concatenated Skip Connection · U-Net · Residual Connection · Batch Normalization
