nnFormer: Interleaved Transformer for Volumetric Segmentation
Hong-Yu Zhou, Jiansen Guo, Yinghao Zhang, Lequan Yu, Liansheng Wang, Yizhou Yu

TL;DR
nnFormer introduces a 3D transformer architecture for volumetric medical image segmentation that interleaves convolution and self-attention, outperforming previous transformer-based methods on three datasets.
Contribution
The paper presents nnFormer, a new 3D transformer model that integrates local and global volume-based self-attention with skip attention in a U-Net architecture for improved segmentation.
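To illustrate what "local volume-based" self-attention operates on, here is a minimal sketch of splitting a 3D feature grid into non-overlapping local volumes, so that attention could then be computed among the voxels of each group. The function name, the grid shape, and the volume size are hypothetical choices for illustration only; they are not taken from the paper or its code.

```python
from itertools import product


def partition_into_local_volumes(shape, volume_size):
    """Group voxel coordinates of a (depth, height, width) grid into
    non-overlapping local volumes of size volume_size.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    depth, height, width = shape
    sd, sh, sw = volume_size
    if depth % sd or height % sh or width % sw:
        raise ValueError("each grid dimension must be divisible by the volume size")

    groups = {}
    for d, h, w in product(range(depth), range(height), range(width)):
        # Index of the local volume that contains this voxel.
        key = (d // sd, h // sh, w // sw)
        groups.setdefault(key, []).append((d, h, w))
    return groups


# Assumed example sizes: a 4x4x4 grid split into 2x2x2 local volumes.
groups = partition_into_local_volumes((4, 4, 4), (2, 2, 2))
print(len(groups), "local volumes,", len(groups[(0, 0, 0)]), "voxels each")
```

In this sketch, local attention would be restricted to the voxels within one group, whereas a global variant would let every voxel attend across the whole grid.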
Findings
Outperforms previous transformer-based methods on three datasets
Achieves lower HD95 and comparable DSC to nnUNet
Highly complementary to nnUNet in model ensembling
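For readers unfamiliar with the DSC metric mentioned above, the following is a minimal sketch of the Dice similarity coefficient for two binary masks, assuming the masks are flattened into equal-length sequences of 0/1 values. The function name and the toy masks are hypothetical, and treating two empty masks as a perfect match is an assumed convention; this is not the paper's evaluation code.

```python
def dice_score(pred, target):
    """Dice similarity coefficient, 2|A ∩ B| / (|A| + |B|), for two
    equal-length binary masks given as flat sequences of 0/1 values."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")

    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)

    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Toy masks (made up for illustration): 2 overlapping voxels, 3 foreground voxels each.
pred = [1, 1, 0, 0, 1, 0]
target = [1, 0, 0, 0, 1, 1]
print(dice_score(pred, target))
```

DSC measures volumetric overlap (higher is better), while HD95 is a boundary-distance metric (lower is better), which is why the two can rank methods differently.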
Abstract
The transformer, the model of choice for natural language processing, has drawn scant attention from the medical imaging community. Given their ability to exploit long-term dependencies, transformers promise to help atypical convolutional neural networks overcome their inherent shortcomings of spatial inductive bias. However, most recently proposed transformer-based segmentation approaches simply treat transformers as auxiliary modules that help encode global context into convolutional representations. To address this issue, we introduce nnFormer, a 3D transformer for volumetric medical image segmentation. nnFormer not only exploits the combination of interleaved convolution and self-attention operations, but also introduces local and global volume-based self-attention mechanisms to learn volume representations. Moreover, nnFormer proposes to use skip attention to replace the…
Taxonomy
Topics: Radiomics and Machine Learning in Medical Imaging · Advanced Neural Network Applications · Medical Imaging and Analysis
Methods: Linear Layer · Max Pooling · Concatenated Skip Connection · U-Net · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Label Smoothing · Dropout · Byte Pair Encoding
