CoTr: Efficiently Bridging CNN and Transformer for 3D Medical Image Segmentation
Yutong Xie, Jianpeng Zhang, Chunhua Shen, Yong Xia

TL;DR
This paper introduces CoTr, a hybrid CNN-Transformer framework that uses deformable self-attention for efficient and accurate 3D medical image segmentation. It targets the difficulty that convolution-only models have in capturing long-range dependencies.
Contribution
The paper proposes a deformable Transformer (DeTrans) integrated with a CNN. The design reduces the complexity of attention, which makes high-resolution 3D segmentation feasible, and the resulting model is reported to outperform existing methods.
Findings
Significant performance improvement over CNN, Transformer, and hybrid methods.
Efficient deformable self-attention reduces computational complexity.
Effective multi-scale high-resolution processing for 3D segmentation.
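The complexity claim in the findings above can be illustrated with a rough count. This is a minimal sketch, assuming attention cost is dominated by the number of query-key pairs that get scored: full self-attention scores every pair among the N = D x H x W positions of a 3D feature map, while a deformable variant scores only a small, fixed number of sampled keys per query. The function names, the feature-map size, and the value of `num_keys` are hypothetical and chosen only for illustration; they are not figures from the paper.

```python
def num_positions(depth, height, width):
    """Number of positions (tokens) in a flattened 3D feature map."""
    return depth * height * width


def full_attention_pairs(depth, height, width):
    """Query-key pairs scored by full self-attention: every position attends to every position."""
    n = num_positions(depth, height, width)
    return n * n


def deformable_attention_pairs(depth, height, width, num_keys):
    """Query-key pairs scored when each query attends to only `num_keys` sampled positions."""
    return num_positions(depth, height, width) * num_keys


if __name__ == "__main__":
    # Hypothetical feature-map size and key count, for illustration only.
    d, h, w = 12, 24, 24
    k = 4
    full = full_attention_pairs(d, h, w)
    deform = deformable_attention_pairs(d, h, w, k)
    print(f"positions:            {num_positions(d, h, w)}")
    print(f"full attention pairs: {full}")
    print(f"deformable pairs:     {deform}")
    print(f"reduction factor:     {full / deform:.0f}x")
```

Under this simplified model the cost grows quadratically with the number of positions for full attention and linearly for the deformable variant, which is why the gap widens as the feature-map resolution increases.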
Abstract
Convolutional neural networks (CNNs) have been the de facto standard for nowadays 3D medical image segmentation. The convolutional operations used in these networks, however, inevitably have limitations in modeling the long-range dependency due to their inductive bias of locality and weight sharing. Although Transformer was born to address this issue, it suffers from extreme computational and spatial complexities in processing high-resolution 3D feature maps. In this paper, we propose a novel framework that efficiently bridges a Convolutional neural network and a Transformer (CoTr) for accurate 3D medical image segmentation. Under this framework, the CNN is constructed to extract feature representations and an efficient deformable Transformer (DeTrans) is built to model the long-range dependency on the extracted feature maps. Different from the vanilla Transformer…
Taxonomy
Topics: Advanced Neural Network Applications · Medical Image Segmentation Techniques · Medical Imaging and Analysis
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Layer Normalization · Residual Connection · Dropout · Adam · Label Smoothing · Multi-Head Attention
