TL;DR
This paper introduces a convolution-free transformer-based model for medical image segmentation that relies solely on self-attention mechanisms, achieving competitive or superior results compared to CNNs, especially with limited labeled data.
Contribution
The work demonstrates that self-attention alone can replace convolutions in medical image segmentation, offering an approach that does not depend on the inductive biases built into CNNs.
Findings
Achieves better segmentation accuracy than CNNs on three datasets.
Pre-training significantly improves performance when labeled training data are scarce.
Eliminates the need for convolution operations in segmentation models.
Abstract
Like other applications in computer vision, medical image segmentation has been most successfully addressed using deep learning models that rely on the convolution operation as their main building block. Convolutions enjoy important properties such as sparse interactions, weight sharing, and translation equivariance. These properties give convolutional neural networks (CNNs) a strong and useful inductive bias for vision tasks. In this work we show that a different method, based entirely on self-attention between neighboring image patches and without any convolution operations, can achieve competitive or better results. Given a 3D image block, our network divides it into 3D patches and computes a 1D embedding for each patch. The network predicts the segmentation map for the center patch of the block based on the self-attention between these patch…
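The pipeline described in the abstract (split a 3D block into patches, embed each patch as a 1D vector, apply self-attention across patches, and predict the center patch) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The grid size `n = 3`, patch side `p = 4`, embedding width `d = 16`, the single attention head, the random weights, and the helper names `block_to_patches` and `self_attention` are all assumptions chosen for illustration. Positional encodings, multiple heads, stacked layers, and training are omitted.

```python
import numpy as np


def block_to_patches(block, n):
    """Split a cubic block of side n*p into n**3 flattened patches of side p."""
    side = block.shape[0]
    assert block.shape == (side, side, side) and side % n == 0
    p = side // n
    # (n, p, n, p, n, p) -> (n, n, n, p, p, p) -> (n**3, p**3)
    x = block.reshape(n, p, n, p, n, p).transpose(0, 2, 4, 1, 3, 5)
    return x.reshape(n ** 3, p ** 3)


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over patch embeddings."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ v, weights


# Illustrative sizes (assumptions, not values taken from the text above).
n, p, d = 3, 4, 16
rng = np.random.default_rng(0)

block = rng.standard_normal((n * p, n * p, n * p))  # stand-in for an image block
patches = block_to_patches(block, n)                # shape (27, 64)

# Linear projection of each flattened patch to a 1D embedding (no convolution).
w_embed = rng.standard_normal((p ** 3, d)) * 0.1
tokens = patches @ w_embed                          # shape (27, 16)

w_q, w_k, w_v = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))
attended, weights = self_attention(tokens, w_q, w_k, w_v)

# Only the center patch is predicted; for odd n its row-major index is n**3 // 2.
center = (n ** 3) // 2
w_out = rng.standard_normal((d, p ** 3)) * 0.1
center_logits = (attended[center] @ w_out).reshape(p, p, p)
```

In this sketch every patch attends to every other patch in the block, so the prediction for the center patch can draw on its whole neighborhood without any weight-shared sliding kernel.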
Taxonomy
Methods: Convolution
