UTNet: A Hybrid Transformer Architecture for Medical Image Segmentation
Yunhe Gao, Mu Zhou, Dimitris Metaxas

TL;DR
UTNet introduces a hybrid Transformer architecture that combines self-attention with convolutional neural networks to improve medical image segmentation, achieving superior performance on a cardiac MRI dataset without pre-training.
Contribution
The paper presents a novel hybrid Transformer architecture with an efficient self-attention mechanism and a new decoder, enabling effective medical image segmentation without pre-training.
Findings
UTNet outperforms state-of-the-art methods in cardiac MRI segmentation.
The model demonstrates robustness across multi-vendor datasets.
Self-attention modules improve long-range dependency capture at multiple scales.
Abstract
Transformer architecture has emerged to be successful in a number of natural language processing tasks. However, its applications to medical vision remain largely unexplored. In this study, we present UTNet, a simple yet powerful hybrid Transformer architecture that integrates self-attention into a convolutional neural network for enhancing medical image segmentation. UTNet applies self-attention modules in both encoder and decoder for capturing long-range dependency at different scales with minimal overhead. To this end, we propose an efficient self-attention mechanism along with relative position encoding that reduces the complexity of self-attention operation significantly from $O(n^2)$ to approximate $O(n)$. A new self-attention decoder is also proposed to recover fine-grained details from the skipped connections in the encoder. Our approach addresses the dilemma that Transformer…
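To make the complexity claim concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes the saving comes from shrinking the keys and values to a fixed number of tokens before computing attention, so the score matrix is n × k instead of n × n. The average pooling, the 1D token layout, and every name here (`pool_tokens`, `reduced_attention`, `num_reduced`) are illustrative assumptions. The relative position encoding mentioned in the abstract is omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def pool_tokens(tokens, num_reduced):
    # Assumption: average-pool n tokens of shape (n, d) down to (num_reduced, d).
    # Requires n to be divisible by num_reduced.
    n, d = tokens.shape
    return tokens.reshape(num_reduced, n // num_reduced, d).mean(axis=1)

def full_attention(queries, keys, values):
    # Standard scaled dot-product attention: the score matrix is (n, n).
    scores = queries @ keys.T / np.sqrt(queries.shape[-1])
    return softmax(scores) @ values, scores.shape

def reduced_attention(queries, keys, values, num_reduced):
    # Keys and values are shrunk to num_reduced tokens first,
    # so the score matrix is (n, num_reduced) and cost grows linearly in n.
    keys_small = pool_tokens(keys, num_reduced)
    values_small = pool_tokens(values, num_reduced)
    scores = queries @ keys_small.T / np.sqrt(queries.shape[-1])
    return softmax(scores) @ values_small, scores.shape

rng = np.random.default_rng(0)
n, d, num_reduced = 1024, 32, 64          # e.g. a 32x32 feature map, flattened
x = rng.standard_normal((n, d))

out_full, shape_full = full_attention(x, x, x)
out_red, shape_red = reduced_attention(x, x, x, num_reduced)

print("full score matrix:   ", shape_full)
print("reduced score matrix:", shape_red)
```

With `num_reduced` held fixed, doubling `n` doubles the size of the reduced score matrix but quadruples the full one, which is the sense in which the cost becomes approximately linear in the number of tokens.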
Taxonomy
Topics: Radiomics and Machine Learning in Medical Imaging · Advanced Neural Network Applications · COVID-19 diagnosis using AI
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Adam · Layer Normalization · Byte Pair Encoding · Dropout · Label Smoothing
