UTNet: A Hybrid Transformer Architecture for Medical Image Segmentation

Yunhe Gao; Mu Zhou; Dimitris Metaxas

arXiv:2107.00781 · cs.CV · September 29, 2021 · 49 citations

TL;DR

UTNet is a hybrid Transformer architecture that integrates self-attention into a convolutional neural network for medical image segmentation. It outperforms state-of-the-art methods on a cardiac MRI dataset without requiring pre-training.

Contribution

The paper presents a novel hybrid Transformer architecture with an efficient self-attention mechanism and a new decoder, enabling effective medical image segmentation without pre-training.

Findings

1. UTNet outperforms state-of-the-art methods in cardiac MRI segmentation.
2. The model demonstrates robustness across multi-vendor datasets.
3. Self-attention modules improve long-range dependency capture at multiple scales.

Abstract

Transformer architectures have proven successful in a number of natural language processing tasks. However, their applications to medical vision remain largely unexplored. In this study, we present UTNet, a simple yet powerful hybrid Transformer architecture that integrates self-attention into a convolutional neural network to enhance medical image segmentation. UTNet applies self-attention modules in both the encoder and the decoder to capture long-range dependencies at different scales with minimal overhead. To this end, we propose an efficient self-attention mechanism, along with relative position encoding, that reduces the complexity of the self-attention operation significantly, from $O(n^{2})$ to approximately $O(n)$. A new self-attention decoder is also proposed to recover fine-grained details from the skip connections in the encoder. Our approach addresses the dilemma that Transformer…
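To make the $O(n^{2})$ versus approximately $O(n)$ claim concrete, the following is a rough cost-counting sketch rather than the authors' implementation. The abstract does not say how the reduction is achieved; the sketch assumes, purely for illustration, that keys and values are shrunk to a fixed number of tokens, so each query compares against a constant-size set. The function name `attention_score_costs` and the parameter `reduced_tokens` are hypothetical.

```python
def attention_score_costs(height, width, channels, reduced_tokens):
    """Count multiply-adds needed to form the attention score matrix.

    Assumption (not stated in the abstract): the efficient variant
    shrinks keys and values to a fixed number of tokens, `reduced_tokens`.
    """
    n = height * width  # one query token per spatial position
    full = n * n * channels  # every query scores every key: O(n^2)
    reduced = n * reduced_tokens * channels  # fixed-size key set: O(n)
    return full, reduced


if __name__ == "__main__":
    for side in (16, 32, 64):
        full, reduced = attention_score_costs(
            side, side, channels=64, reduced_tokens=64
        )
        print(f"{side}x{side}: full={full:,} reduced={reduced:,}")
```

Under this assumption the ratio of the two costs is `n / reduced_tokens`, so the saving grows linearly with the number of spatial positions. That is why applying self-attention at high-resolution feature maps becomes affordable only with a reduction of this kind.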

Code & Models

Repositories

yhygao/UTNet (PyTorch, official)

Taxonomy

Topics: Radiomics and Machine Learning in Medical Imaging · Advanced Neural Network Applications · COVID-19 diagnosis using AI

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Adam · Layer Normalization · Byte Pair Encoding · Dropout · Label Smoothing