UCTransNet: Rethinking the Skip Connections in U-Net from a Channel-wise Perspective with Transformer
Haonan Wang, Peng Cao, Jiaqi Wang, Osmar R. Zaiane

TL;DR
UCTransNet introduces a channel-wise transformer-based skip connection mechanism in U-Net, improving global context modeling and segmentation accuracy in medical images by addressing the limitations of traditional skip connections.
Contribution
The paper proposes the CTrans module, which replaces U-Net's plain skip connections with transformer-based multi-scale channel fusion and cross-attention in order to narrow the semantic gap between encoder and decoder features.
Findings
Achieves more precise segmentation across multiple datasets.
Consistently outperforms state-of-the-art methods.
Improves global multi-scale context modeling in U-Net.
Abstract
Most recent semantic segmentation methods adopt a U-Net framework with an encoder-decoder architecture. It is still challenging for U-Net with a simple skip connection scheme to model the global multi-scale context: 1) not every skip connection setting is effective, owing to incompatible feature sets in the encoder and decoder stages, and some skip connections even harm segmentation performance; 2) on some datasets, the original U-Net performs worse than a variant without any skip connections. Based on our findings, we propose a new segmentation framework, named UCTransNet (with a proposed CTrans module in U-Net), from the channel perspective with attention mechanism. Specifically, the CTrans module is an alternative to the U-Net skip connections, which consists of a sub-module to conduct the multi-scale Channel Cross fusion with Transformer (named CCT) and a sub-module…
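To make the idea of multi-scale channel cross fusion more concrete, here is a minimal NumPy sketch of one plausible reading of it: tokens from each encoder scale act as queries, while keys and values come from the channel-wise concatenation of all scales, and attention is computed over the channel axis rather than over spatial positions. This is an illustrative simplification, not the authors' implementation. The function name `channel_cross_attention`, the channel counts, the token count, the random projection weights, and the residual connection are all assumptions made for the example.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def channel_cross_attention(tokens_i, tokens_all, w_q, w_k, w_v):
    """Hypothetical channel-wise cross-attention for one encoder scale.

    tokens_i:   (n, C_i)   tokens of the scale being refined
    tokens_all: (n, C_sum) tokens of all scales concatenated along channels
    Returns the refined tokens (n, C_i) and the attention map (C_i, C_sum).
    """
    q = tokens_i @ w_q        # (n, C_i)
    k = tokens_all @ w_k      # (n, C_sum)
    v = tokens_all @ w_v      # (n, C_sum)
    c_sum = tokens_all.shape[1]
    # Similarity between channels, not between spatial tokens.
    scores = q.T @ k / np.sqrt(c_sum)   # (C_i, C_sum)
    attn = softmax(scores, axis=-1)     # each query channel attends over all channels
    out = attn @ v.T                    # (C_i, n)
    return out.T, attn


rng = np.random.default_rng(0)
n_tokens = 16                    # assumed: every scale is tokenized to the same count
channels = (4, 8, 16, 32)        # assumed channel widths of four encoder scales
c_sum = sum(channels)

tokens = [rng.standard_normal((n_tokens, c)) for c in channels]
tokens_all = np.concatenate(tokens, axis=1)    # (n_tokens, c_sum)

fused, attn_maps = [], []
for t, c in zip(tokens, channels):
    # Random weights stand in for learned projections.
    w_q = rng.standard_normal((c, c)) * 0.1
    w_k = rng.standard_normal((c_sum, c_sum)) * 0.1
    w_v = rng.standard_normal((c_sum, c_sum)) * 0.1
    out, attn = channel_cross_attention(t, tokens_all, w_q, w_k, w_v)
    fused.append(t + out)        # residual connection (an assumption)
    attn_maps.append(attn)

for c, f in zip(channels, fused):
    print(c, f.shape)
```

In this sketch each scale keeps its own shape after fusion, so the refined features could be passed to the decoder in place of the plain skip connection. The attention map for a scale with C_i channels has shape (C_i, C_sum), which is what lets every channel of that scale draw on channels from all other scales.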
Taxonomy
Topics: Advanced Neural Network Applications · COVID-19 diagnosis using AI · Radiomics and Machine Learning in Medical Imaging
Methods: Multi-Head Attention · Attention Is All You Need · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Compact Convolutional Transformers · Average Pooling · Global Average Pooling · Linear Layer · Sigmoid Activation
