TL;DR
The U-Transformer network enhances medical image segmentation by integrating self- and cross-attention mechanisms into a U-shaped architecture, effectively modeling long-range dependencies and improving accuracy over traditional U-Nets.
Contribution
This paper introduces the U-Transformer, an architecture that adds Transformer self- and cross-attention to a U-Net to better model long-range spatial dependencies in medical image segmentation.
Findings
Large performance gains over U-Net and Attention U-Nets on two abdominal CT-image datasets.
Both self- and cross-attention are crucial for optimal results.
Enhanced interpretability of segmentation results.
Abstract
Medical image segmentation remains particularly challenging for complex and low-contrast anatomical structures. In this paper, we introduce the U-Transformer network, which combines a U-shaped architecture for image segmentation with self- and cross-attention from Transformers. U-Transformer overcomes the inability of U-Nets to model long-range contextual interactions and spatial dependencies, which are arguably crucial for accurate segmentation in challenging contexts. To this end, attention mechanisms are incorporated at two main levels: a self-attention module leverages global interactions between encoder features, while cross-attention in the skip connections allows a fine spatial recovery in the U-Net decoder by filtering out non-semantic features. Experiments on two abdominal CT-image datasets show the large performance gain brought out by U-Transformer compared to U-Net and local…
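The two attention levels described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the single-head formulation, the random projection weights, the residual connection, and the sigmoid gating of the skip features are all assumptions made for illustration, and positional encodings are omitted. Self-attention lets every position of the deepest encoder feature map attend to every other position, while the cross-attention step uses skip-connection features as queries against the coarser features to produce a gate that filters the skip features.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    """Scaled dot-product attention: q (n, d), k (m, d), v (m, dv) -> (n, dv)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def self_attention(feat, wq, wk, wv):
    """Global self-attention over an (H, W, C) encoder feature map (hypothetical helper)."""
    h, w, c = feat.shape
    tokens = feat.reshape(h * w, c)           # one token per spatial position
    out = attention(tokens @ wq, tokens @ wk, tokens @ wv)
    return feat + out.reshape(h, w, c)        # residual connection (assumption)

def cross_attention_skip(skip, coarse, wq, wk, wv):
    """Gate (Hs, Ws, C) skip features using context from coarser (Hc, Wc, C) features.

    Assumed design: queries come from the skip features, keys and values from
    the coarse features; a sigmoid turns the attended context into a gate.
    """
    hs, ws, c = skip.shape
    q = skip.reshape(hs * ws, c) @ wq
    d = coarse.reshape(-1, coarse.shape[-1])
    context = attention(q, d @ wk, d @ wv)    # (Hs*Ws, C)
    gate = 1.0 / (1.0 + np.exp(-context))     # values strictly between 0 and 1
    return skip * gate.reshape(hs, ws, c)     # suppress less relevant skip features

rng = np.random.default_rng(0)
c = 8
enc_deep = rng.normal(size=(4, 4, c))         # stand-in for the deepest encoder features
skip = rng.normal(size=(8, 8, c))             # stand-in for higher-resolution skip features
sa_w = [rng.normal(size=(c, c)) * 0.1 for _ in range(3)]
ca_w = [rng.normal(size=(c, c)) * 0.1 for _ in range(3)]

bottleneck = self_attention(enc_deep, *sa_w)
filtered = cross_attention_skip(skip, bottleneck, *ca_w)
print(bottleneck.shape, filtered.shape)
```

Because the queries and the keys/values may come from feature maps of different resolutions, the cross-attention step needs no explicit resampling in this sketch; a real network would typically use learned multi-head projections and convolutional layers around these modules.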
Taxonomy
Methods: Concatenated Skip Connection · Convolution · Max Pooling · U-Net
