3D UX-Net: A Large Kernel Volumetric ConvNet Modernizing Hierarchical Transformer for Medical Image Segmentation
Ho Hin Lee, Shunxing Bao, Yuankai Huo, Bennett A. Landman

TL;DR
This paper introduces 3D UX-Net, a lightweight volumetric ConvNet that adapts the hierarchical transformer design using large kernel depth-wise convolutions, achieving state-of-the-art results in 3D medical image segmentation.
Contribution
The paper presents a novel 3D ConvNet architecture that rebuilds the hierarchical transformer (in the style of Swin Transformer) from ConvNet modules based on large kernel depth-wise convolutions, reducing the parameter count while maintaining high performance.
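Depth-wise convolutions are why the design stays lightweight. A dense 3D convolution learns one k × k × k filter for every input/output channel pair, whereas a depth-wise convolution learns a single filter per channel. The sketch below works through that arithmetic. The channel width of 48 and kernel size of 7 are illustrative assumptions, not values taken from this page, and the helper names are hypothetical.

```python
def conv3d_params(in_ch: int, out_ch: int, k: int, bias: bool = True) -> int:
    """Parameters of a standard (dense) 3D convolution with a k x k x k kernel."""
    # One k^3 filter per (input channel, output channel) pair, plus one bias per output.
    return in_ch * out_ch * k ** 3 + (out_ch if bias else 0)


def depthwise_conv3d_params(ch: int, k: int, bias: bool = True) -> int:
    """Parameters of a depth-wise 3D convolution: one k x k x k filter per channel."""
    return ch * k ** 3 + (ch if bias else 0)


if __name__ == "__main__":
    ch, k = 48, 7  # hypothetical channel width and kernel size, for illustration only
    dense = conv3d_params(ch, ch, k)
    dw = depthwise_conv3d_params(ch, k)
    print(f"dense: {dense}, depth-wise: {dw}, ratio: {dense / dw:.1f}x")
```

Without bias terms the ratio is exactly the channel count, so the savings grow with network width.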
Findings
Outperforms SwinUNETR on multiple datasets
Achieves higher Dice scores in brain and abdominal segmentation
Demonstrates effective transfer learning capabilities
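The Dice scores in the findings measure voxel overlap between a predicted mask and the ground truth: 2|A ∩ B| / (|A| + |B|). The sketch below handles one binary class over flattened masks. The function name is hypothetical, and so is the convention of scoring two empty masks as 1.0. Multi-organ benchmarks typically compute this per class and then average.

```python
from typing import Sequence


def dice_score(pred: Sequence[int], target: Sequence[int]) -> float:
    """Dice similarity coefficient 2|A & B| / (|A| + |B|) for flattened binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    # Voxels labelled foreground in both the prediction and the ground truth.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * intersection / total
```

For example, a prediction covering two voxels that overlaps a one-voxel target scores 2 · 1 / (2 + 1) ≈ 0.667.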
Abstract
The recent 3D medical ViTs (e.g., SwinUNETR) achieve state-of-the-art performance on several 3D volumetric data benchmarks, including 3D medical image segmentation. Hierarchical transformers (e.g., Swin Transformers) reintroduced several ConvNet priors and further enhanced the practical viability of adapting volumetric segmentation in 3D medical datasets. The effectiveness of hybrid approaches is largely credited to the large receptive field for non-local self-attention and the large number of model parameters. In this work, we propose a lightweight volumetric ConvNet, termed 3D UX-Net, which adapts the hierarchical transformer using ConvNet modules for robust volumetric segmentation. Specifically, we revisit volumetric depth-wise convolutions with large kernel size (e.g., starting from ) to enable larger global receptive fields, inspired by Swin Transformer. We…
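The sketch below shows a volumetric depth-wise convolution, the operation the abstract revisits, in plain Python. It assumes stride 1, zero padding that preserves the volume size, and odd cubic kernels. The name `depthwise_conv3d` is hypothetical, and this is not the authors' implementation. A real model would use a framework's grouped convolution instead, such as PyTorch `nn.Conv3d` with `groups` set to the channel count.

```python
from typing import List

Volume = List[List[List[float]]]  # indexed [depth][height][width]


def depthwise_conv3d(x: List[Volume], kernels: List[Volume]) -> List[Volume]:
    """Depth-wise 3D convolution (stride 1, zero 'same' padding).

    x:       C input volumes, each D x H x W.
    kernels: C cubic kernels, each k x k x k with odd k. Kernel c only ever
             sees channel c, so channels are never mixed.
    Like deep-learning frameworks, this computes cross-correlation
    (the kernel is not flipped).
    """
    if len(x) != len(kernels):
        raise ValueError("need exactly one kernel per channel")
    out: List[Volume] = []
    for vol, ker in zip(x, kernels):
        k = len(ker)
        if k % 2 == 0:
            raise ValueError("kernel size must be odd for 'same' padding")
        r = k // 2  # padding radius that keeps the output the same size
        D, H, W = len(vol), len(vol[0]), len(vol[0][0])
        res = [[[0.0] * W for _ in range(H)] for _ in range(D)]
        for d in range(D):
            for h in range(H):
                for w in range(W):
                    acc = 0.0
                    for i in range(k):
                        for j in range(k):
                            for m in range(k):
                                dd, hh, ww = d + i - r, h + j - r, w + m - r
                                # Out-of-bounds taps read implicit zeros.
                                if 0 <= dd < D and 0 <= hh < H and 0 <= ww < W:
                                    acc += vol[dd][hh][ww] * ker[i][j][m]
                    res[d][h][w] = acc
        out.append(res)
    return out
```

As an illustration, apply an all-ones 7 × 7 × 7 kernel (an illustrative size) to an all-ones 3 × 3 × 3 volume. Every output voxel equals 27, because each output already sees the entire input. This is the enlarged receptive field the abstract attributes to large kernels.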
Taxonomy
Topics: Medical Imaging and Analysis · Advanced Neural Network Applications · Radiomics and Machine Learning in Medical Imaging
Methods: Large convolutional kernels · Attention Is All You Need · Linear Layer · Label Smoothing · Multi-Head Attention · Adam · Dense Connections · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Dropout
