3D UX-Net: A Large Kernel Volumetric ConvNet Modernizing Hierarchical Transformer for Medical Image Segmentation

Ho Hin Lee; Shunxing Bao; Yuankai Huo; Bennett A. Landman

arXiv:2209.15076·cs.CV·March 3, 2023·100 cites

TL;DR

This paper introduces 3D UX-Net, a lightweight volumetric ConvNet that adapts the hierarchical transformer design using ConvNet modules, notably large-kernel depth-wise convolutions (starting from $7 \times 7 \times 7$), and reports outperforming SwinUNETR on multiple 3D medical image segmentation datasets.

Contribution

The paper presents a novel 3D ConvNet architecture that combines large kernel depth-wise convolutions with simplified transformer components, reducing parameters while maintaining high performance.

Findings

01 Outperforms SwinUNETR on multiple datasets

02 Achieves higher Dice scores in brain and abdominal segmentation

03 Demonstrates effective transfer learning capabilities

Abstract

Recent 3D medical ViTs (e.g., SwinUNETR) achieve state-of-the-art performance on several 3D volumetric data benchmarks, including 3D medical image segmentation. Hierarchical transformers (e.g., Swin Transformers) reintroduced several ConvNet priors and further enhanced the practical viability of adapting volumetric segmentation in 3D medical datasets. The effectiveness of hybrid approaches is largely credited to the large receptive field for non-local self-attention and the large number of model parameters. In this work, we propose a lightweight volumetric ConvNet, termed 3D UX-Net, which adapts the hierarchical transformer using ConvNet modules for robust volumetric segmentation. Specifically, we revisit volumetric depth-wise convolutions with large kernel size (e.g., starting from $7 \times 7 \times 7$) to enable larger global receptive fields, inspired by Swin Transformer. We…
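To make the "lightweight" claim concrete, the sketch below counts the learnable parameters of a single 3D convolution with a cubic kernel, comparing a dense $7 \times 7 \times 7$ layer with its depth-wise counterpart. It is a back-of-the-envelope illustration only, not the authors' implementation: the helper `conv3d_params` and the channel width of 48 are assumptions chosen for the example, and the count uses the standard grouped-convolution formula (weights = out_channels × in_channels / groups × k³, plus one bias per output channel).

```python
def conv3d_params(in_channels, out_channels, kernel_size, groups=1, bias=True):
    """Count learnable parameters of a 3D convolution with a cubic kernel.

    Hypothetical helper for illustration; it mirrors the usual
    grouped-convolution weight shape (out, in / groups, k, k, k).
    """
    if in_channels % groups or out_channels % groups:
        raise ValueError("channels must be divisible by groups")
    weights = out_channels * (in_channels // groups) * kernel_size ** 3
    return weights + (out_channels if bias else 0)


channels = 48  # assumed feature width, chosen only for illustration
kernel = 7     # the large kernel size mentioned in the abstract

# Dense convolution: every output channel sees every input channel.
dense = conv3d_params(channels, channels, kernel)

# Depth-wise convolution: groups == channels, one filter per channel.
depthwise = conv3d_params(channels, channels, kernel, groups=channels)

# A conventional small-kernel dense convolution, for reference.
small_dense = conv3d_params(channels, channels, 3)

print(f"dense 7x7x7:     {dense:,}")
print(f"depthwise 7x7x7: {depthwise:,}")
print(f"dense 3x3x3:     {small_dense:,}")
```

Under these assumptions the depth-wise layer keeps the $7 \times 7 \times 7$ spatial extent while using fewer weights than even a dense $3 \times 3 \times 3$ layer of the same width, which is why large kernels become affordable in the depth-wise setting.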

Code & Models

Videos

3D UX-Net: A Large Kernel Volumetric ConvNet Modernizing Hierarchical Transformer for Medical Image Segmentation (SlidesLive)

Taxonomy

Topics: Medical Imaging and Analysis · Advanced Neural Network Applications · Radiomics and Machine Learning in Medical Imaging

Methods: Large convolutional kernels · Attention Is All You Need · Linear Layer · Label Smoothing · Multi-Head Attention · Adam · Dense Connections · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Dropout