Swin SMT: Global Sequential Modeling in 3D Medical Image Segmentation
Szymon Płotka, Maciej Chrabaszcz, Przemyslaw Biecek

TL;DR
Swin SMT introduces a novel transformer-based architecture with Soft MoE for improved 3D medical image segmentation, effectively capturing diverse long-range dependencies in whole-body CT scans.
Contribution
The paper proposes Swin SMT, combining Swin UNETR with Soft MoE to enhance global and local feature modeling in 3D segmentation tasks.
Findings
Outperforms state-of-the-art methods in 3D segmentation
Achieves an average Dice score of 85.09% on WBCT data
Efficiently handles complex long-range dependencies
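The 85.09% figure is an average Dice score over segmented structures. Dice for one structure is 2|A ∩ B| / (|A| + |B|), where A is the predicted voxel set and B the ground-truth set. The sketch below shows that averaging on flattened label volumes. The function name `mean_dice` is hypothetical. Skipping labels absent from both prediction and target is an assumption; the exact averaging protocol used for the reported number is not stated here.

```python
def mean_dice(pred, target, labels):
    """Average Dice over the given foreground labels.

    pred, target: flattened label volumes (sequences of ints of equal length).
    Labels absent from both pred and target are skipped (an assumed convention).
    """
    scores = []
    for c in labels:
        p = sum(1 for v in pred if v == c)      # |A|: predicted voxels of class c
        t = sum(1 for v in target if v == c)    # |B|: ground-truth voxels of class c
        inter = sum(1 for a, b in zip(pred, target) if a == c and b == c)
        if p + t == 0:
            continue  # class not present anywhere: excluded from the mean
        scores.append(2.0 * inter / (p + t))
    return sum(scores) / len(scores) if scores else float("nan")


# Toy 6-voxel "volume" with background 0 and two structures (1 and 2).
pred = [0, 1, 1, 2, 2, 0]
target = [0, 1, 2, 2, 2, 0]
print(round(mean_dice(pred, target, [1, 2]), 4))  # 0.7333
```

Here class 1 scores 2/3 and class 2 scores 4/5, so the mean is 11/15. Real evaluations operate on 3D arrays, but flattening does not change the Dice value.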
Abstract
Recent advances in Vision Transformers (ViTs) have significantly enhanced medical image segmentation by facilitating the learning of global relationships. However, these methods face a notable challenge in capturing diverse local and global long-range sequential feature representations, particularly evident in whole-body CT (WBCT) scans. To overcome this limitation, we introduce Swin Soft Mixture Transformer (Swin SMT), a novel architecture based on Swin UNETR. This model incorporates a Soft Mixture-of-Experts (Soft MoE) to effectively handle complex and diverse long-range dependencies. The use of Soft MoE allows for scaling up model parameters while maintaining a balance between computational complexity and segmentation performance in both training and inference modes. We evaluate Swin SMT on the publicly available TotalSegmentator-V2 dataset, which includes 117 major anatomical structures…
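To make the Soft MoE idea concrete, here is a minimal NumPy sketch of a generic Soft MoE layer following the formulation of Puigcerver et al. ("From Sparse to Soft Mixtures of Experts"). Each of the n·p slots is a softmax-weighted mix of all tokens. Each expert processes its p slots, and every output token is a softmax-weighted mix of all slot outputs, so routing is fully differentiable and no token is dropped. This is not the Swin SMT code. Where the layer sits inside Swin UNETR, the expert design, and all sizes below are assumptions: the layer is treated here as a drop-in replacement for a transformer MLP. The names `soft_moe` and `make_mlp_expert` are hypothetical. The sketch also omits the input/parameter normalization that the original Soft MoE paper uses for stability.

```python
import numpy as np


def softmax(z, axis):
    # Numerically stable softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def soft_moe(x, phi, experts, slots_per_expert):
    """Generic Soft MoE layer (sketch, after Puigcerver et al.).

    x:       (m, d) input tokens
    phi:     (d, n * p) learnable slot parameters (n experts, p slots each)
    experts: list of n callables mapping a (p, d) array to a (p, d) array
    """
    n, p = len(experts), slots_per_expert
    logits = x @ phi                       # (m, n*p) token-slot affinities
    dispatch = softmax(logits, axis=0)     # each slot: convex mix of all tokens
    combine = softmax(logits, axis=1)      # each token: convex mix of all slots
    slots = (dispatch.T @ x).reshape(n, p, -1)                # (n, p, d)
    outs = np.stack([f(s) for f, s in zip(experts, slots)])   # (n, p, d)
    return combine @ outs.reshape(n * p, -1)                  # (m, d)


def make_mlp_expert(rng, d, hidden):
    # Hypothetical expert: two-layer ReLU MLP with random (untrained) weights.
    w1 = rng.standard_normal((d, hidden)) / np.sqrt(d)
    w2 = rng.standard_normal((hidden, d)) / np.sqrt(hidden)
    return lambda s: np.maximum(s @ w1, 0.0) @ w2


# Illustrative sizes only: 64 tokens of width 48, 4 experts with 2 slots each.
rng = np.random.default_rng(0)
m, d, n_experts, p = 64, 48, 4, 2
x = rng.standard_normal((m, d))
phi = rng.standard_normal((d, n_experts * p)) / np.sqrt(d)
experts = [make_mlp_expert(rng, d, 4 * d) for _ in range(n_experts)]
y = soft_moe(x, phi, experts, p)
print(y.shape)  # (64, 48)
```

One way to read the abstract's claim about balancing parameters and compute: expert FLOPs depend on the total number of slots n·p, not on n alone. Adding experts while reducing slots per expert therefore grows the parameter count without growing expert compute. The dispatch and combine steps still cost on the order of m·n·p·d.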
Taxonomy
Topics: Medical Image Segmentation Techniques
Methods: Attention Is All You Need · Max Pooling · Concatenated Skip Connection · Softmax · Residual Connection · U-Net · Byte Pair Encoding · 1x1 Convolution · Layer Normalization
