Hybrid Transformer-Mamba Architecture for Weakly Supervised Volumetric Medical Segmentation
Yiheng Lyu, Lian Xu, Mohammed Bennamoun, Farid Boussaid, Coen Arrow, Girish Dwivedi

TL;DR
This paper introduces TranSamba, a hybrid Transformer-Mamba architecture that efficiently captures 3D context for weakly supervised volumetric medical segmentation, achieving state-of-the-art results.
Contribution
The paper presents a hybrid architecture that augments a standard Vision Transformer backbone with Cross-Plane Mamba blocks for weakly supervised volumetric medical segmentation.
Findings
Outperforms existing methods across multiple datasets.
Achieves linear time complexity with respect to volume depth.
Maintains constant memory usage during batch processing.
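A rough cost count may help make the linear-in-depth claim concrete. The symbols below are assumptions introduced for illustration and are not taken from the paper: $D$ is the number of slices in the volume and $N$ is the number of tokens per slice; feature-dimension factors are omitted.

```latex
% Self-attention restricted to each slice, repeated for D slices:
C_{\text{within-slice}} \;\propto\; D \cdot N^{2}

% A recurrent state-space scan that visits each slice once:
C_{\text{cross-slice scan}} \;\propto\; D \cdot N

% Combined cost of the hybrid: linear in D for fixed N
C_{\text{hybrid}} \;\propto\; D\,(N^{2} + N)

% For comparison, joint self-attention over all D N tokens of the volume:
C_{\text{full 3D attention}} \;\propto\; (D N)^{2} \;=\; D^{2} N^{2}
```

Under these assumptions, doubling the depth doubles the cost of the hybrid but quadruples the cost of joint 3D attention.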
Abstract
Weakly supervised semantic segmentation offers a label-efficient solution to train segmentation models for volumetric medical imaging. However, existing approaches often rely on 2D encoders that neglect the inherent volumetric nature of the data. We propose TranSamba, a hybrid Transformer-Mamba architecture designed to capture 3D context for weakly supervised volumetric medical segmentation. TranSamba augments a standard Vision Transformer backbone with Cross-Plane Mamba blocks, which leverage the linear complexity of state space models for efficient information exchange across neighboring slices. The information exchange enhances the pairwise self-attention within slices computed by the Transformer blocks, directly contributing to the attention maps for object localization. TranSamba achieves effective volumetric modeling with time complexity that scales linearly with the input volume…
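The cross-slice information exchange described above can be illustrated with a minimal sketch of a linear recurrence along the depth axis. This is not the paper's Cross-Plane Mamba block: the function name `scan_slices`, the fixed per-channel coefficients `a`, `b`, `c`, and the tensor layout are all hypothetical, and actual Mamba layers use input-dependent (selective) parameters. The sketch only shows that a recurrent scan visits each slice once and needs nothing from earlier slices beyond a fixed-size carried state.

```python
import numpy as np


def scan_slices(tokens, a, b, c, state=None):
    """Run a simple linear recurrence along the slice (depth) axis.

    tokens: float array of shape (depth, n_tokens, dim), one token grid per slice.
    a, b, c: per-channel coefficients of shape (dim,); hypothetical fixed
             parameters (real Mamba layers make these input-dependent).
    state:  carried hidden state of shape (n_tokens, dim), or None for zeros.
    Returns (outputs, final_state).
    """
    depth, n_tokens, dim = tokens.shape
    h = np.zeros((n_tokens, dim)) if state is None else state
    out = np.empty_like(tokens)
    for d in range(depth):          # one step per slice: cost is linear in depth
        h = a * h + b * tokens[d]   # fold the current slice into the state
        out[d] = c * h              # each output sees all earlier slices via h
    return out, h


# Usage: scanning the whole volume at once and scanning it in two chunks
# (carrying only the state between them) give the same outputs.
rng = np.random.default_rng(0)
tokens = rng.standard_normal((8, 4, 3))   # 8 slices, 4 tokens each, 3 channels
a = np.full(3, 0.9)
b = np.ones(3)
c = np.ones(3)

full, _ = scan_slices(tokens, a, b, c)
first, carried = scan_slices(tokens[:5], a, b, c)
second, _ = scan_slices(tokens[5:], a, b, c, state=carried)
chunked = np.concatenate([first, second])
```

Because only the state `h` crosses a chunk boundary, the memory carried between chunks does not grow with the number of slices already processed. Whether this corresponds to the constant-memory batch processing reported for TranSamba is an assumption of this sketch.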
Taxonomy
Topics: Advanced Neural Network Applications · 3D Shape Modeling and Analysis · AI in cancer detection
