MIST: Medical Image Segmentation Transformer with Convolutional Attention Mixing (CAM) Decoder
Md Motiur Rahman, Shiva Shokouhmand, Smriti Bhatt, and Miad Faezipour

TL;DR
MIST is a medical image segmentation transformer whose Convolutional Attention Mixing (CAM) decoder captures both local and long-range pixel dependencies; it outperforms existing models on the ACDC and Synapse datasets.
Contribution
Introduces a CAM decoder that combines multi-head self-attention, spatial attention, and squeeze-and-excitation attention modules within a hierarchical transformer for improved segmentation.
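To make the two convolution-side attention modules named above more concrete, here is a minimal NumPy sketch of a squeeze-and-excitation (channel) gate and a spatial (per-pixel) gate applied to one feature map. It is an illustration under stated assumptions, not the authors' implementation: the function names, the random weights, the reduction ratio `r`, the two-scalar spatial gate (standing in for a learned convolution), and the choice to mix the branches by residual summation are all hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def squeeze_excitation(x, w1, w2):
    """Channel gate. x: (C, H, W); w1: (C//r, C); w2: (C, C//r)."""
    squeezed = x.mean(axis=(1, 2))           # squeeze: global average pool -> (C,)
    hidden = np.maximum(w1 @ squeezed, 0.0)  # excitation bottleneck with ReLU -> (C//r,)
    gate = sigmoid(w2 @ hidden)              # per-channel weights in (0, 1) -> (C,)
    return x * gate[:, None, None]

def spatial_attention(x, w_avg, w_max):
    """Per-pixel gate built from channel-wise average and max maps.
    Assumption: two scalar weights replace the usual learned convolution."""
    avg_map = x.mean(axis=0)                 # (H, W)
    max_map = x.max(axis=0)                  # (H, W)
    gate = sigmoid(w_avg * avg_map + w_max * max_map)
    return x * gate[None, :, :]

def mix(x, w1, w2, w_avg, w_max):
    # Assumption: branches are mixed by residual summation; the source
    # does not state how the CAM decoder combines its modules.
    return x + squeeze_excitation(x, w1, w2) + spatial_attention(x, w_avg, w_max)

rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 2                      # hypothetical sizes
x = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C))
w2 = rng.standard_normal((C, C // r))
out = mix(x, w1, w2, 0.5, 0.5)
print(out.shape)  # (8, 4, 4)
```

Both gates only rescale the input (each multiplier lies between 0 and 1), so the output keeps the input's shape and can be passed on to the next decoder stage unchanged.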
Findings
Outperforms state-of-the-art models on the ACDC and Synapse datasets.
Effective integration of low-level and high-level features enhances segmentation accuracy.
Hierarchical transformer with CAM decoder significantly improves performance.
Abstract
One of the common and promising deep learning approaches used for medical image segmentation is transformers, as they can capture long-range dependencies among the pixels by utilizing self-attention. Despite being successful in medical image segmentation, transformers face limitations in capturing local contexts of pixels in multimodal dimensions. We propose a Medical Image Segmentation Transformer (MIST) incorporating a novel Convolutional Attention Mixing (CAM) decoder to address this issue. MIST has two parts: a pre-trained multi-axis vision transformer (MaxViT) is used as an encoder, and the encoded feature representation is passed through the CAM decoder for segmenting the images. In the CAM decoder, an attention-mixer combining multi-head self-attention, spatial attention, and squeeze and excitation attention modules is introduced to capture long-range dependencies in all spatial…
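The long-range behaviour the abstract attributes to self-attention can be sketched with plain scaled dot-product multi-head attention over the pixels of a feature map, where every pixel attends to every other pixel. This is a generic NumPy illustration, not MIST's decoder: the function name, the random projection weights, the feature-map size, and the omission of the usual output projection are assumptions made for brevity.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(tokens, wq, wk, wv, num_heads):
    """tokens: (N, D); wq, wk, wv: (D, D).
    Returns the attended tokens (N, D) and the attention maps (heads, N, N)."""
    n, d = tokens.shape
    dh = d // num_heads                      # per-head feature size

    def split(m):                            # (N, D) -> (heads, N, dh)
        return m.reshape(n, num_heads, dh).transpose(1, 0, 2)

    q, k, v = split(tokens @ wq), split(tokens @ wk), split(tokens @ wv)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(dh)  # pairwise pixel affinities
    attn = softmax(scores, axis=-1)          # each row sums to 1
    out = attn @ v                           # (heads, N, dh)
    return out.transpose(1, 0, 2).reshape(n, d), attn

rng = np.random.default_rng(0)
C, H, W = 8, 4, 4                            # hypothetical feature-map size
feature_map = rng.standard_normal((C, H, W))
tokens = feature_map.reshape(C, H * W).T     # one token per pixel: (16, 8)
wq, wk, wv = (rng.standard_normal((C, C)) for _ in range(3))
out, attn = multi_head_self_attention(tokens, wq, wk, wv, num_heads=2)
print(out.shape, attn.shape)  # (16, 8) (2, 16, 16)
```

Each attention map is N × N over all pixels, which is why self-attention relates distant pixels directly, and also why it carries no built-in bias toward a pixel's immediate neighbourhood.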
Taxonomy
Topics: Radiomics and Machine Learning in Medical Imaging · Advanced Neural Network Applications · AI in cancer detection
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Dropout · Layer Normalization · Label Smoothing · Vision Transformer · Byte Pair Encoding · Dense Connections · Position-Wise Feed-Forward Layer
