USEMA: a Scalable Efficient Mamba Like Attention for Medical Image Segmentation

Elisha Dayag; Nhat Thanh Tran; Jack Xin

arXiv:2605.11131 · cs.CV · May 13, 2026

TL;DR

USEMA introduces a hybrid UNet architecture combining local CNN features with a scalable, efficient Mamba-like attention mechanism to improve medical image segmentation performance and efficiency.

Contribution

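The paper proposes USEMA, a hybrid UNet architecture that combines the local feature extraction of convolutional neural networks (CNNs) with SEMA, a scalable Mamba-like attention mechanism that pairs local window attention with arithmetic averaging for global context.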

Findings

1. USEMA achieves higher segmentation accuracy than pure CNN and Mamba models.
2. USEMA demonstrates improved computational efficiency over transformer-based models.
3. USEMA performs well across various modalities and image sizes.
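
The efficiency and image-size findings rest on how attention cost grows with the number of tokens. The sketch below counts attention-score entries for vanilla global self-attention versus attention restricted to non-overlapping local windows, the kind of token localization the abstract attributes to SEMA. The function names, the 8 x 8 window, and the non-overlapping layout are hypothetical choices for illustration; this is not the paper's cost model, and it ignores the averaging term, the convolutions, and constant factors.

```python
def global_attention_scores(height: int, width: int) -> int:
    """Score entries for vanilla self-attention over a height x width token grid."""
    n = height * width
    return n * n  # every token attends to every token: O(n^2)


def window_attention_scores(height: int, width: int, win: int) -> int:
    """Score entries when attention is restricted to non-overlapping win x win windows."""
    if height % win or width % win:
        raise ValueError("this sketch assumes the grid is divisible by the window size")
    num_windows = (height // win) * (width // win)
    tokens_per_window = win * win
    # Each window builds a (win*win) x (win*win) score matrix, so the total
    # equals n * win**2: linear in n for a fixed window size.
    return num_windows * tokens_per_window ** 2


WIN = 8  # hypothetical window size, not taken from the paper
for side in (32, 64, 128):
    full = global_attention_scores(side, side)
    local = window_attention_scores(side, side, WIN)
    print(f"{side}x{side}: global={full}, windowed={local}, ratio={full // local}")
```

Doubling the side length multiplies the global count by 16 but the windowed count only by 4, which is the quadratic-versus-linear gap in the token count n = height x width.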

Abstract

Accurate medical image segmentation is an integral part of the medical image analysis pipeline and requires the ability to merge local and global information. While vision transformers can capture global interactions using vanilla self-attention, their computational complexity, which is quadratic in the input size, remains a challenge for medical image segmentation tasks. Motivated by the dispersion property of vanilla self-attention and the recent development of the Mamba form of attention, Scalable and Efficient Mamba-like Attention (SEMA) uses token localization via local window attention to avoid dispersion and maintain focus, complemented by theoretically consistent arithmetic averaging to capture the global aspect of attention. In this work, we present USEMA, a hybrid UNet architecture that merges the local feature extraction ability of convolutional neural networks (CNNs) with SEMA…
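
The abstract describes SEMA as two ingredients: local window attention, which keeps each query focused on nearby tokens, and an arithmetic average that supplies global context. The sketch below first illustrates dispersion in a simple sense (with bounded scores, the largest softmax weight shrinks as the token count grows) and then builds a hypothetical SEMA-style layer on a 1D token sequence. The names `max_attention_weight` and `sema_like`, the use of sin(i) as stand-in bounded scores, the non-overlapping windows, and the choice to add the local output to the mean of the value vectors are all assumptions for illustration, not the paper's formulation, which may weight or combine the two terms differently.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def attend(q: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Scaled dot-product attention for a single query."""
    weights = softmax([dot(q, k) / math.sqrt(len(q)) for k in keys])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]


def max_attention_weight(n: int) -> float:
    """Largest softmax weight over n scores bounded in [-1, 1].

    sin(i) is a deterministic stand-in for bounded query-key scores (an assumption).
    As n grows the peak weight shrinks toward 0: attention disperses.
    """
    return max(softmax([math.sin(i) for i in range(n)]))


def sema_like(queries: List[Vector], keys: List[Vector],
              values: List[Vector], window: int) -> List[Vector]:
    """Hypothetical SEMA-style layer (illustrative, not the paper's exact method).

    Local term: each token attends only within its non-overlapping window.
    Global term: arithmetic mean of all value vectors, shared by every token.
    The two terms are simply summed here, which is an assumption of this sketch.
    """
    n, dim = len(queries), len(values[0])
    global_mean = [sum(v[j] for v in values) / n for j in range(dim)]
    out = []
    for i, q in enumerate(queries):
        start = (i // window) * window
        end = min(start + window, n)
        local = attend(q, keys[start:end], values[start:end])
        out.append([l + g for l, g in zip(local, global_mean)])
    return out


for n in (10, 100, 1000):
    print(n, round(max_attention_weight(n), 4))  # peak weight shrinks as n grows

tokens = [[math.cos(i), math.sin(i)] for i in range(8)]
out = sema_like(tokens, tokens, tokens, window=4)
print(len(out), len(out[0]))  # 8 2
```

Because the local term only ever normalizes over `window` tokens, its peak weight does not dilute as the sequence grows, while the mean costs a single pass over the values.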