# M3-TransUNet: Medical Image Segmentation Based on Spatial Prior Attention and Multi-Scale Gating

**Authors:** Zhigao Zeng, Jiale Xiao, Shengqiu Yi, Qiang Liu, Yanhui Zhu

PMC · DOI: 10.3390/jimaging12010015 · 2025-12-29

## TL;DR

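This paper introduces M3-TransUNet, a Transformer-based model for medical image segmentation that combines multi-scale attention (MSGA, MSSA), spatial priors in self-attention (ME-MSA), and gated skip connections (MKGAG). On the Synapse dataset it reaches an average DSC of 82.79% and an average HD95 of 10.21 mm.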

## Contribution

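The M3-TransUNet architecture introduces three innovations spanning four modules: MSGA and MSSA for multi-scale feature representation, ME-MSA for integrating spatial priors into self-attention, and MKGAG for filtering noise and preserving boundary details in skip connections.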

## Key findings

- On the Synapse dataset, M3-TransUNet reaches an average DSC of 82.79%, compared with 82.29% for recent TransUNet variants such as J-CAPA.
- In the same comparison, the average HD95 on Synapse drops from 19.74 mm to 10.21 mm.
- Experiments on three public datasets (Synapse, CVC-ClinicDB, and ISIC) are reported as state-of-the-art for medical image segmentation.
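
The two metrics quoted above can be illustrated with a small sketch. DSC (Dice similarity coefficient) measures overlap between predicted and reference masks, and HD95 is the 95th percentile of the distances between them. The code below is a simplified illustration, not the paper's evaluation code: the function names, the `spacing_mm` parameter, and the choice to measure distances over all foreground pixels (rather than extracted organ surfaces, as evaluation toolkits usually do) are assumptions made for brevity.

```python
import math


def dice_coefficient(pred, target):
    """Dice similarity coefficient (DSC) of two same-shape binary masks (nested lists of 0/1)."""
    intersection = pred_area = target_area = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            intersection += p * t
            pred_area += p
            target_area += t
    if pred_area + target_area == 0:
        return 1.0  # assumption: two empty masks count as a perfect match
    return 2.0 * intersection / (pred_area + target_area)


def _foreground_points(mask):
    """Coordinates (row, col) of every non-zero pixel."""
    return [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]


def _percentile(values, q):
    """q-th percentile with linear interpolation between order statistics."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * q / 100
    low, high = math.floor(position), math.ceil(position)
    return ordered[low] + (ordered[high] - ordered[low]) * (position - low)


def hd95(pred, target, spacing_mm=1.0):
    """Simplified HD95: 95th percentile of symmetric nearest-neighbour distances.

    Assumption: distances are taken over all foreground pixels with isotropic
    spacing; surface-based implementations will give different values.
    """
    a, b = _foreground_points(pred), _foreground_points(target)
    if not a or not b:
        raise ValueError("HD95 is undefined when either mask is empty")

    def nearest(points, others):
        return [min(math.dist(p, o) for o in others) * spacing_mm for p in points]

    return _percentile(nearest(a, b) + nearest(b, a), 95)


pred = [[0, 1, 1], [0, 1, 1], [0, 0, 0]]
target = [[0, 1, 1], [0, 1, 0], [0, 0, 0]]
print(f"DSC  = {dice_coefficient(pred, target):.4f}")
print(f"HD95 = {hd95(pred, target):.4f} mm")
```

A higher DSC and a lower HD95 are both better, which is why the findings above report an increase in the former and a decrease in the latter.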

## Abstract

Medical image segmentation presents substantial challenges arising from the diverse scales and morphological complexities of target anatomical structures. Although existing Transformer-based models excel at capturing global dependencies, they encounter critical bottlenecks in multi-scale feature representation, spatial relationship modeling, and cross-layer feature fusion. To address these limitations, we propose the M3-TransUNet architecture, which incorporates three key innovations: (1) MSGA (Multi-Scale Gate Attention) and MSSA (Multi-Scale Selective Attention) modules to enhance multi-scale feature representation; (2) ME-MSA (Manhattan Enhanced Multi-Head Self-Attention) to integrate spatial priors into self-attention computations, thereby overcoming spatial modeling deficiencies; and (3) MKGAG (Multi-kernel Gated Attention Gate) to optimize skip connections by precisely filtering noise and preserving boundary details. Extensive experiments on public datasets—including Synapse, CVC-ClinicDB, and ISIC—demonstrate that M3-TransUNet achieves state-of-the-art performance. Specifically, on the Synapse dataset, our model outperforms recent TransUNet variants such as J-CAPA, improving the average DSC to 82.79% (compared to 82.29%) and significantly reducing the average HD95 from 19.74 mm to 10.21 mm.
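
The abstract does not spell out how ME-MSA injects its spatial prior. One plausible reading, sketched below under that assumption, is to scale softmax attention weights by a decay factor based on the Manhattan distance between token positions on the 2D feature grid. The functions `manhattan_decay_mask` and `manhattan_attention` and the decay rate `gamma` are hypothetical names chosen for illustration; the paper's actual formulation may differ.

```python
import math


def manhattan_decay_mask(height, width, gamma=0.9):
    """mask[i][j] = gamma ** (|row_i - row_j| + |col_i - col_j|) for a height x width token grid."""
    coords = [(r, c) for r in range(height) for c in range(width)]
    return [
        [gamma ** (abs(r1 - r2) + abs(c1 - c2)) for (r2, c2) in coords]
        for (r1, c1) in coords
    ]


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def manhattan_attention(q, k, v, height, width, gamma=0.9):
    """Single-head attention whose weights are damped by a Manhattan-distance prior.

    q, k, v are lists of height * width token vectors in row-major order.
    The damped weights are not renormalised, so rows may sum to less than 1
    (an assumption of this sketch).
    """
    dim = len(q[0])
    mask = manhattan_decay_mask(height, width, gamma)
    outputs = []
    for i, q_i in enumerate(q):
        scores = [sum(a * b for a, b in zip(q_i, k_j)) / math.sqrt(dim) for k_j in k]
        weights = [w * m for w, m in zip(softmax(scores), mask[i])]
        outputs.append(
            [sum(w * v_j[c] for w, v_j in zip(weights, v)) for c in range(len(v[0]))]
        )
    return outputs


# 2 x 2 grid of tokens with identical queries and keys, so only the prior differs.
q = k = [[0.0, 0.0] for _ in range(4)]
v = [[1.0], [2.0], [3.0], [4.0]]
print(manhattan_attention(q, k, v, height=2, width=2, gamma=0.5)[0])
```

With `gamma=1.0` the mask is all ones and the function reduces to plain softmax attention; smaller values make each token attend more strongly to its spatial neighbours, which is the kind of locality bias a spatial prior is meant to provide.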

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12843401/full.md

---
Source: https://tomesphere.com/paper/PMC12843401