# A New Encoding Architecture Based on Shift Multilayer Perceptron and Transformer for Medical Image Segmentation

**Authors:** Hepeng Zhong, Jieqiong Yang, Yingfei Wu, Jizheng Yi

PMC · DOI: 10.3390/s26020449 · Sensors (Basel, Switzerland) · 2026-01-09

## TL;DR

A new framework for medical image segmentation combines Shift MLP and Transformer to capture both local and global features, improving diagnostic accuracy.

## Contribution

Proposes a hybrid Shift MLP-Transformer architecture with SASPP and FAAM modules for enhanced medical image segmentation.
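
This summary does not spell out how the Shift MLP moves features, so the following is only a minimal sketch of the general shifted-MLP idea: split the channels into groups, displace each group by a different offset along one spatial axis, then mix channels with a pointwise linear map. The names `axial_shift` and `shifted_channel_mix`, the offsets, and the wrap-around behaviour of `np.roll` are assumptions for illustration, not the authors' implementation.

```python
import numpy as np


def axial_shift(x, shifts=(-2, -1, 0, 1, 2), axis=2):
    """Shift channel groups of a (C, H, W) feature map along one spatial axis.

    Assumption: np.roll wraps values around the border; a real model
    might zero-pad instead of wrapping.
    """
    # One channel group per offset.
    groups = np.array_split(x, len(shifts), axis=0)
    shifted = [np.roll(g, s, axis=axis) for g, s in zip(groups, shifts)]
    return np.concatenate(shifted, axis=0)


def shifted_channel_mix(x, weight, axis=2):
    """Apply the axial shift, then a pointwise (per-pixel) linear channel mix.

    weight has shape (C_out, C_in); this stands in for the MLP's linear layer.
    """
    shifted = axial_shift(x, axis=axis)
    return np.einsum("oc,chw->ohw", weight, shifted)


if __name__ == "__main__":
    feats = np.arange(5 * 1 * 5, dtype=float).reshape(5, 1, 5)
    out = shifted_channel_mix(feats, np.eye(5))
    print(out[:, 0, :])
```

Because each channel group sees a differently displaced copy of the feature map, the per-pixel linear layer that follows can combine information from neighbouring positions, which is how a shifted MLP picks up local context without convolutions.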

## Key findings

- Achieved an average Dice coefficient of 87.01% on the ACDC dataset and 79.35% on the Synapse dataset.
- Improved feature representation through SASPP and FAAM modules.

## Abstract

What are the main findings?

- A novel medical image segmentation framework integrating a Shift Multilayer Perceptron and a Transformer encoder is proposed, effectively capturing both low-level and long-range contextual dependencies.
- The incorporation of SENet-based Atrous Spatial Pyramid Pooling (SASPP) and the channel Feature Aggregation Attention Module (FAAM) enhances feature representation, achieving consistent improvements in Dice coefficients (87.01% on ACDC and 79.35% on Synapse) over state-of-the-art baselines.

What are the implications of the main findings?

- The proposed Multilayer Perceptron–Transformer (MPT) framework improves accuracy and generalization in multi-organ medical image segmentation, providing a robust foundation for clinical diagnosis and surgical planning.
- By optimizing feature fusion and mitigating information loss in U-shaped architectures, this work contributes to the evolution of Transformer–MLP hybrid models for efficient and precise medical image analysis.

Accurate medical image segmentation plays a crucial role in clinical diagnosis by precisely delineating diseased tissues and organs from various medical imaging modalities. However, existing segmentation methods often fail to capture low-level structural details and connect encoder and decoder features inconsistently, which may compromise diagnostic reliability. To address these limitations, this study proposes a novel Multilayer Perceptron–Transformer (MPT) encoding architecture that integrates the Shift Multilayer Perceptron (Shift MLP) and Transformer mechanisms. Specifically, a SENet-based Atrous Spatial Pyramid Pooling (SASPP) module is designed to extract multi-scale contextual representations, while the Shift MLP refines underlying spatial features. Moreover, a channel Feature Aggregation Attention Module (FAAM) is introduced to strengthen information flow between the encoder and decoder layers. Experimental results on the Automated Cardiac Diagnosis Challenge (ACDC) dataset show an average Dice Similarity Coefficient (DSC) of 87.01% (83.32% for the right ventricle, 90.90% for the left ventricle, and 86.83% for the myocardium). On the Synapse multi-organ segmentation dataset, the proposed model achieves an average DSC of 79.35% and a 95% Hausdorff Distance (HD95) of 20.07 mm. These results demonstrate that MPT effectively captures both local and global anatomical structures, providing reliable support for clinical diagnosis.
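
For readers unfamiliar with the metric reported above, here is a minimal sketch of how a Dice Similarity Coefficient can be computed per structure and then averaged. It uses the standard definition, DSC = 2|A ∩ B| / (|A| + |B|). The function names, the label encoding, and the convention of scoring 1.0 when both masks are empty are assumptions for illustration; the paper's evaluation code may differ.

```python
import numpy as np


def dice_coefficient(pred_mask, target_mask):
    """Dice overlap between two binary masks of the same shape."""
    pred = np.asarray(pred_mask, dtype=bool)
    target = np.asarray(target_mask, dtype=bool)
    denom = pred.sum() + target.sum()
    if denom == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    intersection = np.logical_and(pred, target).sum()
    return float(2.0 * intersection / denom)


def mean_dice(pred_labels, target_labels, class_ids):
    """Average the per-class Dice over the given label ids."""
    pred_labels = np.asarray(pred_labels)
    target_labels = np.asarray(target_labels)
    scores = [
        dice_coefficient(pred_labels == c, target_labels == c)
        for c in class_ids
    ]
    return float(np.mean(scores))


if __name__ == "__main__":
    # Toy label maps with hypothetical class ids 1 and 2 (0 = background).
    pred = np.array([[1, 1, 0], [0, 2, 2]])
    target = np.array([[1, 0, 0], [0, 2, 2]])
    print(mean_dice(pred, target, class_ids=[1, 2]))
```

In the toy example, class 1 overlaps on one of three labelled pixels in total (Dice 2/3) and class 2 matches exactly (Dice 1.0), so the mean is 5/6. An average DSC such as the ones in the abstract is typically this kind of mean taken over structures and test cases.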

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12845750/full.md

## Figures

8 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12845750/full.md

## References

47 references — full list in the complete paper: https://tomesphere.com/paper/PMC12845750/full.md

---
Source: https://tomesphere.com/paper/PMC12845750