# Multi-level spatio-relational segformer (MLSRS-SegFormer): A novel vision transformer with adaptive spatial induction and dynamic positional encoding

**Authors:** Inda Rusdia Sofiani, Hadi Suyono, Erni Yudaningtyas, Fitri Utaminingrum

PMC · DOI: 10.1016/j.mex.2025.103693 · MethodsX · 2025-10-29

## TL;DR

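This paper introduces MLSRS-SegFormer, a vision transformer for medical image segmentation that reports the highest mIoU (0.968) and mDSC (0.980) and the lowest HD95 (1.1668) among the models compared, at the cost of a longer inference time.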

## Contribution

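The paper proposes MLSRS-SegFormer, which extends the standard SegFormer with three components: Adaptive Patch Weighting in the patch embedding (adaptive spatial induction), Hausdorff-bias Attention (explicit spatial prioritization), and Relative Positional Encoding (dynamic positional encoding).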
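
To make the idea of a positional term inside attention concrete, the sketch below adds a learned relative-position bias to single-head attention logits over a grid of patches. It is a generic NumPy illustration and not the authors' implementation: this summary does not give the paper's exact formulation, so the function names (`relative_position_index`, `attention_with_relative_bias`), the bias-table layout, and the single-head setup are all assumptions made for illustration.

```python
import numpy as np


def relative_position_index(h, w):
    """Map every pair of patches in an h x w grid to a row of a bias table.

    Hypothetical helper: the table has (2h-1)*(2w-1) rows, one per
    possible (row offset, column offset) between two patches.
    """
    coords = np.stack(np.meshgrid(np.arange(h), np.arange(w), indexing="ij"))
    coords = coords.reshape(2, -1)                    # (2, N), N = h * w
    rel = coords[:, :, None] - coords[:, None, :]     # (2, N, N) offsets
    rel[0] += h - 1                                   # shift offsets to start at 0
    rel[1] += w - 1
    return rel[0] * (2 * w - 1) + rel[1]              # (N, N) table indices


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)           # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention_with_relative_bias(q, k, v, bias_table, index):
    """Scaled dot-product attention with an additive relative-position bias."""
    d = q.shape[-1]
    logits = q @ k.T / np.sqrt(d)                     # (N, N) content term
    logits = logits + bias_table[index]               # (N, N) positional term
    weights = softmax(logits, axis=-1)
    return weights @ v, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    h, w, d = 2, 3, 8
    n = h * w
    q, k, v = (rng.normal(size=(n, d)) for _ in range(3))
    index = relative_position_index(h, w)
    # In a trained model the table would be a learned parameter; random here.
    bias_table = rng.normal(size=((2 * h - 1) * (2 * w - 1),))
    out, weights = attention_with_relative_bias(q, k, v, bias_table, index)
    print(out.shape, weights.sum(axis=-1))
```

Because the bias depends only on the offset between two patches, the same table entry is reused wherever that offset occurs. A distance-derived bias could be added to the logits in the same place, but how the paper constructs its Hausdorff-bias term is not stated in this summary.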

## Key findings

- MLSRS-SegFormer achieves the highest mIoU (0.968) and mDSC (0.980) in the comparative experiments.
- The model also records the lowest HD95 (1.1668), indicating precise boundary delineation.
- Bland-Altman analyses show near-zero systematic bias and consistent area and boundary delineation, at the cost of a longer inference time.
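
For readers unfamiliar with the metrics above, the following is a minimal NumPy sketch of IoU, the Dice similarity coefficient (DSC), and the 95th-percentile Hausdorff distance (HD95) for a pair of binary masks. It is not the paper's evaluation code. The assumptions here are: boundaries are taken as foreground pixels with at least one 4-connected background neighbour, distances are in pixel units, and HD95 is the larger of the two directed 95th percentiles (some toolkits pool both directions instead). mIoU and mDSC would then be means of these scores over classes or images; this summary does not say which.

```python
import numpy as np


def iou(pred, target):
    """Intersection over union of two binary masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:
        return 1.0  # both masks empty: treated as perfect agreement
    return np.logical_and(pred, target).sum() / union


def dice(pred, target):
    """Dice similarity coefficient of two binary masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    total = pred.sum() + target.sum()
    if total == 0:
        return 1.0
    return 2.0 * np.logical_and(pred, target).sum() / total


def boundary(mask):
    """Foreground pixels that touch background in the 4-neighbourhood."""
    mask = mask.astype(bool)
    m = np.pad(mask, 1)  # pad with background so image edges count as boundary
    interior = (m[1:-1, 1:-1] & m[:-2, 1:-1] & m[2:, 1:-1]
                & m[1:-1, :-2] & m[1:-1, 2:])
    return mask & ~interior


def hd95(pred, target):
    """95th-percentile symmetric Hausdorff distance between mask boundaries."""
    a = np.argwhere(boundary(pred)).astype(float)
    b = np.argwhere(boundary(target)).astype(float)
    if len(a) == 0 or len(b) == 0:
        return float("nan")  # undefined when either mask is empty
    # Pairwise Euclidean distances; fine for small masks, O(|a| * |b|) memory.
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    a_to_b = d.min(axis=1)
    b_to_a = d.min(axis=0)
    return float(max(np.percentile(a_to_b, 95), np.percentile(b_to_a, 95)))


if __name__ == "__main__":
    pred = np.zeros((8, 8), dtype=bool)
    target = np.zeros((8, 8), dtype=bool)
    pred[2:6, 2:6] = True
    target[2:6, 3:7] = True  # same square shifted right by one pixel
    print(iou(pred, target), dice(pred, target), hd95(pred, target))
```

Higher is better for IoU and DSC, while lower is better for HD95, which is why the findings report the highest mIoU and mDSC alongside the lowest HD95.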

## Abstract

Medical image segmentation is foundational to precision medicine. However, state-of-the-art Vision Transformers (ViTs) inherently suffer from a critical trade-off between comprehensive global contextualization and robust local boundary discrimination, especially in high-variance clinical data. This deficit necessitates a novel architecture. This study introduces the Multi-Level Spatio-Relational SegFormer (MLSRS-SegFormer), a novel vision transformer architecture designed to significantly enhance semantic segmentation through adaptive spatial induction strategies, dynamic positional encoding, and refined local context learning.

The model's architectural contributions, comparative performance, and validation for clinical applications are summarized in the following key points:

• MLSRS-SegFormer integrates three clear and novel contributions beyond standard SegFormer: (1) Adaptive Patch Weighting in PatchEmbedding for dynamic feature induction, (2) Hausdorff-bias Attention for explicit spatial prioritization, and (3) Relative Positional Encoding (RPE) for nuanced and adaptive spatial relationship understanding.

• Comparative experiments reveal MLSRS-SegFormer's superior performance with consistent gains in segmentation accuracy, achieving the highest mIoU (0.968) and mDSC (0.980). Crucially for clinical applications, the model also demonstrates the lowest HD95 (1.1668), which validates its exceptional boundary precision.

• Bland-Altman analyses further confirm its near-zero systematic bias and remarkable consistency in area and boundary delineation, providing robust and highly accurate segmentation vital for clinical applications despite a longer inference time.
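
The Bland-Altman analysis mentioned above compares paired measurements (for example, a predicted lesion area against the reference area) by looking at their differences. The sketch below computes the standard quantities: the mean difference (systematic bias) and the 95% limits of agreement at bias ± 1.96 standard deviations. It is a minimal illustration on made-up numbers, not the paper's data or analysis code, and the function name `bland_altman` is hypothetical.

```python
import numpy as np


def bland_altman(measured, reference):
    """Return bias and 95% limits of agreement for paired measurements."""
    measured = np.asarray(measured, dtype=float)
    reference = np.asarray(reference, dtype=float)
    diff = measured - reference               # per-case disagreement
    mean_pair = (measured + reference) / 2.0  # x-axis of a Bland-Altman plot
    bias = diff.mean()                        # systematic bias
    sd = diff.std(ddof=1)                     # sample standard deviation
    return {
        "bias": bias,
        "loa_lower": bias - 1.96 * sd,
        "loa_upper": bias + 1.96 * sd,
        "mean_pair": mean_pair,
        "diff": diff,
    }


if __name__ == "__main__":
    # Illustrative values only (e.g. areas in pixels); not taken from the paper.
    predicted_area = [10.0, 12.0, 14.0, 16.0]
    reference_area = [11.0, 11.0, 15.0, 15.0]
    result = bland_altman(predicted_area, reference_area)
    print(result["bias"], result["loa_lower"], result["loa_upper"])
```

A bias close to zero with narrow limits of agreement is what "near-zero systematic bias and remarkable consistency" refers to: the model neither over- nor under-segments on average, and individual cases stay close to the reference.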

## Full-text entities

- **Diseases:** COVID-19 lesion (MESH:D000086382), cervical precancerous lesions (MESH:D011230), brain tumor (MESH:D001932), Cancer (MESH:D009369), lesion (MESH:D009059)
- **Chemicals:** IoU (-), Acetic Acid (MESH:D019342), Val (MESH:D014633)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12637274/full.md

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12637274/full.md

## References

27 references — full list in the complete paper: https://tomesphere.com/paper/PMC12637274/full.md

---
Source: https://tomesphere.com/paper/PMC12637274