# ZR2ViM: a recursive vision Mamba model for boundary-preserving medical image segmentation

**Authors:** Caijian Hua, Caorong Xiang, Liuying Li, Xia Zhou

PMC · DOI: 10.3389/fbinf.2026.1768786 · Frontiers in Bioinformatics · 2026-03-04

## TL;DR

ZR2ViM is a recursion-enhanced Vision Mamba model for medical image segmentation that sharpens boundary localization on fine, elongated, and low-contrast structures while keeping near-linear computational complexity.

## Contribution

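ZR2ViM augments the Vision Mamba framework with a Zigzag Recursive Reinforced (ZR2) Block, which combines Stacked State Redistribution (SSR) with a Nested Recursive Connection (NRC), and with a Cross-directional Zigzag WKV (CZ-WKV) module that uses Q-Shift directional priors, all aimed at boundary-preserving segmentation.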

## Key findings

- ZR2ViM outperforms representative convolutional, attention-based, and visual state space models in region consistency and boundary localization across four medical imaging domains and five public datasets.
- It achieves a 2.15 mm reduction in HD95 on the Synapse multi-organ CT dataset compared to CC-ViM.
- The model maintains near-linear computational complexity while enhancing fine structure representation.

## Abstract

Medical image segmentation is fundamental to quantitative disease analysis and therapeutic decision-making. However, constrained by limited computational resources, existing deep learning methods often struggle to simultaneously model long-range dependencies and preserve boundary precision, particularly when delineating structures with complex morphology or blurred edges.

To overcome these challenges, we propose ZR2ViM, a recursion-enhanced visual state space model designed for medical image segmentation. ZR2ViM augments the Vision Mamba framework with a Zigzag Recursive Reinforced (ZR2) Block that incorporates Stacked State Redistribution (SSR) and a Nested Recursive Connection (NRC). The NRC employs dual inner and outer pathways to iteratively fuse local details with global context while preserving 2D spatial adjacency. Furthermore, a Cross-directional Zigzag WKV (CZ-WKV) module executes multi-step recursive updates along multiple zigzag trajectories, injecting spatial directional information via Quad-Directional Token Shift (Q-Shift) directional priors. Collectively, these mechanisms mitigate serialization-induced banding artifacts and enhance the representation of fine, elongated, and low-contrast structures, all while maintaining near-linear computational complexity.
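To illustrate why a zigzag trajectory helps preserve 2D spatial adjacency when an image is serialized into a token sequence, the sketch below compares a plain row-major scan with a snake-style zigzag scan. It is a minimal illustration, not the paper's implementation: the abstract does not specify the exact trajectories, so the snake pattern and the helper names `raster_order`, `zigzag_order`, and `max_step` are assumptions made for this example.

```python
def raster_order(h, w):
    """Row-major scan: every row is read left to right."""
    return [(r, c) for r in range(h) for c in range(w)]


def zigzag_order(h, w):
    """Snake scan (assumed pattern): even rows left to right, odd rows right to left."""
    order = []
    for r in range(h):
        cols = range(w) if r % 2 == 0 else range(w - 1, -1, -1)
        order.extend((r, c) for c in cols)
    return order


def max_step(order):
    """Largest Manhattan distance between consecutive tokens in a scan."""
    return max(
        abs(r1 - r0) + abs(c1 - c0)
        for (r0, c0), (r1, c1) in zip(order, order[1:])
    )


if __name__ == "__main__":
    h, w = 4, 4
    # Row-major scan jumps from the end of one row to the start of the next.
    print(max_step(raster_order(h, w)))  # 4
    # Snake scan keeps every pair of consecutive tokens spatially adjacent.
    print(max_step(zigzag_order(h, w)))  # 1
```

In the row-major scan, the token at the end of a row is followed by a token a full row width away, which is one plausible source of the serialization-induced banding the abstract mentions. A column-wise or reversed variant of `zigzag_order` would give additional trajectories in the spirit of the "multiple zigzag trajectories" described above.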

Comprehensive evaluations across four medical imaging domains (dermatoscopic images, breast ultrasound, colorectal polyps, and abdominal multi-organ CT) on five public datasets demonstrate that ZR2ViM consistently outperforms representative convolutional, attention-based, and visual state space architectures in region consistency and boundary localization. Notably, ZR2ViM achieves a 2.15 mm reduction in the HD95 on the Synapse multi-organ CT dataset relative to the CC-ViM baseline, substantiating its superior capability for precise, clinically relevant boundary delineation.
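HD95 is the 95th-percentile Hausdorff distance between the predicted and reference boundaries, so lower values mean tighter boundary agreement. The sketch below shows one common way to compute it from two sets of boundary points. It is an assumption-laden illustration rather than the evaluation code used in the paper: it pools the distances from both directions before taking the percentile (other implementations take the maximum of two directed percentiles), uses a brute-force nearest-neighbour search, and assumes isotropic pixel spacing. The names `directed_distances`, `percentile`, `hd95`, and `spacing_mm` are hypothetical.

```python
import math


def directed_distances(src, dst):
    """For each point in src, the Euclidean distance to its nearest point in dst."""
    return [min(math.dist(p, q) for q in dst) for p in src]


def percentile(values, q):
    """Linearly interpolated q-th percentile (0 <= q <= 100) of a non-empty list."""
    data = sorted(values)
    pos = (len(data) - 1) * q / 100.0
    lo = math.floor(pos)
    hi = math.ceil(pos)
    return data[lo] + (data[hi] - data[lo]) * (pos - lo)


def hd95(pred_boundary, true_boundary, spacing_mm=1.0):
    """95th-percentile boundary distance, pooled over both directions (assumed convention)."""
    d = directed_distances(pred_boundary, true_boundary)
    d += directed_distances(true_boundary, pred_boundary)
    return spacing_mm * percentile(d, 95)


if __name__ == "__main__":
    true_pts = [(0, c) for c in range(10)]
    pred_pts = [(1, c) for c in range(10)]  # same contour shifted by one pixel
    print(hd95(pred_pts, true_pts))                  # 1.0
    print(hd95(pred_pts, true_pts, spacing_mm=0.5))  # 0.5
```

Because the metric takes a percentile rather than the maximum, a single stray boundary pixel affects it far less than it would affect the plain Hausdorff distance, which is why it is widely used for reporting boundary quality in millimetres.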

The ZR2ViM framework delivers accurate, boundary-preserving segmentation across diverse imaging modalities and anatomically complex structures, achieving these gains with near-linear computational complexity. These findings demonstrate that ZR2ViM offers a robust and efficient solution for medical image analysis, establishing a promising foundation for advanced clinical and research applications.

## Full-text entities

- **Diseases:** colorectal polyps (MESH:D003111)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12996084/full.md

## Figures

8 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12996084/full.md

## References

41 references — full list in the complete paper: https://tomesphere.com/paper/PMC12996084/full.md

---
Source: https://tomesphere.com/paper/PMC12996084