MD-RWKV-UNet: Scale-Aware Anatomical Encoding with Cross-Stage Fusion for Multi-Organ Segmentation
Zhuoyi Fang

TL;DR
The paper introduces MD-RWKV-UNet, a scale-aware encoder network for multi-organ segmentation that adapts its receptive field to local structures and strengthens multi-scale feature interaction, achieving state-of-the-art results on the Synapse and ACDC datasets.
Contribution
It proposes a dynamic encoder with the MD-RWKV block and cross-stage dual-attention fusion, enabling adaptive, scale-aware, and context-rich feature extraction for medical image segmentation.
Findings
Achieves state-of-the-art performance on the Synapse and ACDC datasets.
Improves boundary precision and small-organ segmentation accuracy.
Demonstrates robustness to organ size and shape variations.
Abstract
Multi-organ segmentation in medical imaging remains challenging due to large anatomical variability, complex inter-organ dependencies, and diverse organ scales and shapes. Conventional encoder-decoder architectures often struggle to capture both fine-grained local details and long-range context, which are crucial for accurate delineation, especially for small or deformable organs. To address these limitations, we propose MD-RWKV-UNet, a dynamic encoder network that enables scale-aware representation and spatially adaptive context modeling. At its core is the MD-RWKV block, a dual-path module that integrates deformable spatial shifts with the Receptance Weighted Key Value mechanism, allowing the receptive field to adapt dynamically to local structural cues. We further incorporate Selective Kernel Attention to enable adaptive selection of convolutional kernels with varying receptive…
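The abstract is cut off before it describes how kernel selection is implemented, so the following is only a toy sketch of the general split, fuse, select pattern usually associated with selective-kernel attention, not the paper's module. Everything in it is an assumption made for illustration: the 1D signals, the moving-average filters standing in for learned convolution branches, and the hypothetical names `box_filter`, `selective_kernel`, and `gate_weights`.

```python
import math


def box_filter(signal, width):
    """Same-length moving average with zero padding.

    Stands in for a learned convolution branch with kernel size `width`.
    """
    half = width // 2
    n = len(signal)
    out = []
    for i in range(n):
        window = [signal[j] for j in range(i - half, i + half + 1) if 0 <= j < n]
        out.append(sum(window) / width)
    return out


def selective_kernel(channels, widths=(3, 5), gate_weights=None):
    """Toy selective-kernel gating over 1D channels.

    channels: list of C equal-length lists of floats.
    gate_weights: hypothetical per-branch, per-channel scalars (shape K x C)
        replacing the learned layers a real module would use.
    Returns (output, gates), where gates[c][k] is the weight of branch k
    for channel c.
    """
    num_channels = len(channels)
    num_branches = len(widths)
    if gate_weights is None:
        gate_weights = [[1.0] * num_channels for _ in widths]

    # Split: one branch per receptive-field size.
    branches = [[box_filter(ch, w) for ch in channels] for w in widths]

    # Fuse: sum the branches, then global-average-pool each channel.
    descriptors = []
    for c in range(num_channels):
        length = len(channels[c])
        fused = [sum(branch[c][i] for branch in branches) for i in range(length)]
        descriptors.append(sum(fused) / length)

    # Select: softmax across branches, computed separately per channel.
    gates = []
    for c in range(num_channels):
        logits = [gate_weights[k][c] * descriptors[c] for k in range(num_branches)]
        peak = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(value - peak) for value in logits]
        total = sum(exps)
        gates.append([e / total for e in exps])

    # Output: gate-weighted sum of the branch responses.
    output = []
    for c in range(num_channels):
        length = len(channels[c])
        output.append([
            sum(gates[c][k] * branches[k][c][i] for k in range(num_branches))
            for i in range(length)
        ])
    return output, gates


if __name__ == "__main__":
    feats = [[0.0, 1.0, 0.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0, 1.0]]
    _, gates = selective_kernel(feats, gate_weights=[[2.0, 2.0], [0.0, 0.0]])
    print(gates)
```

Because the gates for each channel sum to one, each channel's output is a convex combination of its small and large receptive-field responses. That is the sense in which this kind of gating lets a network favor different kernel sizes for different inputs.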
