No More Sliding Window: Efficient 3D Medical Image Segmentation with Differentiable Top-k Patch Sampling
Young Seok Jeon, Hongfei Yang, Huazhu Fu, Mengling Feng

TL;DR
This paper introduces NMSW, an end-to-end framework that eliminates sliding window inference in 3D medical image segmentation, significantly reducing computation and inference time while maintaining accuracy.
Contribution
The paper proposes a differentiable Top-k patch sampling method to replace sliding window inference, enabling faster and more efficient 3D segmentation without sacrificing accuracy.
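The summary does not say how the Top-k step is made differentiable. The sketch below shows one common approach, under stated assumptions: Gumbel-perturbed patch scores combined with the successive-softmax relaxation of Xie and Ermon (2019). This is a swapped-in technique, and the paper's own module may differ. The names gumbel_noise, softmax, relaxed_top_k, the fixed seed, and the default temperature are all hypothetical. The hard indices choose which patches the backbone would run on. The soft mask is a smooth function of the scores, so in an autodiff framework it could carry gradients back to a patch scorer, for example through a straight-through estimator.

```python
import math
import random
from typing import List, Tuple


def gumbel_noise(rng: random.Random) -> float:
    """Standard Gumbel sample, -log(-log(U)) with U ~ Uniform(0, 1)."""
    # Clamp U away from 0 and 1 so both logarithms stay finite.
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
    return -math.log(-math.log(u))


def softmax(xs: List[float], temperature: float) -> List[float]:
    """Numerically stable softmax of xs / temperature."""
    scaled = [x / temperature for x in xs]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def relaxed_top_k(scores: List[float], k: int, temperature: float = 0.5,
                  seed: int = 0) -> Tuple[List[float], List[int]]:
    """Hypothetical helper (not the NMSW authors' code).

    scores: unnormalised patch logits from an assumed patch scorer.
    Returns (soft_mask, hard_indices). soft_mask sums to k and varies smoothly
    with the scores; hard_indices are the k largest perturbed scores.
    """
    rng = random.Random(seed)
    # Gumbel-top-k: perturb the scores, then take the k largest.
    perturbed = [s + gumbel_noise(rng) for s in scores]
    hard = sorted(range(len(scores)), key=lambda i: perturbed[i], reverse=True)[:k]

    # Successive-softmax relaxation: k rounds, each softly picking one patch.
    alpha = list(perturbed)
    soft_mask = [0.0] * len(scores)
    for _ in range(k):
        p = softmax(alpha, temperature)
        soft_mask = [m + pi for m, pi in zip(soft_mask, p)]
        # Down-weight patches already (softly) chosen: alpha += log(1 - p).
        alpha = [a + math.log(max(1.0 - pi, 1e-12)) for a, pi in zip(alpha, p)]
    return soft_mask, sorted(hard)
```

For example, relaxed_top_k([20.0, -20.0, 15.0, -20.0, -20.0], k=2) selects patches 0 and 2, and the soft mask is close to 1 at those positions and close to 0 elsewhere. A lower temperature pushes the soft mask toward the hard k-hot selection, but the gradients become sharper and less informative.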
Findings
Achieves a 91% reduction in computational complexity
Delivers up to 11.1x faster inference on CPU
Maintains competitive segmentation accuracy
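The sketch below shows where the savings from dropping the sliding window come from. It counts the backbone patch evaluations a sliding window needs and compares that with running only k sampled patches. It assumes a simple stride-based tiling, with stride = patch size × (1 − overlap). Real inference libraries may handle volume edges differently. The function names and the volume, patch size, and k used below are hypothetical, not the paper's settings. The count also ignores the cost of the coarse global pass and the patch scorer, so it does not reproduce the reported 91% figure.

```python
import math
from typing import Sequence


def windows_per_axis(size: int, patch: int, overlap: float) -> int:
    """Number of sliding-window positions along one axis (assumed stride rule)."""
    if size <= patch:
        return 1  # a single (possibly padded) window covers the axis
    stride = max(1, int(patch * (1.0 - overlap)))
    return math.ceil((size - patch) / stride) + 1


def sliding_window_count(volume: Sequence[int], patch: Sequence[int],
                         overlap: float = 0.5) -> int:
    """Total windows needed to tile a 3D volume (D, H, W)."""
    return math.prod(windows_per_axis(s, p, overlap) for s, p in zip(volume, patch))


def patch_cost_ratio(volume: Sequence[int], patch: Sequence[int], k: int,
                     overlap: float = 0.5) -> float:
    """Fraction of backbone patch evaluations kept when running only k patches."""
    return k / sliding_window_count(volume, patch, overlap)
```

For a hypothetical 192 × 256 × 256 volume with 128³ patches and 50% overlap, sliding_window_count returns 18 windows (2 × 3 × 3). Running k = 4 patches therefore keeps 4/18 ≈ 22% of the patch-level backbone evaluations.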
Abstract
3D models surpass 2D models in CT/MRI segmentation by effectively capturing inter-slice relationships. However, the added depth dimension substantially increases memory consumption. While patch-based training alleviates memory constraints, it significantly slows down inference because of the sliding window (SW) approach. We propose No-More-Sliding-Window (NMSW), a novel end-to-end trainable framework that improves the inference efficiency of a generic 3D segmentation backbone by eliminating the need for SW. NMSW employs a differentiable Top-k module to selectively sample only the most relevant patches, thereby minimizing redundant computation. When patch-level predictions are insufficient, the framework intelligently leverages coarse global predictions to refine the results. Evaluated across 3 tasks using 3 segmentation backbones, NMSW achieves competitive accuracy…
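According to the abstract, NMSW relies on coarse global predictions wherever patch-level predictions are insufficient. The exact fusion is not described in this summary, so the sketch below uses a deliberately simple, hypothetical rule. Wherever at least one sampled patch covers a voxel, it averages the backbone's patch predictions. Everywhere else, it keeps the upsampled global prediction. The function name and argument layout are assumptions, and the paper may use a learned aggregation instead.

```python
import numpy as np
from typing import List, Tuple


def fuse_patch_and_global(global_pred: np.ndarray,
                          patch_preds: List[np.ndarray],
                          origins: List[Tuple[int, int, int]]) -> np.ndarray:
    """Hypothetical fusion rule (not the NMSW authors' code).

    global_pred: (D, H, W) foreground probabilities, already upsampled to full size.
    patch_preds: (pd, ph, pw) probability arrays from the backbone, one per patch.
    origins: (z, y, x) corner of each patch; each patch is assumed to lie fully
    inside the volume.
    """
    acc = np.zeros(global_pred.shape, dtype=np.float64)
    count = np.zeros(global_pred.shape, dtype=np.int64)
    for pred, (z, y, x) in zip(patch_preds, origins):
        pd, ph, pw = pred.shape
        region = (slice(z, z + pd), slice(y, y + ph), slice(x, x + pw))
        acc[region] += pred  # sum predictions from overlapping patches
        count[region] += 1
    covered = count > 0
    fused = global_pred.astype(np.float64).copy()
    # Average where a patch was run; keep the coarse global prediction elsewhere.
    fused[covered] = acc[covered] / count[covered]
    return fused
```

Under this rule, the k sampled patches refine only the regions the Top-k module selected. The cheap global pass still yields a prediction for every other voxel, so no part of the volume is left unlabeled.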
Taxonomy
Topics: Medical Image Segmentation Techniques · Medical Imaging Techniques and Applications · Computer Graphics and Visualization Techniques
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
