Rethinking Scanning Strategies with Vision Mamba in Semantic Segmentation of Remote Sensing Imagery: An Experimental Study
Qinfeng Zhu, Yuan Fang, Yuanzhi Cai, Cheng Chen, Lei Fan

TL;DR
This study investigates various image scanning strategies for the Mamba model in remote sensing image segmentation, finding that a simple single-direction scan is as effective as complex multi-directional approaches.
Contribution
The paper provides a comprehensive experimental analysis of scanning strategies, demonstrating that complex strategies do not outperform simple single-direction scans in remote sensing segmentation tasks.
Findings
No single scanning strategy outperforms others.
Simple single-direction scanning is sufficient.
Complex strategies do not improve segmentation performance.
Abstract
Deep learning methods, especially Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs), are frequently employed to perform semantic segmentation of high-resolution remotely sensed images. However, CNNs are constrained by their restricted receptive fields, while ViTs face challenges due to their quadratic complexity. Recently, the Mamba model, featuring linear complexity and a global receptive field, has gained extensive attention for vision tasks. In such tasks, images need to be serialized to form sequences compatible with the Mamba model. Numerous research efforts have explored scanning strategies to serialize images, aiming to enhance the Mamba model's understanding of images. However, the effectiveness of these scanning strategies remains uncertain. In this research, we conduct a comprehensive experimental investigation on the impact of mainstream scanning directions and…
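To make the serialization step concrete, the following is a minimal sketch of how a 2D grid of patch embeddings might be flattened into a 1D sequence under a few directional scans. It is an illustration only, not the authors' implementation: the function name `scan_patches`, the direction labels, and the toy grid are all assumptions introduced here, and the strategies compared in the paper may differ from these four.

```python
import numpy as np


def scan_patches(patches: np.ndarray, direction: str = "row") -> np.ndarray:
    """Flatten an (H, W, C) patch grid into an (H*W, C) sequence.

    Hypothetical helper for illustration; direction names are assumptions.
    """
    h, w, c = patches.shape
    if direction == "row":
        # Left-to-right within each row, rows visited top to bottom.
        return patches.reshape(h * w, c)
    if direction == "row_rev":
        # The row scan traversed backwards.
        return patches.reshape(h * w, c)[::-1]
    if direction == "col":
        # Top-to-bottom within each column, columns visited left to right.
        return patches.transpose(1, 0, 2).reshape(h * w, c)
    if direction == "col_rev":
        # The column scan traversed backwards.
        return patches.transpose(1, 0, 2).reshape(h * w, c)[::-1]
    raise ValueError(f"unknown scan direction: {direction}")


# Toy 2x3 grid with one channel; values are patch indices in row-major order.
grid = np.arange(6).reshape(2, 3, 1)
row_seq = scan_patches(grid, "row")[:, 0].tolist()
col_seq = scan_patches(grid, "col")[:, 0].tolist()
row_rev_seq = scan_patches(grid, "row_rev")[:, 0].tolist()
col_rev_seq = scan_patches(grid, "col_rev")[:, 0].tolist()
```

In this sketch a single-direction strategy would feed only one of these sequences to the model, while a multi-directional strategy would process several of them and merge the outputs. The abstract's question is whether that extra machinery improves segmentation.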
Taxonomy
Topics: Remote-Sensing Image Classification · Advanced Image and Video Retrieval Techniques · Satellite Image Processing and Photogrammetry
Methods: Position-Wise Feed-Forward Layer · Dropout · Label Smoothing · Absolute Position Encodings · Byte Pair Encoding · Adam · Softmax · Attention Is All You Need · Layer Normalization · Linear Layer
