A Controlled Benchmark of Visual State-Space Backbones with Domain-Shift and Boundary Analysis for Remote-Sensing Segmentation

Nichula Wasalathilaka; Dineth Perera; Oshadha Samarakoon; Buddhi Wijenayake; Roshan Godaliyadda; Vijitha Herath; Parakrama Ekanayake

arXiv:2604.18721·eess.IV·April 22, 2026

TL;DR

This paper presents a controlled benchmark of visual state-space models for remote-sensing segmentation, analyzing encoder effects, cross-domain generalization, and boundary accuracy under domain shift.

Contribution

It introduces a unified benchmark in which only the encoder varies, which lets it attribute differences in scaling behavior, generalization asymmetry, and boundary failure modes to the visual SSM encoders themselves.

Findings

01. Encoder scaling yields modest gains.

02. Cross-domain generalization is asymmetric.

03. Boundary delineation is the main failure mode under shift.
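
One way to make finding 03 concrete is to score boundary pixels separately from interior pixels. The following is a minimal sketch, not the paper's metric: the function names, the 4-neighbour definition of a boundary pixel, and the toy grids are all assumptions made for illustration.

```python
from typing import List, Tuple

Grid = List[List[int]]  # class index per pixel, row-major


def boundary_mask(gt: Grid) -> List[List[bool]]:
    """Mark pixels whose 4-neighbourhood contains a different ground-truth class.

    This boundary definition is an assumption for illustration only.
    """
    h, w = len(gt), len(gt[0])
    mask = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and gt[ny][nx] != gt[y][x]:
                    mask[y][x] = True
                    break
    return mask


def split_accuracy(gt: Grid, pred: Grid) -> Tuple[float, float]:
    """Return (boundary accuracy, interior accuracy) as fractions in [0, 1]."""
    mask = boundary_mask(gt)
    hits = {True: 0, False: 0}
    totals = {True: 0, False: 0}
    for y, row in enumerate(gt):
        for x, label in enumerate(row):
            is_boundary = mask[y][x]
            totals[is_boundary] += 1
            hits[is_boundary] += int(pred[y][x] == label)

    def ratio(key: bool) -> float:
        return hits[key] / totals[key] if totals[key] else float("nan")

    return ratio(True), ratio(False)


# Toy 4x4 example (hypothetical labels): two classes split down the middle,
# with two prediction errors placed on the class border.
toy_gt = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
toy_pred = [
    [0, 0, 0, 1],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [0, 0, 1, 1],
]
boundary_acc, interior_acc = split_accuracy(toy_gt, toy_pred)
```

In the toy grids every error sits next to the class border, so boundary accuracy falls below interior accuracy even though overall pixel accuracy stays high. That is the kind of gap a boundary-focused analysis is meant to expose.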

Abstract

Visual state-space models (SSMs) are increasingly promoted as efficient alternatives to Vision Transformers, yet their practical advantages remain unclear under fair comparison because existing studies rarely isolate encoder effects from decoder and training choices. We present a strictly controlled benchmark of representative visual SSM families, including VMamba, MambaVision, and Spatial-Mamba, for remote-sensing semantic segmentation, in which only the encoder varies across experiments. Evaluated on LoveDA and ISPRS Potsdam under a unified 4-stage feature interface and a fixed lightweight decoder, the benchmark reveals three main findings: intra-family scaling yields only modest gains, cross-domain generalization is strongly asymmetric, and boundary delineation is the dominant failure mode under distribution shift. Although visual SSMs achieve favorable accuracy-efficiency trade-offs…
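
The "unified 4-stage feature interface" can be pictured as a contract that every encoder must satisfy before it is attached to the fixed decoder. The sketch below is a shape-only illustration under stated assumptions: the strides (4, 8, 16, 32), the channel widths, and all names are hypothetical and are not taken from the paper.

```python
from typing import Callable, Dict, List, Tuple

Shape = Tuple[int, int, int]  # (channels, height, width) of one feature map
Encoder = Callable[[int, int], List[Shape]]  # maps input (H, W) to stage shapes

# Assumed per-stage downsampling factors; common for hierarchical backbones,
# but not confirmed by the source.
ASSUMED_STRIDES = (4, 8, 16, 32)


def check_four_stage_interface(encoder: Encoder, height: int, width: int) -> List[int]:
    """Validate the 4-stage contract and return the per-stage channel widths.

    The channel widths are the only encoder-specific detail the fixed decoder
    would need in this sketch.
    """
    shapes = encoder(height, width)
    if len(shapes) != len(ASSUMED_STRIDES):
        raise ValueError(f"expected {len(ASSUMED_STRIDES)} stages, got {len(shapes)}")
    for (_, h, w), stride in zip(shapes, ASSUMED_STRIDES):
        if (h, w) != (height // stride, width // stride):
            raise ValueError(f"stage with stride {stride} has wrong size {(h, w)}")
    return [channels for channels, _, _ in shapes]


def make_dummy_encoder(channels: Tuple[int, ...]) -> Encoder:
    """Build a stand-in encoder that reports feature shapes only (no real model)."""
    def encoder(height: int, width: int) -> List[Shape]:
        return [(c, height // s, width // s) for c, s in zip(channels, ASSUMED_STRIDES)]
    return encoder


# Two hypothetical encoders with different widths; everything downstream
# (decoder, input size, training recipe) would be held fixed.
encoders: Dict[str, Encoder] = {
    "encoder_a": make_dummy_encoder((64, 128, 256, 512)),
    "encoder_b": make_dummy_encoder((96, 192, 384, 768)),
}
decoder_inputs = {
    name: check_four_stage_interface(enc, 512, 512) for name, enc in encoders.items()
}
```

Swapping one entry in `encoders` for another changes only the channel widths handed to the decoder, which is the sense in which such a setup varies the encoder alone.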
