# CVMFusion: ConvNeXtV2 and Visual Mamba Fusion for Remote Sensing Segmentation

**Authors:** Zelin Wang, Li Qin, Cheng Xu, Dexi Liu, Zeyu Guo, Yu Hu, Tianyu Yang

PMC · DOI: 10.3390/s26020640 · Sensors (Basel, Switzerland) · 2026-01-18

## TL;DR

CVMFusion is a new network that combines CNNs and Mamba for better sea–land segmentation in remote sensing data, improving accuracy for small targets and complex coastlines.

## Contribution

CVMFusion introduces a hybrid CNN-Mamba architecture with dynamic fusion modules for improved remote sensing segmentation.

## Key findings

- CVMFusion achieved mIoU scores of 98.05% and 96.28% on two public SAR datasets.
- The model excels in segmenting small objects and intricate boundary regions.
- Dynamic fusion modules enhance accuracy by adaptively combining local and global features.
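For readers unfamiliar with the headline metric, the following is a minimal sketch of how mean intersection-over-union (mIoU) is commonly computed for a two-class sea–land mask. The function name `mean_iou`, the label convention (0 = sea, 1 = land), and the toy masks are illustrative assumptions; the paper's own evaluation code is not shown in this summary and may differ in detail.

```python
def mean_iou(pred, target, num_classes=2):
    """Mean intersection-over-union across classes for flat label lists."""
    ious = []
    for c in range(num_classes):
        # Pixels where both prediction and ground truth are class c
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels where either prediction or ground truth is class c
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes absent from both masks
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy 2x4 masks flattened row by row (hypothetical data): 0 = sea, 1 = land
target = [0, 0, 1, 1, 0, 1, 1, 1]
pred = [0, 0, 1, 1, 0, 0, 1, 1]
print(mean_iou(pred, target))  # sea IoU = 3/4, land IoU = 4/5
```

Because each class contributes equally to the average, a single mislabelled pixel in a small class lowers the score more than the same error in a large class, which is one reason mIoU is a common choice when classes are imbalanced.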

## Abstract

What are the main findings?

This paper presents CVMFusion, an innovative dual-branch network that cohesively combines ConvNeXtV2 for precise local feature extraction and VMamba for extensive contextual modelling, thereby setting a new benchmark for sea–land segmentation in remote sensing data.

The proposed Dynamic Multi-scale Attention (DyMSA) and Dynamic Weighted Cross-Attention (DyWCA) modules enable dynamic, adaptive feature fusion, which is empirically shown to enhance the segmentation accuracy of small targets and complex coastline boundaries.

What are the implications of the main findings?

The exceptional performance of CVMFusion on public SAR datasets illustrates the effectiveness of the hybrid CNN-Mamba architecture in addressing the shortcomings of current approaches, especially in managing class imbalance and retaining essential edge information.

This work provides a robust and accurate tool for coastal zone monitoring, with direct implications for improving applications in marine disaster early warning, navigation safety, and sustainable coastal resource management.

In recent years, extracting coastlines from high-resolution remote sensing imagery has proven difficult because of complex details and variable targets. Existing methods face a trade-off: CNNs cannot model long-range dependencies, while Transformers incur high computational costs. To address these issues, we propose CVMFusion, a sea–land segmentation network built on a U-shaped encoder–decoder structure in which both the encoder and decoder are hierarchically organized. The architecture integrates the local feature extraction capabilities of CNNs with the global interaction efficiency of Mamba. The encoder uses parallel ConvNeXtV2 and VMamba branches to capture fine-grained details and long-range context, respectively. The network incorporates Dynamic Multi-Scale Attention (DyMSA) and Dynamic Weighted Cross-Attention (DyWCA) modules, which replace traditional concatenation with an adaptive fusion mechanism. These modules fuse the features from the dual-branch encoder and use skip connections to complete the fusion between the encoder and decoder. Experiments on two public datasets show that CVMFusion attains mIoU scores of 98.05% and 96.28%, outperforming existing methods. It performs particularly well in segmenting small objects and intricate boundary regions.
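To make the contrast between concatenation and adaptive fusion concrete, here is a minimal sketch of a generic per-channel gated fusion of two feature vectors. It is not the DyMSA or DyWCA module, whose internals are not described in this summary. The function names, the gate parameters (`w_local`, `w_global`, `bias`), and the toy feature values are all hypothetical and stand in for weights that a real network would learn.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def concat_fusion(local_feat, global_feat):
    """Baseline: stack both channel vectors, doubling the channel count."""
    return local_feat + global_feat


def gated_fusion(local_feat, global_feat, w_local, w_global, bias):
    """Per-channel gate in (0, 1) blends the local and global branches."""
    fused = []
    for l, g, wl, wg, b in zip(local_feat, global_feat, w_local, w_global, bias):
        gate = sigmoid(wl * l + wg * g + b)  # depends on the input features
        fused.append(gate * l + (1.0 - gate) * g)
    return fused


# Hypothetical 3-channel features from a CNN-like and a Mamba-like branch
local_feat = [1.0, 0.2, -0.5]
global_feat = [0.0, 0.8, 0.5]
zeros = [0.0, 0.0, 0.0]

print(len(concat_fusion(local_feat, global_feat)))  # channel count doubles
print(gated_fusion(local_feat, global_feat, zeros, zeros, zeros))  # equal blend
```

With all gate parameters at zero the gate is 0.5 everywhere and the result is a plain average; non-zero parameters let the blend shift toward one branch depending on the input, which is the sense in which such a fusion is adaptive while the channel count stays fixed.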

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12845742/full.md

## Figures

9 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12845742/full.md

## References

49 references — full list in the complete paper: https://tomesphere.com/paper/PMC12845742/full.md

---
Source: https://tomesphere.com/paper/PMC12845742