# MMFNet: A Mamba-Based Multimodal Fusion Network for Remote Sensing Image Semantic Segmentation

**Authors:** Jingting Qiu, Wei Chang, Wei Ren, Shanshan Hou, Ronghao Yang

PMC · DOI: 10.3390/s25196225 · 2025-10-08

## TL;DR

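MMFNet is a Mamba-based network for semantic segmentation of remote sensing imagery that fuses optical images with digital surface models (DSMs), reaching mean IoU scores of 83.50% and 86.06% on the ISPRS Vaihingen and Potsdam benchmarks at relatively low computational cost.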

## Contribution

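MMFNet introduces a dual-encoder architecture that pairs ResNet-18 (local detail) with VMamba (global context), a novel Multimodal Feature Fusion Block (MFFB) for combining optical and DSM features, and a frequency-aware upsampling module (FreqFusion) in the decoder.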

## Key findings

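- MMFNet achieved mean IoU scores of 83.50% on the ISPRS Vaihingen benchmark and 86.06% on the ISPRS Potsdam benchmark.
- The model outperformed eight state-of-the-art methods while maintaining relatively low computational complexity.
- The MFFB improved the integration of optical and DSM features, and FreqFusion improved boundary delineation and the recovery of fine spatial details.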
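
The headline metric above, mean IoU, is conventionally computed from a per-class confusion matrix: for each class, IoU = TP / (TP + FP + FN), and the mean is taken over classes. The following is a minimal sketch of that standard definition, not the authors' evaluation code; the function names `confusion_matrix` and `mean_iou` and the toy labels are illustrative assumptions, and the paper's protocol (for example, which classes are ignored) may differ.

```python
from typing import List, Sequence


def confusion_matrix(
    y_true: Sequence[int], y_pred: Sequence[int], num_classes: int
) -> List[List[int]]:
    """Count pixels: rows index the reference class, columns the predicted class."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def mean_iou(matrix: List[List[int]]) -> float:
    """Average per-class IoU = TP / (TP + FP + FN) over classes that occur."""
    n = len(matrix)
    ious = []
    for c in range(n):
        tp = matrix[c][c]
        fn = sum(matrix[c]) - tp                        # reference c, predicted other
        fp = sum(matrix[r][c] for r in range(n)) - tp   # predicted c, reference other
        denom = tp + fp + fn
        if denom > 0:  # skip classes absent from both reference and prediction
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example with three classes and six flattened "pixels" (made-up labels).
reference = [0, 0, 1, 1, 2, 2]
predicted = [0, 1, 1, 1, 2, 0]
score = mean_iou(confusion_matrix(reference, predicted, num_classes=3))
print(f"mean IoU = {score:.4f}")
```

In the toy example the per-class IoUs are 1/3, 2/3, and 1/2, so the mean is 0.5. For real segmentation maps the label arrays would be the flattened ground-truth and predicted masks.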

## Abstract

Accurate semantic segmentation of high-resolution remote sensing imagery is challenged by substantial intra-class variability, inter-class similarity, and the limitations of single-modality data. This paper proposes MMFNet, a novel multimodal fusion network that leverages the Mamba architecture to efficiently capture long-range dependencies for semantic segmentation tasks. MMFNet adopts a dual-encoder design, combining ResNet-18 for local detail extraction and VMamba for global contextual modelling, striking a balance between segmentation accuracy and computational efficiency. A Multimodal Feature Fusion Block (MFFB) is introduced to effectively integrate complementary information from optical imagery and digital surface models (DSMs), thereby enhancing multimodal feature interaction and improving segmentation accuracy. Furthermore, a frequency-aware upsampling module (FreqFusion) is incorporated in the decoder to enhance boundary delineation and recover fine spatial details. Extensive experiments on the ISPRS Vaihingen and Potsdam benchmarks demonstrate that MMFNet achieves mean IoU scores of 83.50% and 86.06%, outperforming eight state-of-the-art methods while maintaining relatively low computational complexity. These results highlight MMFNet’s potential for efficient and accurate multimodal semantic segmentation in remote sensing applications.
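
The abstract does not spell out how the MFFB combines the two modalities, so the sketch below is only a generic illustration of the underlying idea of multimodal fusion: a per-position gate decides how much of the optical feature versus the DSM feature to keep. It is not the MFFB. The function `gated_fusion`, its scalar parameters `w_optical`, `w_dsm`, and `bias`, and the toy feature values are all hypothetical; a trained network would learn such weights and operate on multi-channel tensors.

```python
import math
from typing import List

Grid = List[List[float]]


def sigmoid(x: float) -> float:
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(
    optical: Grid,
    dsm: Grid,
    w_optical: float = 0.0,  # hypothetical gate weight for the optical feature
    w_dsm: float = 0.0,      # hypothetical gate weight for the DSM feature
    bias: float = 0.0,       # hypothetical gate bias
) -> Grid:
    """Blend two same-shaped feature maps with a per-position sigmoid gate.

    fused = gate * optical + (1 - gate) * dsm, where
    gate = sigmoid(w_optical * optical + w_dsm * dsm + bias).
    """
    if len(optical) != len(dsm) or any(
        len(a) != len(b) for a, b in zip(optical, dsm)
    ):
        raise ValueError("optical and dsm feature maps must have the same shape")

    fused: Grid = []
    for row_o, row_d in zip(optical, dsm):
        fused_row = []
        for o, d in zip(row_o, row_d):
            gate = sigmoid(w_optical * o + w_dsm * d + bias)
            fused_row.append(gate * o + (1.0 - gate) * d)
        fused.append(fused_row)
    return fused


# With all gate parameters at zero the gate is 0.5, so fusion is a plain average.
optical_features = [[1.0, 2.0]]
dsm_features = [[3.0, 4.0]]
print(gated_fusion(optical_features, dsm_features))
```

A large positive `bias` pushes the gate toward 1 and the output toward the optical features, while a large negative `bias` favours the DSM features; this is the sense in which a gate lets a model weight complementary modalities differently at each location.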

## Figures

10 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12526546/full.md

---
Source: https://tomesphere.com/paper/PMC12526546