LMFNet: An Efficient Multimodal Fusion Approach for Semantic Segmentation in High-Resolution Remote Sensing
Tong Wang, Guanzhou Chen, Xiaodong Zhang, Chenxi Liu, Xiaoliang Tan, Jiaqi Wang, Chanjuan He, Wenlin Zhou

TL;DR
LMFNet is a lightweight, multimodal fusion network that effectively combines RGB, NIR, and DSM data for high-resolution remote sensing semantic segmentation, achieving state-of-the-art accuracy with minimal parameters.
Contribution
The paper introduces LMFNet, a novel multimodal fusion architecture that simultaneously processes multiple remote sensing data types using a weight-sharing transformer and specialized fusion layers.
Findings
Achieves 85.09% mIoU on the US3D dataset.
Improves mIoU by about 10% over unimodal approaches.
Uses only 0.5M more parameters than unimodal models.
Abstract
Despite the rapid evolution of semantic segmentation for land cover classification in high-resolution remote sensing imagery, integrating multiple data modalities such as Digital Surface Model (DSM), RGB, and Near-infrared (NIR) remains a challenge. Current methods often process only two types of data, missing out on the rich information that additional modalities can provide. Addressing this gap, we propose a novel Lightweight Multimodal data Fusion Network (LMFNet) to accomplish the tasks of fusion and semantic segmentation of multimodal remote sensing images. LMFNet uniquely accommodates various data types simultaneously, including RGB, NirRG, and DSM, through a weight-sharing, multi-branch vision transformer that minimizes parameter count while ensuring robust feature extraction. Our proposed multimodal fusion module integrates a…
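To illustrate why a weight-sharing, multi-branch design keeps the parameter count low, here is a minimal sketch under loudly stated assumptions: a single toy linear layer stands in for the transformer backbone, the DSM input is assumed to be expanded to three channels, and fusion is a plain element-wise mean. None of these names or choices come from the paper, and the mean is a placeholder, not LMFNet's fusion module.

```python
import random


class SharedLinear:
    """Toy encoder y = W x + b; one instance is reused by every branch."""

    def __init__(self, in_dim, out_dim, seed=0):
        rng = random.Random(seed)
        self.weight = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
                       for _ in range(out_dim)]
        self.bias = [0.0] * out_dim

    def __call__(self, x):
        return [sum(w * v for w, v in zip(row, x)) + b
                for row, b in zip(self.weight, self.bias)]

    def num_params(self):
        return len(self.weight) * len(self.weight[0]) + len(self.bias)


def encode_modalities(encoder, inputs):
    # Every modality passes through the SAME encoder object (shared weights).
    return {name: encoder(x) for name, x in inputs.items()}


def fuse_mean(features):
    # Placeholder fusion (element-wise mean); NOT the paper's fusion module.
    vecs = list(features.values())
    return [sum(vals) / len(vecs) for vals in zip(*vecs)]


encoder = SharedLinear(in_dim=3, out_dim=4)
inputs = {
    "RGB": [0.2, 0.5, 0.1],
    "NirRG": [0.7, 0.5, 0.2],
    "DSM": [0.3, 0.3, 0.3],  # assumption: single channel repeated three times
}
features = encode_modalities(encoder, inputs)
fused = fuse_mean(features)

shared_params = encoder.num_params()            # one set of weights for all branches
separate_params = shared_params * len(inputs)   # cost if each branch had its own encoder
print(shared_params, separate_params)
```

In this toy setup the encoder's parameter count does not grow with the number of modalities; only the fusion step would add modality-specific parameters, and the placeholder mean used here has none.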
Taxonomy
Topics: Remote-Sensing Image Classification · Image Retrieval and Classification Techniques · Remote Sensing and Land Use
Methods: Attention Is All You Need · Dense Connections · Residual Connection · Softmax · Layer Normalization · Linear Layer · Multi-Head Attention · Vision Transformer
