MM-OVSeg: Multimodal Optical-SAR Fusion for Open-Vocabulary Segmentation in Remote Sensing

Yimin Wei; Aoran Xiao; Hongruixuan Chen; Junshi Xia; Naoto Yokoya

arXiv:2603.17528 · cs.CV · March 30, 2026

TL;DR

MM-OVSeg introduces a multimodal fusion framework combining Optical and SAR data to improve open-vocabulary segmentation in remote sensing, especially under adverse weather conditions.

Contribution

MM-OVSeg proposes a cross-modal unification process and a dual-encoder fusion approach that together improve robustness and generalization in multimodal remote sensing segmentation.

Findings

01. Achieves superior robustness across diverse cloud conditions.

02. Effectively aligns multi-sensor representations for improved segmentation.

03. Demonstrates strong generalization beyond fixed classes.

Abstract

Open-vocabulary segmentation enables pixel-level recognition from an open set of textual categories, allowing generalization beyond fixed classes. Despite great potential in remote sensing, progress in this area remains largely limited to clear-sky optical data and struggles under cloudy or haze-contaminated conditions. We present MM-OVSeg, a multimodal Optical-SAR fusion framework for resilient open-vocabulary segmentation under adverse weather conditions. MM-OVSeg leverages the complementary strengths of the two modalities--optical imagery provides rich spectral semantics, while synthetic aperture radar (SAR) offers cloud-penetrating structural cues. To address the cross-modal domain gap and the limited dense prediction capability of current vision-language models, we propose two key designs: a cross-modal unification process for multi-sensor representation alignment, and a…
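To make the idea of pixel-level recognition from an open set of textual categories concrete, here is a minimal sketch of the general recipe: each pixel's feature vector is compared against one text embedding per category name, and the pixel takes the best-matching name. Everything in it is an illustrative assumption rather than the MM-OVSeg method: the function names (`normalize`, `fuse`, `segment`), the toy 2-dimensional features, and in particular the weighted-average fusion, which merely stands in for the cross-modal unification described in the abstract. In practice the features would come from vision-language model encoders.

```python
import math

def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def fuse(optical_feat, sar_feat, optical_weight=0.5):
    """Hypothetical fusion: a convex combination of the unit-normalized
    optical and SAR features. A placeholder, not the paper's fusion."""
    o, s = normalize(optical_feat), normalize(sar_feat)
    return [optical_weight * a + (1.0 - optical_weight) * b for a, b in zip(o, s)]

def segment(optical, sar, text_embeddings, optical_weight=0.5):
    """Label each pixel with the category whose text embedding has the
    highest cosine similarity to the fused pixel feature."""
    names = list(text_embeddings)
    text = [normalize(text_embeddings[n]) for n in names]
    labels = []
    for opt_row, sar_row in zip(optical, sar):
        row = []
        for o, s in zip(opt_row, sar_row):
            pixel = normalize(fuse(o, s, optical_weight))
            # Dot product of unit vectors equals cosine similarity.
            scores = [sum(p * t for p, t in zip(pixel, emb)) for emb in text]
            row.append(names[scores.index(max(scores))])
        labels.append(row)
    return labels

# Toy 1x2 image with made-up 2-D features. The second pixel's optical
# feature is ambiguous (as under cloud); the SAR feature resolves it.
text_embeddings = {"water": [1.0, 0.0], "building": [0.0, 1.0]}
optical = [[[1.0, 0.0], [0.5, 0.5]]]
sar = [[[0.9, 0.1], [0.0, 1.0]]]
print(segment(optical, sar, text_embeddings))  # [['water', 'building']]
```

Because the categories enter only as entries in `text_embeddings`, the label set can be changed at inference time without retraining, which is what distinguishes the open-vocabulary setting from segmentation over fixed classes.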

Code & Models

Repositories

Jimmyxichen/MM-OVSeg (GitHub)