MPerS: Dynamic MLLM MixExperts Perception-Guided Remote Sensing Scene Segmentation

Ziyi Wang, Xianping Ma, Ziyao Wang, Hongyang Zhang, and Man On Pun

arXiv:2605.10769 · cs.CV · May 12, 2026

TL;DR

The paper introduces MPerS, which combines remote sensing (RS) captions generated by multimodal large language models (MLLMs) from diverse expert perspectives with dense visual features, and integrates them dynamically to improve RS scene segmentation accuracy.

Contribution

It proposes a dynamic mixture-of-experts framework that fuses high-quality, MLLM-generated RS captions with dense visual features to improve segmentation performance.

Findings

01. Achieves superior results on three public RS segmentation datasets.

02. Effectively integrates textual and visual features for precise segmentation.

03. Uses multiple MLLMs and dense visual representations for comprehensive scene understanding.

Abstract

The multimodal fusion of images and scene captions has been extensively explored and applied in various fields. However, when dealing with complex remote sensing (RS) scenes, existing studies have predominantly concentrated on architectural optimizations for integrating textual semantic information with visual features, while largely neglecting the generation of high-quality RS captions and the investigation of their effectiveness in multimodal semantic fusion. In this context, we propose the Dynamic MLLM Mixture-of-Experts Perception-Guided Remote Sensing Scene Segmentation, referred to as MPerS. We design multiple prompts for MLLMs to generate high-quality RS captions, enabling MLLMs to perceive RS scenes from diverse expert perspectives. DINOv3 is employed to extract dense visual representations of land covers. We design a Dynamic MixExperts module that adaptively integrates the most…
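
The abstract is cut off before it explains how the Dynamic MixExperts module selects and combines experts, so the following is only a minimal sketch of one common mixture-of-experts gating pattern, not the MPerS implementation. It assumes each expert caption has already been encoded into a vector (the text encoder is unspecified), and that a single pooled visual vector stands in for DINOv3 features; a real segmentation model would more plausibly gate per pixel or per patch. The function name `dynamic_expert_fusion`, the scaled dot-product relevance score, and top-k sparse routing are all hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_expert_fusion(
    visual: Vector, experts: Sequence[Vector], top_k: int = 2
) -> Tuple[Vector, List[float]]:
    """Hypothetical dynamic gate over expert caption embeddings.

    Each expert embedding is scored by a scaled dot product with the visual
    feature; only the top_k experts are kept and their scores are turned
    into gate weights with a softmax (sparse mixture-of-experts routing).
    Returns the fused vector and one weight per expert (0.0 if not selected).
    """
    d = len(visual)
    if not experts or any(len(e) != d for e in experts):
        raise ValueError("need >=1 expert embedding with the same dimension as visual")
    # Relevance of each expert caption to the visual content.
    scores = [sum(v * t for v, t in zip(visual, e)) / math.sqrt(d) for e in experts]
    # Keep the k most relevant experts (k clamped to [1, number of experts]).
    k = max(1, min(top_k, len(experts)))
    top = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:k]
    gate = softmax([scores[i] for i in top])
    weights = [0.0] * len(experts)
    for i, g in zip(top, gate):
        weights[i] = g
    # Weighted sum of the selected expert embeddings.
    fused = [sum(w * e[j] for w, e in zip(weights, experts)) for j in range(d)]
    return fused, weights


if __name__ == "__main__":
    visual = [1.0, 0.0, 0.0, 1.0]  # hypothetical pooled visual feature
    experts = [
        [1.0, 0.0, 0.0, 1.0],  # caption closely matching the image
        [0.0, 1.0, 1.0, 0.0],  # caption unrelated to the image
        [0.5, 0.0, 0.0, 0.5],  # partially relevant caption
    ]
    fused, weights = dynamic_expert_fusion(visual, experts, top_k=2)
    print([round(w, 3) for w in weights])  # prints [0.622, 0.0, 0.378]
```

Top-k routing is used here so that irrelevant expert captions drop out entirely; a dense softmax over all experts (setting `top_k` to the number of experts) is an equally plausible reading of "dynamic" integration.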