RSUniVLM: A Unified Vision Language Model for Remote Sensing via Granularity-oriented Mixture of Experts
Xu Liu, Zhouhui Lian

TL;DR
RSUniVLM is a comprehensive vision-language model for remote sensing that effectively handles multi-granularity tasks, multi-image analysis, and diverse RS applications with a novel architecture and large-scale dataset.
Contribution
The paper introduces RSUniVLM, a unified end-to-end remote sensing vision-language model with a novel granularity-oriented mixture of experts architecture and a large-scale instruction-following dataset.
Findings
Achieves state-of-the-art performance on various RS tasks.
Effectively handles multi-image analysis like change detection.
Maintains a compact model size of about 1 billion parameters.
Abstract
Remote Sensing Vision-Language Models (RS VLMs) have made considerable progress in remote sensing (RS) image comprehension. While performing well in multi-modal reasoning and multi-turn conversations, the existing models lack pixel-level understanding and struggle with multi-image inputs. In this work, we propose RSUniVLM, a unified, end-to-end RS VLM designed for comprehensive vision understanding across multiple granularities, including image-level, region-level, and pixel-level tasks. RSUniVLM also performs effectively in multi-image analysis, such as change detection and change captioning. To enhance the model's ability to capture visual information at different levels without increasing model size, we design a novel architecture called Granularity-oriented Mixture of Experts to constrain the model to about 1 billion parameters. We also construct a large-scale RS…
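The abstract names a Granularity-oriented Mixture of Experts but does not describe its routing here. As a rough illustration only, the sketch below assumes one expert per granularity (image, region, pixel) and selects the expert from the task's granularity label rather than from a learned gate. The names `make_expert`, `EXPERTS`, and `g_moe_layer` are hypothetical, and the toy scale-and-shift experts stand in for real feed-forward blocks; none of this is taken from the paper's implementation.

```python
from typing import Callable, Dict, List

Vector = List[float]


def make_expert(scale: float, bias: float) -> Callable[[Vector], Vector]:
    """Hypothetical stand-in for an expert feed-forward block (toy scale-and-shift)."""
    def expert(x: Vector) -> Vector:
        return [scale * v + bias for v in x]
    return expert


# Assumption: one expert per granularity named in the abstract.
EXPERTS: Dict[str, Callable[[Vector], Vector]] = {
    "image": make_expert(1.0, 0.0),
    "region": make_expert(2.0, 0.0),
    "pixel": make_expert(0.5, 1.0),
}


def g_moe_layer(tokens: List[Vector], granularity: str) -> List[Vector]:
    """Send every token of one input through the expert for its task granularity."""
    if granularity not in EXPERTS:
        raise ValueError(f"unknown granularity: {granularity!r}")
    expert = EXPERTS[granularity]
    # Only the selected expert runs; the other experts are skipped for this input.
    return [expert(token) for token in tokens]


if __name__ == "__main__":
    tokens = [[1.0, 2.0], [3.0, 4.0]]
    for level in ("image", "region", "pixel"):
        print(level, g_moe_layer(tokens, level))
```

Under this assumption the routing decision is fixed by the task type, so each input passes through a single expert per layer.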
Taxonomy
Topics: Image Retrieval and Classification Techniques · Geographic Information Systems Studies · Advanced Computational Techniques and Applications
