Co-Training Vision Language Models for Remote Sensing Multi-task Learning
Qingyun Li, Shuran Ma, Junwei Luo, Yi Yu, Yue Zhou, Fengxiang Wang, Xudong Lu, Xiaoxing Wang, Xin He, Yushi Chen, Xue Yang

TL;DR
This paper introduces RSCoVLM, a vision language model baseline for remote sensing multi-task learning. It combines a data curation engine, a dynamic-resolution strategy, and a Zoom-in Chain mechanism for ultra-high-resolution images, and reports state-of-the-art results across several remote sensing tasks.
Contribution
The paper presents RSCoVLM, a flexible VLM baseline for RS MTL with novel data curation, dynamic resolution handling, and a Zoom-in Chain for ultra-high-resolution images, advancing multi-task remote sensing models.
Findings
Achieves state-of-the-art performance on multiple RS tasks.
Outperforms existing RS vision-language models.
Provides open-source tools and datasets for reproducibility.
Abstract
With Transformers achieving outstanding performance on individual remote sensing (RS) tasks, we are now approaching the realization of a unified model that excels across multiple tasks through multi-task learning (MTL). Compared to single-task approaches, MTL methods offer improved generalization, enhanced scalability, and greater practical applicability. Recently, vision language models (VLMs) have achieved promising results in RS image understanding, grounding, and ultra-high-resolution (UHR) image reasoning. Moreover, the unified text-based interface demonstrates significant potential for MTL. Hence, in this work, we present RSCoVLM, a simple yet flexible VLM baseline for RS MTL. First, we create a data curation engine, including data acquisition, offline processing and integration, as well as online loading and weighting. This data engine effectively addresses…
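The abstract mentions online loading and weighting across task datasets, but this excerpt does not say how the weights are chosen. As a rough illustration only, the sketch below uses temperature-scaled sampling, a common heuristic for balancing datasets of very different sizes. The technique, the function names, the task names, and the dataset sizes are all assumptions made for this example and are not taken from the paper.

```python
import random


def task_weights(sizes, temperature=0.5):
    """Return sampling probabilities p_i proportional to n_i ** temperature.

    temperature=1.0 samples in proportion to dataset size;
    temperature=0.0 samples all tasks uniformly.
    (Hypothetical helper, not the paper's weighting scheme.)
    """
    scaled = {name: n ** temperature for name, n in sizes.items()}
    total = sum(scaled.values())
    return {name: s / total for name, s in scaled.items()}


def sample_batch_tasks(weights, batch_size, seed=0):
    """Draw the source task for each sample in a batch (hypothetical helper)."""
    rng = random.Random(seed)
    names = list(weights)
    probs = [weights[n] for n in names]
    return rng.choices(names, weights=probs, k=batch_size)


# Illustrative task names and sizes, invented for this sketch.
sizes = {"captioning": 100_000, "grounding": 10_000, "uhr_reasoning": 1_000}
weights = task_weights(sizes)
batch = sample_batch_tasks(weights, batch_size=8)
```

With a temperature of 0.5, the largest dataset in this toy setup is sampled 10 times as often as the smallest one rather than 100 times as often, which keeps small tasks from being drowned out while still favoring larger ones.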
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning
