RS-GPT4V: A Unified Multimodal Instruction-Following Dataset for Remote Sensing Image Understanding
Linrui Xu, Ling Zhao, Wang Guo, Qiujun Li, Kewang Long, Kaiqi Zou, Yuhan Wang, Haifeng Li

TL;DR
This paper introduces RS-GPT4V, a comprehensive multimodal dataset for remote sensing image understanding, designed to enhance generalization, scene understanding, and reasoning in AI models by leveraging GPT-4V generated data.
Contribution
The paper presents a new unified dataset, RS-GPT4V, created with GPT-4V, to improve remote sensing AI models' ability to generalize, understand complex scenes, and perform reasoning.
Findings
Fine-tuned models describe detailed scene attributes.
The dataset improves model generalization across tasks.
Models demonstrate reasoning capabilities with multi-turn QA.
Abstract
Remote sensing image (RSI) intelligent understanding models are undergoing a profound paradigm shift driven by multimodal large language models (MLLMs): from learning a domain model (LaDM) to learning a pre-trained general foundation model followed by an adaptive domain model (LaGD). Under the new LaGD paradigm, the old datasets, which drove advances in RSI intelligent understanding over the last decade, are no longer suitable for the new tasks. We argue that a new dataset must be designed to support such tasks, with the following features: 1) Generalization: training the model to learn knowledge shared among tasks and to adapt to different tasks; 2) Understanding complex scenes: training the model to understand the fine-grained attributes of the objects of interest and to describe the scene in natural language; 3) Reasoning:…
Taxonomy
Topics: Image Retrieval and Classification Techniques · Machine Learning and Data Classification · Colorectal Cancer Screening and Detection
