RSGPT: A Remote Sensing Vision Language Model and Benchmark
Yuan Hu, Jianlong Yuan, Congcong Wen, Xiaonan Lu, Xiang Li

TL;DR
This paper introduces RSGPT, a vision language model for remote sensing, along with a new high-quality dataset, RSICap, and a benchmark, RSIEval, to advance the development of large VLMs for remote sensing applications.
Contribution
The work presents RSICap, a new high-quality dataset with detailed human-annotated captions, and RSIEval, a benchmark for evaluating remote sensing vision language models.
Findings
RSICap contains 2,585 detailed human-annotated captions.
RSIEval provides comprehensive evaluation data with captions and QA pairs.
The datasets facilitate training and benchmarking of large VLMs in remote sensing.
Abstract
The emergence of large-scale large language models, with GPT-4 as a prominent example, has significantly propelled the rapid advancement of artificial general intelligence and sparked the revolution of Artificial Intelligence 2.0. In the realm of remote sensing (RS), there is a growing interest in developing large vision language models (VLMs) specifically tailored for data analysis in this domain. However, current research predominantly revolves around visual recognition tasks, lacking comprehensive, large-scale image-text datasets that are aligned and suitable for training large VLMs, which poses significant challenges to effectively training such models for RS applications. In computer vision, recent research has demonstrated that fine-tuning large vision language models on small-scale, high-quality datasets can yield impressive performance in visual and language understanding. These…
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning
Methods: Attention Is All You Need · Softmax · Position-Wise Feed-Forward Layer · Layer Normalization · Linear Layer · Dense Connections · Label Smoothing · Dropout · Adam · Absolute Position Encodings
