RSGPT: A Remote Sensing Vision Language Model and Benchmark

Yuan Hu; Jianlong Yuan; Congcong Wen; Xiaonan Lu; Xiang Li

arXiv:2307.15266 · cs.CV · July 31, 2023 · 36 citations

TL;DR

This paper introduces RSGPT, a vision language model (VLM) for remote sensing, together with RSICap, a new high-quality image-caption dataset, and RSIEval, a benchmark, to advance the development of large VLMs for remote sensing applications.

Contribution

The work presents RSICap, a new high-quality dataset of detailed human-annotated captions, and RSIEval, a benchmark for evaluating remote sensing vision language models.

Findings

01. RSICap contains 2,585 detailed human-annotated captions.

02. RSIEval provides evaluation data consisting of image captions and question-answer (QA) pairs.

03. Together, the datasets support training and benchmarking of large VLMs in remote sensing.

Abstract

The emergence of large language models (LLMs), with GPT-4 as a prominent example, has significantly accelerated progress toward artificial general intelligence and sparked what has been called the Artificial Intelligence 2.0 revolution. In the realm of remote sensing (RS), there is growing interest in developing large vision language models (VLMs) tailored specifically to data analysis in this domain. However, current research focuses predominantly on visual recognition tasks, and the field lacks comprehensive, large-scale, aligned image-text datasets suitable for training large VLMs, which makes it difficult to train such models effectively for RS applications. In computer vision, recent research has shown that fine-tuning large VLMs on small-scale, high-quality datasets can yield impressive performance in visual and language understanding. These…

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning

Methods: Attention Is All You Need · Softmax · Position-Wise Feed-Forward Layer · Layer Normalization · Linear Layer · Dense Connections · Label Smoothing · Dropout · Adam · Absolute Position Encodings