Large Vision-Language Models for Remote Sensing Visual Question Answering
Surasakdi Siripong, Apirak Chaiyapan, Thanakorn Phonchai

TL;DR
This paper introduces a novel large vision-language model for remote sensing visual question answering, enabling more accurate and fluent natural language responses from satellite imagery without predefined answer categories.
Contribution
The paper presents a two-step training strategy (domain-adaptive pretraining followed by prompt-based finetuning) for a generative LVLM tailored to remote sensing, improving on traditional methods in answer accuracy and relevance.
Findings
Outperforms state-of-the-art baselines on the RSVQAxBEN dataset
Produces more accurate, relevant, and fluent answers according to human evaluation
Demonstrates the effectiveness of generative LVLMs in remote sensing analysis
Abstract
Remote Sensing Visual Question Answering (RSVQA) is a challenging task that involves interpreting complex satellite imagery to answer natural language questions. Traditional approaches often rely on separate visual feature extractors and language processing models, which can be computationally intensive and limited in their ability to handle open-ended questions. In this paper, we propose a novel method that leverages a generative Large Vision-Language Model (LVLM) to streamline the RSVQA process. Our approach consists of a two-step training strategy: domain-adaptive pretraining and prompt-based finetuning. This method enables the LVLM to generate natural language answers by conditioning on both visual and textual inputs, without the need for predefined answer categories. We evaluate our model on the RSVQAxBEN dataset, demonstrating superior performance compared to state-of-the-art…
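The prompt-based finetuning step and the scoring of free-form answers can be pictured with a minimal sketch. This is not the authors' implementation: the prompt template, the function names, and the exact-match metric are assumptions made for illustration, and the image is assumed to be passed to the model separately from the text prompt.

```python
import re

# Hypothetical prompt template; the paper's actual prompt wording is not given here.
PROMPT_TEMPLATE = "Question: {question}\nAnswer:"


def build_prompt(question: str) -> str:
    """Format the textual input that accompanies the image at inference time."""
    return PROMPT_TEMPLATE.format(question=question.strip())


def build_training_text(question: str, answer: str) -> str:
    """Concatenate prompt and target answer, as a generative model would see
    them during finetuning (assumed setup, no predefined answer classes)."""
    return build_prompt(question) + " " + answer.strip()


def normalize(answer: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace so that
    'Yes.' and 'yes' count as the same free-form answer."""
    answer = re.sub(r"[^\w\s]", "", answer.lower())
    return " ".join(answer.split())


def exact_match_accuracy(predictions: list[str], references: list[str]) -> float:
    """Fraction of generated answers that match the reference after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return hits / len(predictions)


if __name__ == "__main__":
    print(build_training_text("Is there a water body in the image?", "yes"))
    print(exact_match_accuracy(["Yes.", "no"], ["yes", "yes"]))
```

Because the model generates text rather than choosing from a fixed label set, some normalization before comparison is usually needed; exact match is only one possible choice, and the human evaluation mentioned in the findings covers relevance and fluency, which this metric does not capture.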
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications · Image Retrieval and Classification Techniques
