SAR Strikes Back: A New Hope for RSVQA
Lucrezia Tosato, Flora Weissgerber, Laurent Wendling, Sylvain Lobry

TL;DR
This paper explores integrating SAR data into remote sensing visual question answering (RSVQA). It proposes two models and a set of fusion strategies; the two-stage model and decision-level fusion perform best, especially on land cover questions.
Contribution
It introduces a new SAR-based RSVQA dataset, compares end-to-end and two-stage models, and evaluates fusion strategies, highlighting SAR's complementary value to optical imagery.
Findings
Two-stage model improves accuracy by nearly 10% over end-to-end.
Decision-level fusion achieves best performance with 75.49% accuracy.
SAR enhances land cover question answering, especially for water areas.
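To make "decision-level fusion" concrete, the sketch below shows one common form of it: each modality's model produces its own probability distribution over candidate answers, and the two distributions are combined only at the end. This is a minimal illustration and not the authors' implementation. The function names (`fuse_decisions`, `predict`), the weighted-average rule, and the `w_sar` weight are all assumptions chosen for illustration; the summary does not say which combination rule the paper uses.

```python
from typing import Dict


def fuse_decisions(
    p_sar: Dict[str, float],
    p_opt: Dict[str, float],
    w_sar: float = 0.5,
) -> Dict[str, float]:
    """Combine two per-answer probability distributions by weighted average.

    p_sar and p_opt are hypothetical outputs of a SAR-only and an
    optical-only RSVQA model. w_sar is an assumed mixing weight in [0, 1].
    """
    if not 0.0 <= w_sar <= 1.0:
        raise ValueError("w_sar must lie in [0, 1]")

    # An answer missing from one model is treated as probability 0 there.
    answers = set(p_sar) | set(p_opt)
    fused = {
        a: w_sar * p_sar.get(a, 0.0) + (1.0 - w_sar) * p_opt.get(a, 0.0)
        for a in answers
    }

    total = sum(fused.values())
    if total == 0.0:
        raise ValueError("both distributions are empty or all zero")

    # Renormalise so the fused scores sum to 1.
    return {a: v / total for a, v in fused.items()}


def predict(fused: Dict[str, float]) -> str:
    """Return the answer with the highest fused probability."""
    return max(fused, key=fused.get)


# Made-up scores for a question such as "Is there water in the image?"
sar_scores = {"yes": 0.9, "no": 0.1}
optical_scores = {"yes": 0.4, "no": 0.6}
fused_scores = fuse_decisions(sar_scores, optical_scores)
answer = predict(fused_scores)
```

Because the modalities are merged only at the output, each model can be trained and evaluated separately, and the weight can be tuned per question type if one modality is more reliable for it.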
Abstract
Remote Sensing Visual Question Answering (RSVQA) is a task that extracts information from satellite images to answer questions in natural language, aiding image interpretation. While several methods exist for optical images with varying spectral bands and resolutions, only recently have high-resolution Synthetic Aperture Radar (SAR) images been explored. SAR's ability to operate in all weather conditions and capture electromagnetic features makes it a promising modality, yet no study has compared SAR and optical imagery in RSVQA or proposed effective fusion strategies. This work investigates how to integrate SAR data into RSVQA and how to best combine it with optical images. We present a dataset that enables SAR-based RSVQA and explore two pipelines for the task. The first is an end-to-end model, while the second is a two-stage framework: SAR information is first extracted and…
Taxonomy
Topics: Disaster Response and Management
