Segmentation-guided Attention for Visual Question Answering from Remote Sensing Images
Lucrezia Tosato, Hichem Boussaid, Flora Weissgerber, Camille Kurtz, Laurent Wendling, Sylvain Lobry

TL;DR
This paper introduces a segmentation-guided attention mechanism for remote sensing visual question answering (RSVQA). Segmentation is used to focus the model on the image regions relevant to the question, yielding nearly 10% higher accuracy than classical methods on a new high-resolution dataset.
Contribution
The paper integrates segmentation-guided attention into RSVQA models to improve visual focus and accuracy, and provides a new annotated dataset for evaluation.
Findings
Achieved nearly 10% higher accuracy than classical methods.
Developed a new dataset with segmentation annotations and question-answer pairs.
Demonstrated the effectiveness of segmentation-guided attention in remote sensing VQA.
Abstract
Visual Question Answering for Remote Sensing (RSVQA) is a task that aims at answering natural language questions about the content of a remote sensing image. Visual feature extraction is therefore an essential step in a VQA pipeline. By incorporating attention mechanisms into this process, models gain the ability to focus selectively on salient regions of the image, prioritizing the most relevant visual information for a given question. In this work, we propose to embed an attention mechanism guided by segmentation into an RSVQA pipeline. We argue that segmentation plays a crucial role in guiding attention by providing a contextual understanding of the visual information, underlining specific objects or areas of interest. To evaluate this methodology, we provide a new VQA dataset that exploits very high-resolution RGB orthophotos annotated with 16 segmentation classes and…
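The abstract does not give the architecture, so the following is only a minimal sketch of what "attention guided by segmentation" could look like, not the authors' implementation. It assumes region-level visual features, a question embedding of the same dimension, per-region segmentation class probabilities, and a hypothetical `class_relevance` vector saying how much each of the 16 classes matters for the question. All names, shapes, and the log-prior combination rule are illustrative assumptions.

```python
import numpy as np


def softmax(x):
    # Subtract the max for numerical stability before exponentiating.
    e = np.exp(x - np.max(x))
    return e / e.sum()


def segmentation_guided_attention(features, question, seg_probs,
                                  class_relevance, eps=1e-8):
    """Hypothetical sketch: bias question-to-region attention with a
    segmentation prior. Not the method from the paper.

    features:        (N, D) visual features, one row per image region
    question:        (D,)   question embedding
    seg_probs:       (N, C) per-region segmentation class probabilities
    class_relevance: (C,)   assumed relevance of each class to the question
    """
    d = features.shape[1]
    # Scaled dot-product score between the question and every region.
    scores = features @ question / np.sqrt(d)          # (N,)
    # Segmentation prior: how relevant each region's classes are.
    prior = seg_probs @ class_relevance                # (N,)
    # Add the log-prior so irrelevant regions are strongly down-weighted.
    weights = softmax(scores + np.log(prior + eps))    # (N,)
    # Attention-weighted summary of the visual features.
    attended = weights @ features                      # (D,)
    return weights, attended


# Toy inputs: 6 regions, 8-dim features, 16 segmentation classes.
rng = np.random.default_rng(0)
n_regions, dim, n_classes = 6, 8, 16
features = rng.normal(size=(n_regions, dim))
question = rng.normal(size=dim)

# One-hot segmentation: regions 2 and 3 belong to class 3.
seg_probs = np.zeros((n_regions, n_classes))
seg_probs[np.arange(n_regions), [0, 0, 3, 3, 5, 5]] = 1.0

# Assume the question is only about class 3.
class_relevance = np.zeros(n_classes)
class_relevance[3] = 1.0

weights, attended = segmentation_guided_attention(
    features, question, seg_probs, class_relevance
)
```

In this toy setup almost all attention mass falls on the two regions labeled with the relevant class, which is the behavior the abstract argues for: segmentation tells the model where to look. A learned model would more likely predict the relevance from the question and combine it with the scores in a trainable way.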
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications · Image Retrieval and Classification Techniques
Methods: Softmax · Attention Is All You Need · Focus
