Segmentation-guided Attention for Visual Question Answering from Remote Sensing Images

Lucrezia Tosato; Hichem Boussaid; Flora Weissgerber; Camille Kurtz; Laurent Wendling; Sylvain Lobry

arXiv:2407.08669·cs.CV·July 12, 2024

Open Access

TL;DR

This paper introduces a segmentation-guided attention mechanism for remote sensing visual question answering (RSVQA). Segmentation steers the model toward the image regions relevant to a question, yielding nearly 10% higher accuracy than classical methods on a new high-resolution dataset.

Contribution

Integrating segmentation-guided attention into RSVQA models improves visual focus and accuracy; a new dataset with segmentation annotations supports the evaluation.

Findings

01

Achieved nearly 10% higher accuracy than classical methods.

02

Developed a new dataset with segmentation annotations and question-answer pairs.

03

Demonstrated the effectiveness of segmentation-guided attention in remote sensing VQA.

Abstract

Visual Question Answering for Remote Sensing (RSVQA) is a task that aims at answering natural language questions about the content of a remote sensing image. Visual feature extraction is therefore an essential step in a VQA pipeline. By incorporating attention mechanisms into this process, models gain the ability to focus selectively on salient regions of the image, prioritizing the most relevant visual information for a given question. In this work, we propose to embed an attention mechanism guided by segmentation into an RSVQA pipeline. We argue that segmentation plays a crucial role in guiding attention by providing a contextual understanding of the visual information, underlining specific objects or areas of interest. To evaluate this methodology, we provide a new VQA dataset that exploits very high-resolution RGB orthophotos annotated with 16 segmentation classes and…
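
The idea of attention guided by segmentation can be illustrated with a small sketch. The following is not the authors' architecture, which this page does not detail. It is a minimal illustration under stated assumptions: region features are scored against a question embedding with a scaled dot product, and a hypothetical per-region relevance value `seg_relevance` (assumed to come from a segmentation map) is added as a log-prior before the softmax. The function name, its parameters, and the toy values are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def segmentation_guided_attention(features, question, seg_relevance, eps=1e-6):
    """Attend over image regions, biased by a segmentation-derived prior.

    features      : list of N region feature vectors, each of length D
    question      : question embedding of length D
    seg_relevance : list of N values in [0, 1]; hypothetical per-region
                    relevance derived from a segmentation map (an assumption,
                    not taken from the paper)
    Returns (weights, attended): weights sum to 1, attended is the weighted
    sum of the region features.
    """
    d = len(question)
    scores = []
    for feat, rel in zip(features, seg_relevance):
        # Scaled dot-product similarity between region and question.
        dot = sum(f * q for f, q in zip(feat, question)) / math.sqrt(d)
        # Adding log(rel) multiplies the unnormalized weight by rel.
        scores.append(dot + math.log(rel + eps))
    weights = softmax(scores)
    attended = [
        sum(w * feat[k] for w, feat in zip(weights, features))
        for k in range(d)
    ]
    return weights, attended


# Toy example: three regions, the third is marked relevant by segmentation.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
question = [1.0, 1.0]
seg_relevance = [0.05, 0.05, 0.9]
weights, attended = segmentation_guided_attention(features, question, seg_relevance)
```

In this sketch, a region with low segmentation relevance is down-weighted even when its features match the question, which is one simple way to bias attention toward segmented areas of interest. With a uniform `seg_relevance`, the weights reduce to an ordinary softmax over the dot-product scores.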

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications · Image Retrieval and Classification Techniques

Methods: Softmax · Attention Is All You Need · Focus