The curse of language biases in remote sensing VQA: the role of spatial attributes, language diversity, and the need for clear evaluation

Christel Chappuis, Eliot Walt, Vincent Mendez, Sylvain Lobry, Bertrand Le Saux, and Devis Tuia

arXiv:2311.16782·cs.CV·November 29, 2023·2 cites

Open Access

TL;DR

This paper investigates language biases in remote sensing visual question answering (RSVQA). It finds that these biases are more severe than in standard VQA, which it attributes to the geographical and vocabulary characteristics of existing datasets, and it argues for better datasets and more informative evaluation metrics.

Contribution

The study highlights the prevalence of language biases in RSVQA, introduces analytical methods to expose these biases, and advocates for improved datasets and evaluation metrics.

Findings

1. Biases are more severe in RSVQA than in standard VQA.
2. Datasets' geographical and vocabulary characteristics contribute to biases.
3. Informed evaluation metrics are essential for transparent assessment.

Abstract

Remote sensing visual question answering (RSVQA) opens new opportunities for the use of overhead imagery by the general public, by enabling human-machine interaction with natural language. Building on the recent advances in natural language processing and computer vision, the goal of RSVQA is to answer a question formulated in natural language about a remote sensing image. Language understanding is essential to the success of the task, but has not yet been thoroughly examined in RSVQA. In particular, the problem of language biases is often overlooked in the remote sensing community, which can impact model robustness and lead to wrong conclusions about the performances of the model. Thus, the present work aims at highlighting the problem of language biases in RSVQA with a threefold analysis strategy: visual blind models, adversarial testing and dataset analysis. This analysis focuses…
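The abstract names visual blind models as one of its three analysis tools. As a rough illustration of the idea only, and not the models used in the paper, the sketch below builds a question-only baseline that never looks at an image: it memorizes the most frequent training answer for each question string. The function names and the toy question-answer pairs are hypothetical. If such a baseline scores well on a benchmark, the answers are largely predictable from language alone.

```python
from collections import Counter, defaultdict


def fit_blind_baseline(train_pairs):
    """Learn the most frequent answer per question, ignoring images entirely.

    train_pairs: iterable of (question, answer) strings (hypothetical format).
    Returns a lookup table and a global fallback answer for unseen questions.
    """
    per_question = defaultdict(Counter)
    overall = Counter()
    for question, answer in train_pairs:
        key = question.strip().lower()  # light normalization of the question
        per_question[key][answer] += 1
        overall[answer] += 1
    fallback = overall.most_common(1)[0][0]
    table = {q: counts.most_common(1)[0][0] for q, counts in per_question.items()}
    return table, fallback


def predict_blind(table, fallback, question):
    """Answer from the question text alone."""
    return table.get(question.strip().lower(), fallback)


def blind_accuracy(table, fallback, test_pairs):
    """Fraction of test questions answered correctly without any image."""
    test_pairs = list(test_pairs)
    hits = sum(predict_blind(table, fallback, q) == a for q, a in test_pairs)
    return hits / len(test_pairs)


# Toy, made-up data purely for illustration.
train = [
    ("Is there a road?", "yes"),
    ("Is there a road?", "yes"),
    ("Is there a road?", "no"),
    ("How many buildings are there?", "0"),
]
test = [
    ("Is there a road?", "yes"),
    ("How many buildings are there?", "3"),
]
table, fallback = fit_blind_baseline(train)
score = blind_accuracy(table, fallback, test)
```

Comparing this kind of blind score with the score of a full image-and-question model gives a quick sense of how much of the reported accuracy could come from language priors rather than from the imagery.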

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques

Methods: Gravity