Checkmate: interpretable and explainable RSVQA is the endgame
Lucrezia Tosato, Christel Tartini Chappuis, Syrielle Montariol, Flora Weissgerber, Sylvain Lobry, Devis Tuia

TL;DR
This paper introduces Chessboard, a new Remote Sensing Visual Question Answering (RSVQA) dataset built to reduce answer biases, and Checkmate, a model that adds interpretability and explainability by pointing to the image cells behind its answers. Together they aim at fine-grained visual reasoning and more trustworthy remote sensing image analysis.
Contribution
The paper presents the Chessboard dataset to reduce biases and a new model Checkmate that provides interpretability and visual explanations in RSVQA tasks.
Findings
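Checkmate identifies the image cells most relevant to each decision, which makes its answers more transparent.
Chessboard minimizes biases through 3'123'253 questions with a balanced answer distribution, and links each answer to one or more image cells to support fine-grained reasoning.
Experiments across multiple model architectures indicate that the approach improves transparency.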
Abstract
Remote Sensing Visual Question Answering (RSVQA) presents unique challenges in ensuring that model decisions are both understandable and grounded in visual content. Current models often suffer from a lack of interpretability and explainability, as well as from biases in dataset distributions that lead to shortcut learning. In this work, we tackle these issues by introducing a novel RSVQA dataset, Chessboard, designed to minimize biases through 3'123'253 questions and a balanced answer distribution. Each answer is linked to one or more cells within the image, enabling fine-grained visual reasoning. Building on this dataset, we develop an explainable and interpretable model called Checkmate that identifies the image cells most relevant to its decisions. Through extensive experiments across multiple model architectures, we show that our approach improves transparency and supports more…
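To make the idea of linking answers to image cells concrete, here is a minimal Python sketch. It is not the authors' data format or code: the 8x8 grid size, the `QAItem` record, the `cell_of_pixel` helper, and the use of intersection-over-union as an agreement score are all assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QAItem:
    """Hypothetical record: a question, its answer, and the supporting cells."""
    question: str
    answer: str
    cells: frozenset  # set of (row, col) grid indices; layout is an assumption


def cell_of_pixel(x, y, width, height, grid=8):
    """Map a pixel (x, y) to its (row, col) cell in a grid x grid partition.

    The grid size of 8 is an assumption suggested by the dataset name,
    not a value confirmed by the abstract.
    """
    if not (0 <= x < width and 0 <= y < height):
        raise ValueError("pixel lies outside the image")
    col = x * grid // width
    row = y * grid // height
    return row, col


def cell_overlap(predicted, reference):
    """Intersection-over-union between two sets of cells.

    IoU is used here only as one plausible way to compare the cells a model
    highlights with the annotated ones; the paper may use another measure.
    """
    predicted, reference = set(predicted), set(reference)
    union = predicted | reference
    if not union:
        return 1.0  # both empty: treat as perfect agreement
    return len(predicted & reference) / len(union)


# Example usage with made-up values.
item = QAItem(
    question="Is there a building in the image?",
    answer="yes",
    cells=frozenset({(0, 1), (1, 1)}),
)
highlighted = {(0, 0), (0, 1)}  # cells a model might flag as relevant
score = cell_overlap(highlighted, item.cells)
```

Under these assumptions, a model explanation can be compared with the annotation cell by cell, which is the kind of fine-grained check that cell-level links make possible.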
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning
