Text-conditioned State Space Model For Domain-generalized Change Detection Visual Question Answering

Elman Ghazaei; Erchan Aptoula

arXiv:2508.08974·cs.CV·October 27, 2025

TL;DR

This paper introduces a domain-generalized change detection visual question answering framework using a text-conditioned state space model, along with a new dataset, to improve robustness across diverse real-world scenarios.

Contribution

The paper proposes TCSSM, a text-conditioned state space model for domain-invariant feature extraction, and introduces BrightVQA, a multi-modal, multi-domain dataset for CDVQA research.

Findings

01 TCSSM outperforms state-of-the-art models in experiments.

02 The new dataset facilitates domain generalization research.

03 Dynamic input-dependent parameters improve feature alignment.
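
To make the idea of dynamic, input-dependent parameters concrete, here is a minimal NumPy sketch of a selective state space scan whose step size and projection vectors are predicted from each visual token together with a pooled text embedding. This summary does not describe the TCSSM architecture, so everything below is an assumption for illustration: the concatenation-based conditioning, the diagonal state matrix, and the names `text_conditioned_ssm`, `init_params`, `W_delta`, `W_B`, and `W_C` are hypothetical and are not taken from the paper.

```python
import numpy as np


def softplus(x):
    # Numerically stable softplus; keeps the predicted step size positive.
    return np.logaddexp(0.0, x)


def init_params(D, E, N, seed=0):
    """Hypothetical parameters: D feature channels, E text dims, N state dims."""
    rng = np.random.default_rng(seed)
    return {
        # Fixed diagonal state matrix, stored as log-magnitudes per channel.
        "log_A": np.tile(np.log(np.arange(1, N + 1, dtype=float)), (D, 1)),
        # Linear maps from [token; text] to the dynamic parameters.
        "W_delta": rng.normal(scale=0.1, size=(D + E, D)),
        "W_B": rng.normal(scale=0.1, size=(D + E, N)),
        "W_C": rng.normal(scale=0.1, size=(D + E, N)),
    }


def text_conditioned_ssm(x, text, params):
    """Scan over a token sequence with text-dependent SSM parameters.

    x:    (T, D) sequence of visual tokens
    text: (E,)   pooled question embedding (assumed given)
    Returns a (T, D) output sequence.
    """
    T, D = x.shape
    A = -np.exp(params["log_A"])           # (D, N), negative so the state decays
    h = np.zeros_like(A)                   # hidden state, one row per channel
    y = np.empty((T, D))
    for t in range(T):
        z = np.concatenate([x[t], text])   # condition on token and text jointly
        delta = softplus(z @ params["W_delta"])  # (D,) dynamic step size
        B = z @ params["W_B"]                    # (N,) dynamic input projection
        C = z @ params["W_C"]                    # (N,) dynamic output projection
        A_bar = np.exp(delta[:, None] * A)       # discretized state transition
        B_bar = delta[:, None] * B[None, :]      # simplified discretized input map
        h = A_bar * h + B_bar * x[t][:, None]    # state update
        y[t] = h @ C                             # read out one value per channel
    return y
```

Because `delta`, `B`, and `C` are recomputed at every step from both the token and the text embedding, the same image sequence yields different outputs for different questions. The actual conditioning mechanism used by the authors may differ from this sketch.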

Abstract

The Earth's surface is constantly changing, and detecting these changes provides valuable insights that benefit various aspects of human society. While traditional change detection methods have been employed to detect changes from bi-temporal images, these approaches typically require expert knowledge for accurate interpretation. To enable broader and more flexible access to change information by non-expert users, the task of Change Detection Visual Question Answering (CDVQA) has been introduced. However, existing CDVQA methods have been developed under the assumption that training and testing datasets share similar distributions. This assumption does not hold in real-world applications, where domain shifts often occur. In this paper, the CDVQA task is revisited with a focus on addressing domain shift. To this end, a new multi-modal and multi-domain dataset, BrightVQA, is introduced to…
