DeltaVLM: Interactive Remote Sensing Image Change Analysis via Instruction-guided Difference Perception
Pei Deng, Wenqian Zhou, and Hanlin Wu

TL;DR
DeltaVLM introduces an interactive, instruction-guided framework for remote sensing image change analysis, combining change detection and visual question answering to enable multi-turn exploration of land-cover changes.
Contribution
The paper presents DeltaVLM, a novel architecture for interactive remote sensing change analysis, and introduces ChangeChat-105k, a large-scale dataset for instruction-following in this domain.
Findings
Achieves state-of-the-art performance on change captioning tasks.
Effectively supports multi-turn, instruction-guided change analysis.
Outperforms existing multimodal models in remote sensing change detection.
Abstract
Accurate interpretation of land-cover changes in multi-temporal satellite imagery is critical for real-world scenarios. However, existing methods typically provide only one-shot change masks or static captions, limiting their ability to support interactive, query-driven analysis. In this work, we introduce remote sensing image change analysis (RSICA) as a new paradigm that combines the strengths of change detection and visual question answering to enable multi-turn, instruction-guided exploration of changes in bi-temporal remote sensing images. To support this task, we construct ChangeChat-105k, a large-scale instruction-following dataset, generated through a hybrid rule-based and GPT-assisted process, covering six interaction types: change captioning, classification, quantification, localization, open-ended question answering, and multi-turn dialogues. Building on this dataset, we…
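The abstract says ChangeChat-105k is built partly by rules and partly with GPT assistance. The sketch below illustrates the rule-based side only: it turns a binary change mask for a bi-temporal image pair into templated quantification and localization question-answer samples. Several pieces are hypothetical assumptions rather than the authors' pipeline: the record layout (`ChangeSample`, `Turn`), the quadrant-based notion of location, and the question templates. The six `InteractionType` values simply mirror the interaction types named in the abstract.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class InteractionType(Enum):
    # The six interaction types listed in the abstract.
    CAPTIONING = "change captioning"
    CLASSIFICATION = "change classification"
    QUANTIFICATION = "change quantification"
    LOCALIZATION = "change localization"
    OPEN_QA = "open-ended question answering"
    MULTI_TURN = "multi-turn dialogue"


@dataclass
class Turn:
    question: str
    answer: str


@dataclass
class ChangeSample:
    # Hypothetical record: paths to the two acquisition dates plus dialogue turns.
    image_t1: str
    image_t2: str
    kind: InteractionType
    turns: List[Turn] = field(default_factory=list)


def changed_fraction(mask: List[List[int]]) -> float:
    """Share of pixels flagged as changed (non-zero) in a binary mask."""
    total = sum(len(row) for row in mask)
    changed = sum(1 for row in mask for v in row if v)
    return changed / total if total else 0.0


def changed_quadrants(mask: List[List[int]]) -> List[str]:
    """Image quadrants containing at least one changed pixel (assumed localization scheme)."""
    h = len(mask)
    w = len(mask[0]) if h else 0
    names = {(0, 0): "top-left", (0, 1): "top-right",
             (1, 0): "bottom-left", (1, 1): "bottom-right"}
    hits = set()
    for r, row in enumerate(mask):
        for c, v in enumerate(row):
            if v:
                # Integer comparison places a pixel in the lower/right half.
                hits.add((int(2 * r >= h), int(2 * c >= w)))
    return [names[k] for k in sorted(hits)]


def quantification_sample(t1: str, t2: str, mask: List[List[int]]) -> ChangeSample:
    pct = 100 * changed_fraction(mask)
    turn = Turn("What fraction of the scene has changed?",
                f"About {pct:.1f}% of the pixels changed.")
    return ChangeSample(t1, t2, InteractionType.QUANTIFICATION, [turn])


def localization_sample(t1: str, t2: str, mask: List[List[int]]) -> ChangeSample:
    regions = changed_quadrants(mask)
    if regions:
        answer = "Changes appear in the " + ", ".join(regions) + " region(s)."
    else:
        answer = "No changes were detected."
    turn = Turn("Where did changes occur?", answer)
    return ChangeSample(t1, t2, InteractionType.LOCALIZATION, [turn])


if __name__ == "__main__":
    mask = [[1, 1, 0, 0],
            [1, 1, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0]]
    print(quantification_sample("t1.png", "t2.png", mask).turns[0].answer)
    print(localization_sample("t1.png", "t2.png", mask).turns[0].answer)
```

Rules like these can produce exact answers for count- and location-style questions directly from mask annotations. Free-form types such as captioning, open-ended QA, and multi-turn dialogue are the ones the abstract attributes to GPT assistance, so they are not modeled here.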
