ChangeChat: An Interactive Model for Remote Sensing Change Analysis via Multimodal Instruction Tuning
Pei Deng, Wenqian Zhou, Hanlin Wu

TL;DR
ChangeChat is a bitemporal vision-language model for remote sensing (RS) change analysis. It answers interactive, multimodal queries and outperforms existing methods on tasks such as change captioning and change localization.
Contribution
The paper introduces the first bitemporal vision-language model for RS change analysis and builds ChangeChat-87k, a large multimodal dataset for training and evaluation.
Findings
ChangeChat achieves state-of-the-art performance on change captioning and localization tasks.
The model outperforms GPT-4 on specific RS change analysis benchmarks.
The ChangeChat-87k dataset enhances model training and generalization.
Abstract
Remote sensing (RS) change analysis is vital for monitoring Earth's dynamic processes by detecting alterations in images over time. Traditional change detection excels at identifying pixel-level changes but lacks the ability to contextualize these alterations. While recent advancements in change captioning offer natural language descriptions of changes, they do not support interactive, user-specific queries. To address these limitations, we introduce ChangeChat, the first bitemporal vision-language model (VLM) designed specifically for RS change analysis. ChangeChat utilizes multimodal instruction tuning, allowing it to handle complex queries such as change captioning, category-specific quantification, and change localization. To enhance the model's performance, we developed the ChangeChat-87k dataset, which was generated using a combination of rule-based methods and GPT-assisted…
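The abstract states that ChangeChat-87k was generated partly with rule-based methods, but it does not spell out the rules. The sketch below is a hypothetical illustration of how template question/answer pairs for category-specific quantification and change localization could be derived from a semantic change mask. It is not the authors' pipeline. The mask encoding (0 = unchanged, a positive integer = change category), 4-connectivity for counting changed objects, pixel bounding boxes for localization, and all function names and templates are assumptions.

```python
from collections import deque

# Assumed encoding (hypothetical): 0 = unchanged pixel, a positive integer is
# the category of a changed pixel (e.g. 1 = building, 2 = road).
Mask = list[list[int]]


def connected_components(mask: Mask, category: int) -> list[list[tuple[int, int]]]:
    """Group pixels of one category into 4-connected regions (one per changed object)."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] != category or seen[r][c]:
                continue
            # Breadth-first flood fill starting from an unvisited seed pixel.
            queue, region = deque([(r, c)]), []
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and mask[ny][nx] == category):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            regions.append(region)
    return regions


def bounding_box(region: list[tuple[int, int]]) -> tuple[int, int, int, int]:
    """Return (x_min, y_min, x_max, y_max) in pixel coordinates."""
    ys = [y for y, _ in region]
    xs = [x for _, x in region]
    return (min(xs), min(ys), max(xs), max(ys))


def make_instructions(mask: Mask, class_names: dict[int, str]) -> list[dict[str, str]]:
    """Build hypothetical templated Q/A pairs for quantification and localization."""
    samples = []
    for category, name in class_names.items():
        regions = connected_components(mask, category)
        # Category-specific quantification: count the changed objects.
        samples.append({
            "question": f"How many {name}s changed between the two images?",
            "answer": f"{len(regions)} {name}(s) changed.",
        })
        # Change localization: only emitted when at least one object changed.
        if regions:
            boxes = [bounding_box(reg) for reg in regions]
            samples.append({
                "question": f"Where are the changed {name}s located?",
                "answer": "; ".join(str(b) for b in boxes),
            })
    return samples


if __name__ == "__main__":
    toy_mask = [
        [1, 1, 0, 0],
        [1, 0, 0, 2],
        [0, 0, 1, 2],
    ]
    for sample in make_instructions(toy_mask, {1: "building", 2: "road"}):
        print(sample)
```

On the toy mask, the two separate building regions yield a count of 2 and two boxes, while the vertical road strip yields one. A real pipeline would likely work on georeferenced masks and normalized coordinates. The GPT-assisted part mentioned in the abstract would then add free-form descriptions that fixed templates like these cannot produce.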
Taxonomy
Topics: Geographic Information Systems Studies
Methods: Attention Is All You Need · Linear Layer · Multi-Head Attention · Label Smoothing · Byte Pair Encoding · Absolute Position Encodings · Softmax · Layer Normalization · Position-Wise Feed-Forward Layer · Dropout
