Decoding the Delta: Unifying Remote Sensing Change Detection and Understanding with Multimodal Large Language Models

Xiaohe Li; Jiahao Li; Kaixin Zhang; Yuqiang Fang; Leilei Lin; Hong Wang; Haohua Wu; Zide Fan

arXiv:2604.14044 · cs.CV · April 16, 2026

TL;DR

This paper introduces Delta-LLaVA, a specialized multimodal large language model for remote sensing change detection, supported by a new comprehensive benchmark, Delta-QA, that advances multi-temporal interpretation capabilities.

Contribution

The work presents Delta-LLaVA with novel modules for change detection and a new benchmark, unifying pixel-level segmentation and question answering in remote sensing.

Findings

01

Delta-LLaVA outperforms existing models in change detection accuracy.

02

The Delta-QA benchmark enables comprehensive evaluation of multi-temporal remote sensing tasks.

03

Proposed modules effectively isolate and amplify visual differences across time.

Abstract

While Multimodal Large Language Models (MLLMs) excel in general vision-language tasks, their application to remote sensing change understanding is hindered by a fundamental "temporal blindness". Existing architectures lack intrinsic mechanisms for multi-temporal contrastive reasoning and struggle with precise spatial grounding. To address this, we first introduce Delta-QA, a comprehensive benchmark comprising 180k visual question-answering samples. Delta-QA unifies pixel-level segmentation and visual question answering across bi- and tri-temporal scenarios, structuring change interpretation into four progressive cognitive dimensions. Methodologically, we propose Delta-LLaVA, a novel MLLM framework explicitly tailored for multi-temporal remote sensing interpretation. It overcomes the limitations of naive feature concatenation through three core innovations: a Change-Enhanced Attention…
