LaVIDE: A Language-Vision Discriminator for Detecting Changes in Satellite Image with Map References

Shuguo Jiang; Fang Xu; Sen Jia; Gui-Song Xia

arXiv:2411.19758·cs.CV·December 2, 2024

TL;DR

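This paper introduces LaVIDE, a language-vision discriminator that detects changes in a single satellite image by comparing it against a map reference. It uses language to connect the map's high-level categories with the image's low-level visual detail, with reported improvements of 13.8% on DynamicEarthNet and 4.3% on SECOND.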

Contribution

LaVIDE bridges the gap between map categories and satellite images using language-vision models, introducing a mixture-of-experts module for comprehensive semantic comparison in change detection.
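
The idea of comparing map categories with image content in a shared language-vision space can be pictured with a small sketch. This is not the authors' implementation: the function name `flag_changes`, the cosine-similarity rule, the fixed `threshold`, and the assumption that per-pixel image embeddings and per-category text embeddings already exist are all hypothetical, and the mixture-of-experts module is omitted entirely.

```python
import numpy as np

def flag_changes(pixel_emb, class_emb, map_labels, threshold=0.5):
    """Hypothetical sketch: flag pixels that disagree with the map.

    pixel_emb  : (H, W, D) per-pixel image embeddings (assumed given)
    class_emb  : (C, D) one text embedding per map category (assumed given)
    map_labels : (H, W) integer category index taken from the map
    Returns a boolean (H, W) mask, True where a change is suspected.
    """
    # L2-normalise both sets of embeddings so dot products are cosines.
    p = pixel_emb / np.linalg.norm(pixel_emb, axis=-1, keepdims=True)
    t = class_emb / np.linalg.norm(class_emb, axis=-1, keepdims=True)

    # Look up the text embedding of the map's category at every pixel.
    ref = t[map_labels]                      # shape (H, W, D)

    # Cosine similarity between each pixel and its map category.
    sim = np.sum(p * ref, axis=-1)           # shape (H, W)

    # Low similarity means the image no longer matches the map there.
    return sim < threshold


# Toy usage with two categories and 2-dimensional embeddings.
class_emb = np.array([[1.0, 0.0], [0.0, 1.0]])
pixel_emb = np.array([[[1.0, 0.0], [0.0, 1.0]],
                      [[1.0, 0.0], [1.0, 0.0]]])
map_labels = np.array([[0, 0], [1, 0]])
mask = flag_changes(pixel_emb, class_emb, map_labels)
```

A fixed threshold is the simplest possible decision rule and is used here only to keep the sketch short; a learned discriminator would replace it in practice.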

Findings

01 Outperforms state-of-the-art algorithms on benchmark datasets

02 Achieves 13.8% improvement on DynamicEarthNet

03 Achieves 4.3% improvement on SECOND dataset

Abstract

Change detection, which typically relies on the comparison of bi-temporal images, is significantly hindered when only a single image is available. Comparing a single image with an existing map, such as OpenStreetMap, which is continuously updated through crowd-sourcing, offers a viable solution to this challenge. Unlike images that carry low-level visual details of ground objects, maps convey high-level categorical information. This discrepancy in abstraction levels complicates the alignment and comparison of the two data types. In this paper, we propose a Language-Vision Discriminator for detecting changes in satellite image with map references, namely LaVIDE, which leverages language to bridge the information gap between maps and images. Specifically, LaVIDE formulates change detection as the problem of "Does the pixel belong to…

Taxonomy

Topics: Remote-Sensing Image Classification