Towards a multimodal framework for remote sensing image change retrieval and captioning
Roger Ferrod, Luigi Di Caro, Dino Ienco

TL;DR
This paper introduces a multimodal framework for remote sensing (RS) image change retrieval and captioning, using contrastive learning to improve change detection and captioning on bi-temporal RS data.
Contribution
It presents a new foundation model that combines contrastive learning with captioning for bi-temporal remote sensing images, addressing the gap in multimodal RS applications.
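To make the contrastive part of this contribution concrete, here is a minimal sketch of a symmetric, CLIP-style contrastive loss between embeddings of bi-temporal image pairs and embeddings of their change captions. It is an illustration under stated assumptions, not the authors' implementation: the function names, the temperature value of 0.07, and the premise that each image pair has already been encoded into a single vector are all hypothetical choices made for this example.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def symmetric_contrastive_loss(pair_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of matched (image pair, caption) rows.

    pair_emb: (N, D) array, one embedding per bi-temporal image pair
              (how the pair is encoded is an assumption left outside this sketch).
    text_emb: (N, D) array, one embedding per change caption; row i matches row i.
    temperature: hypothetical default, not a value taken from the paper.
    """
    pair_emb = l2_normalize(pair_emb)
    text_emb = l2_normalize(text_emb)
    # Entry (i, j) is the scaled similarity of image pair i and caption j.
    logits = pair_emb @ text_emb.T / temperature
    idx = np.arange(logits.shape[0])
    # Image-to-text direction: each row should select its own caption.
    loss_i2t = -log_softmax(logits, axis=1)[idx, idx].mean()
    # Text-to-image direction: each column should select its own image pair.
    loss_t2i = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_i2t + loss_t2i)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    emb = rng.normal(size=(8, 16))
    aligned = symmetric_contrastive_loss(emb, emb)
    shuffled = symmetric_contrastive_loss(emb, np.roll(emb, 1, axis=0))
    print(f"aligned loss: {aligned:.4f}, shuffled loss: {shuffled:.4f}")
```

Minimizing a loss of this shape pulls matched pair and caption embeddings together and pushes mismatched ones apart, which is what lets the same embedding space serve text-to-image retrieval.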
Findings
The model achieves competitive captioning performance.
It improves text-image retrieval for change detection.
It uses the LEVIR-CC dataset for training and evaluation.
Abstract
Recently, there has been increasing interest in multimodal applications that integrate text with other modalities, such as images, audio and video, to facilitate natural language interactions with multimodal AI systems. While applications involving standard modalities have been extensively explored, there is still a lack of investigation into specific data modalities such as remote sensing (RS) data. Despite the numerous potential applications of RS data, including environmental protection, disaster monitoring and land planning, available solutions are predominantly focused on specific tasks like classification, captioning and retrieval. These solutions often overlook the unique characteristics of RS data, such as its capability to systematically provide information on the same geographical areas over time. This ability enables continuous monitoring of changes in the underlying…
Taxonomy
Topics: Image Retrieval and Classification Techniques · Geographic Information Systems Studies
Methods: Contrastive Learning
