Towards Temporal Change Explanations from Bi-Temporal Satellite Images
Ryo Tsujimoto, Hiroki Ouchi, Hidetaka Kamigaito, Taro Watanabe

TL;DR
This paper explores using large-scale vision-language models to explain temporal changes in satellite images, proposing prompting methods to handle image pairs and demonstrating the effectiveness of step-by-step reasoning prompts.
Contribution
It introduces three prompting methods for LVLMs to analyze bi-temporal satellite images and shows the effectiveness of step-by-step reasoning prompts through human evaluation.
Findings
Step-by-step reasoning prompts improve explanation quality.
LVLMs can be adapted for bi-temporal satellite image analysis.
The prompting methods are positioned as a step toward human-AI collaboration for change explanation.
Abstract
Explaining temporal changes between satellite images taken at different times is important for urban planning and environmental monitoring. However, manual dataset construction for the task is costly, so human-AI collaboration is promising. Toward this direction, we investigate the ability of Large-scale Vision-Language Models (LVLMs) to explain temporal changes between satellite images. While LVLMs are known to generate good image captions, they receive only a single image as input. To deal with a pair of satellite images as input, we propose three prompting methods. Through human evaluation, we found our step-by-step reasoning-based prompting to be effective.
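The abstract does not spell out the three prompting methods, so the following is only a minimal sketch of what step-by-step prompting over an image pair could look like when the model accepts a single image per call. Everything here is an assumption made for illustration: the `lvlm(image, prompt)` callable, the prompt wording, and the function name `explain_change_step_by_step` are hypothetical and are not taken from the paper.

```python
from typing import Any, Callable, Dict

# Hypothetical interface (an assumption, not the paper's API):
# a single-image LVLM is any callable taking (image, prompt) and returning text.
SingleImageLVLM = Callable[[Any, str], str]

# Illustrative prompt wording; the paper's actual prompts are not given in the abstract.
CAPTION_PROMPT = (
    "Describe the land cover, buildings, and roads visible in this satellite image."
)


def explain_change_step_by_step(
    lvlm: SingleImageLVLM, before_image: Any, after_image: Any
) -> Dict[str, str]:
    """Caption each image separately, then ask for the change in a final step."""
    # Step 1: describe the earlier image on its own.
    before_caption = lvlm(before_image, CAPTION_PROMPT)
    # Step 2: describe the later image on its own.
    after_caption = lvlm(after_image, CAPTION_PROMPT)
    # Step 3: build a prompt that carries both captions and asks for the change.
    compare_prompt = (
        "Two descriptions of the same location at different times follow.\n"
        f"Before: {before_caption}\n"
        f"After: {after_caption}\n"
        "Step by step, list what changed, then summarize the change in one sentence."
    )
    # The later image is passed again only because the assumed model needs one image.
    explanation = lvlm(after_image, compare_prompt)
    return {
        "before_caption": before_caption,
        "after_caption": after_caption,
        "compare_prompt": compare_prompt,
        "explanation": explanation,
    }
```

Splitting the task this way keeps every model call within the single-image constraint the abstract describes, and the intermediate captions remain available for a human to inspect or correct before the final explanation is accepted.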
Taxonomy
Topics: Geochemistry and Geologic Mapping
