Remote Sensing SpatioTemporal Vision-Language Models: A Comprehensive Survey
Chenyang Liu, Jiafan Zhang, Keyan Chen, Man Wang, Zhengxia Zou, and Zhenwei Shi

TL;DR
This survey reviews the development of remote sensing spatio-temporal vision-language models, highlighting their ability to generate human-readable insights from multi-temporal imagery and discussing future research directions.
Contribution
It provides the first comprehensive overview of remote sensing spatio-temporal vision-language models (RS-STVLMs). It covers model evolution, key tasks, components, datasets, and evaluation metrics, and offers guidance for future advancements in the field.
Findings
Models now integrate visual and linguistic data for richer analysis.
Progress from task-specific to foundation models leveraging large language models.
Identified key challenges and promising directions for future research.
Abstract
The interpretation of multi-temporal remote sensing imagery is critical for monitoring Earth's dynamic processes, yet previous change detection methods, which produce binary or semantic masks, fall short of providing human-readable insights into changes. Recent advances in Vision-Language Models (VLMs) have opened a new frontier by fusing visual and linguistic modalities, enabling spatio-temporal vision-language understanding: models that not only capture spatial and temporal dependencies to recognize changes but also provide richer, interactive semantic analysis of temporal images (e.g., generating descriptive captions and answering natural-language queries). In this survey, we present the first comprehensive review of remote sensing spatio-temporal vision-language models (RS-STVLMs). The survey covers the evolution of models from early task-specific models to recent general foundation models that leverage powerful large language models. We…
Taxonomy
Topics: Geographic Information Systems Studies · Advanced Image and Video Retrieval Techniques · Remote-Sensing Image Classification
