TerraScope: Pixel-Grounded Visual Reasoning for Earth Observation
Yan Shu, Bin Ren, Zhitong Xiong, Xiao Xiang Zhu, Begüm Demir, Nicu Sebe, Paolo Rota

TL;DR
TerraScope is a vision-language model for pixel-grounded geospatial reasoning in earth observation. It handles optical and SAR inputs, alone or fused, as well as multi-temporal sequences, and is supported by a 1-million-sample dataset and a dedicated benchmark.
Contribution
The paper introduces TerraScope, a unified VLM with modality-flexible and multi-temporal reasoning, together with the Terra-CoT dataset and TerraScope-Bench, a benchmark for evaluation.
Findings
Outperforms existing VLMs in pixel-grounded geospatial reasoning
Provides interpretable visual evidence
Achieves high accuracy on multiple sub-tasks
Abstract
Vision-language models (VLMs) have shown promise in earth observation (EO), yet they struggle with tasks that require grounding complex spatial reasoning in precise pixel-level visual representations. To address this problem, we introduce TerraScope, a unified VLM that delivers pixel-grounded geospatial reasoning with two key capabilities: (1) modality-flexible reasoning: it handles single-modality inputs (optical or SAR) and adaptively fuses different modalities into the reasoning process when both are available; (2) multi-temporal reasoning: it integrates temporal sequences for change analysis across multiple time points. In addition, we curate Terra-CoT, a large-scale dataset containing 1 million samples with pixel-level masks embedded in reasoning chains across multiple sources. We also propose TerraScope-Bench, the first benchmark for pixel-grounded geospatial reasoning with six…
Taxonomy
Topics: Multimodal Machine Learning Applications · Constraint Satisfaction and Optimization · Geographic Information Systems Studies
