Landsat30-AU: A Vision-Language Dataset for Australian Landsat Imagery
Sai Ma, Zhuang Li, John A Taylor

TL;DR
Landsat30-AU is a large-scale, multi-decadal vision-language dataset for Australian Landsat imagery, designed to advance satellite image understanding and facilitate Earth observation applications.
Contribution
The paper introduces Landsat30-AU, a large-scale dataset that combines image-caption pairs and VQA data drawn from multi-satellite Landsat archives spanning more than 36 years, built with a bootstrapped quality refinement pipeline.
Findings
Off-the-shelf VLMs perform poorly on satellite imagery.
Fine-tuning improves captioning and VQA accuracy significantly.
Landsat30-AU enables better understanding of satellite images.
Abstract
Vision language models (VLMs) that enable natural language interaction with satellite imagery can democratize Earth observation by accelerating expert workflows, making data accessible to non-specialists, and enabling planet-scale automation. However, existing datasets focus mainly on short-term, high-resolution imagery from a limited number of satellites, overlooking low-resolution, multi-satellite, long-term archives, such as Landsat, that are essential for affordable and bias-robust global monitoring. We address this gap with Landsat30-AU, a large-scale vision-language dataset built from 30-meter resolution imagery collected by four Landsat satellites (5, 7, 8, and 9) over Australia, spanning more than 36 years. The dataset includes two components: Landsat30-AU-Cap, containing image-caption pairs, and Landsat30-AU-VQA, comprising 17,725 human-verified visual question…
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Advanced Image and Video Retrieval Techniques
