Landsat30-AU: A Vision-Language Dataset for Australian Landsat Imagery

Sai Ma; Zhuang Li; John A Taylor

arXiv:2508.03127 · cs.CV · November 18, 2025

TL;DR

Landsat30-AU is a large-scale, multi-decadal vision-language dataset for Australian Landsat imagery, designed to advance satellite image understanding and facilitate Earth observation applications.

Contribution

The paper introduces Landsat30-AU, a large-scale dataset that combines image captions and VQA data drawn from the archives of four Landsat satellites (5, 7, 8, and 9) spanning more than 36 years, together with a bootstrapped quality refinement pipeline.

Findings

01 Off-the-shelf VLMs perform poorly on satellite imagery.

02 Fine-tuning improves captioning and VQA accuracy significantly.

03 Landsat30-AU enables better understanding of satellite images.

Abstract

Vision language models (VLMs) that enable natural language interaction with satellite imagery can democratize Earth observation by accelerating expert workflows, making data accessible to non-specialists, and enabling planet-scale automation. However, existing datasets focus mainly on short-term, high-resolution imagery from a limited number of satellites, overlooking low-resolution, multi-satellite, long-term archives, such as Landsat, that are essential for affordable and bias-robust global monitoring. We address this gap with Landsat30-AU, a large-scale vision-language dataset built from 30-meter resolution imagery collected by four Landsat satellites (5, 7, 8, and 9) over Australia, spanning more than 36 years. The dataset includes two components: Landsat30-AU-Cap, containing 196,262 image-caption pairs, and Landsat30-AU-VQA, comprising 17,725 human-verified visual question…

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Advanced Image and Video Retrieval Techniques