Zero-Shot Satellite Image Retrieval through Joint Embeddings: Application to Crisis Response
James Walsh, William Fawcett, Grace Colverd, Raúl Ramos-Pollán

TL;DR
GeoQuery enables zero-shot satellite image retrieval using joint embeddings. It supports crisis response by matching natural-language queries against global Earth observation data without training a dedicated joint encoder.
Contribution
The paper introduces a novel two-stage semantic and visual search system that leverages prompt-aligned proxies to enable zero-shot retrieval of satellite imagery for crisis response.
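The two-stage idea can be illustrated with a minimal sketch. It assumes that each proxy tile has both a text embedding (of its generated description) and a visual embedding (e.g. from CLAY), that stage 1 ranks proxies by cosine similarity to the query's text embedding, and that stage 2 searches the global visual index using the centroid of the top proxies' visual embeddings. The function name `two_stage_search` and the parameters `k_proxy` and `k_global` are hypothetical; the paper's exact ranking and aggregation steps may differ.

```python
import numpy as np


def normalise(x: np.ndarray) -> np.ndarray:
    """L2-normalise along the last axis so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def two_stage_search(query_text_emb, proxy_text_embs, proxy_visual_embs,
                     global_visual_embs, k_proxy=5, k_global=10):
    """Hypothetical two-stage retrieval (assumed design, not the paper's code).

    Stage 1 (semantic): rank proxy tiles by cosine similarity between the
    query text embedding and each proxy's description embedding.
    Stage 2 (visual): average the visual embeddings of the top proxies and
    rank every global tile by cosine similarity to that centroid.
    """
    q = normalise(query_text_emb)
    semantic_scores = normalise(proxy_text_embs) @ q
    top_proxies = np.argsort(-semantic_scores)[:k_proxy]

    centroid = normalise(proxy_visual_embs[top_proxies].mean(axis=0))
    visual_scores = normalise(global_visual_embs) @ centroid
    top_global = np.argsort(-visual_scores)[:k_global]
    return top_proxies, top_global


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    proxies_txt = rng.normal(size=(100, 32))    # stand-in description embeddings
    proxies_vis = rng.normal(size=(100, 64))    # stand-in visual embeddings
    global_vis = rng.normal(size=(5000, 64))    # stand-in global tile embeddings
    query = rng.normal(size=32)                 # stand-in query embedding
    print(two_stage_search(query, proxies_txt, proxies_vis, global_vis))
```

The design point is that only the small proxy set ever needs a text embedding; the global archive is searched purely in the visual space.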
Findings
Achieved 31.6% accuracy within 50 km on disaster-location queries.
Strong performance on flood-related queries, reaching 50% accuracy within 50 km.
Deployed in a crisis-response system, where it successfully identified vulnerable areas during Cyclone Alfred.
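"Accuracy within 50 km" can be read as the fraction of queries whose predicted location lies within 50 km great-circle distance of the ground-truth location. The sketch below computes that metric with the haversine formula. It assumes one predicted (lat, lon) point per query, for instance the centre of the top-ranked tile; `haversine_km` and `accuracy_within` are hypothetical helpers, and the paper's evaluation protocol may differ.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical-Earth approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def accuracy_within(predictions, truths, threshold_km=50.0):
    """Fraction of queries whose predicted point is within threshold_km of the truth."""
    hits = sum(
        haversine_km(*pred, *truth) <= threshold_km
        for pred, truth in zip(predictions, truths)
    )
    return hits / len(truths)


if __name__ == "__main__":
    preds = [(0.0, 0.0), (10.0, 10.0)]
    truths = [(0.0, 0.1), (0.0, 0.0)]
    print(accuracy_within(preds, truths))  # one hit (~11 km), one miss
```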
Abstract
Semantic search of Earth observation archives remains challenging. Visual foundation models such as CLAY produce rich embeddings of satellite imagery but lack the natural-language grounding needed for intuitive querying, and full contrastive training of a remote-sensing CLIP-style model requires paired data and compute that are unavailable at global scale. To enable natural-language querying at global scale, we present GeoQuery, a zero-shot retrieval system that sidesteps these data and compute constraints through a two-stage semantic and visual search, leveraging a natural-language embedding of a subset (proxy) of global data. Rather than training a joint encoder, we generate language descriptions for a 100k proxy subset of global Sentinel-2 tiles and optimise the description-generation prompt so that distances in the resulting text-embedding space correlate with distances in the frozen CLAY…
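One way to score a candidate prompt is to measure how well pairwise distances between its description embeddings track pairwise distances between the frozen visual embeddings of the same tiles. The sketch below assumes cosine distances and a Spearman rank correlation over all tile pairs, with embeddings row-aligned by tile; the abstract does not specify the distance or correlation measure, so both are assumptions, and `distance_alignment` and `select_prompt` are hypothetical names. Ranks are computed without tie correction, which is adequate for continuous distances.

```python
import numpy as np


def pairwise_cosine_distances(emb: np.ndarray) -> np.ndarray:
    """Matrix of 1 - cosine similarity between all rows of emb."""
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    return 1.0 - unit @ unit.T


def rank(values: np.ndarray) -> np.ndarray:
    """Ranks 0..n-1 without tie correction (assumes continuous values)."""
    ranks = np.empty(len(values))
    ranks[np.argsort(values)] = np.arange(len(values))
    return ranks


def distance_alignment(text_embs: np.ndarray, visual_embs: np.ndarray) -> float:
    """Spearman correlation between pairwise distances in the two spaces.

    Row i of both arrays must describe the same proxy tile.
    """
    iu = np.triu_indices(text_embs.shape[0], k=1)  # each unordered pair once
    d_text = pairwise_cosine_distances(text_embs)[iu]
    d_visual = pairwise_cosine_distances(visual_embs)[iu]
    return float(np.corrcoef(rank(d_text), rank(d_visual))[0, 1])


def select_prompt(candidate_text_embs: dict, visual_embs: np.ndarray):
    """Pick the prompt whose description embeddings best align with visual space."""
    scores = {name: distance_alignment(embs, visual_embs)
              for name, embs in candidate_text_embs.items()}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    visual = rng.normal(size=(30, 16))  # stand-in for frozen visual embeddings
    candidates = {
        "prompt_a": visual + 0.5 * rng.normal(size=visual.shape),  # loosely aligned
        "prompt_b": rng.normal(size=(30, 16)),                     # unrelated
    }
    print(select_prompt(candidates, visual))
```

A rank correlation is used here because only the ordering of distances matters for nearest-neighbour retrieval, not their absolute scale in either space.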
