Sat2Cap: Mapping Fine-Grained Textual Descriptions from Satellite Images
Aayush Dhakal, Adeel Ahmad, Subash Khanal, Srikumar Sastry, Hannah Kerner, Nathan Jacobs

TL;DR
Sat2Cap introduces a weakly supervised, zero-shot mapping method that uses contrastive learning on 6.1M pairs of overhead and ground-level images to map fine-grained textual descriptions from satellite images, enabling flexible and scalable geographic mapping.
Contribution
This work presents Sat2Cap, a contrastive learning framework trained on 6.1 million pairs of overhead and ground-level images. It enables zero-shot mapping of textual descriptions from satellite images without requiring text labels during training.
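The contrastive training idea can be illustrated with a minimal sketch. It assumes a standard InfoNCE-style objective over matched overhead and ground-level embeddings; the source does not state the exact loss, and the function names, the temperature value, and the toy vectors below are hypothetical.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def info_nce(overhead, ground, temperature=0.07):
    """Mean InfoNCE loss, where overhead[i] and ground[i] form a matched pair.

    Assumption: the temperature and the one-directional (overhead -> ground)
    form are illustrative choices, not values taken from the paper.
    """
    overhead = [normalize(v) for v in overhead]
    ground = [normalize(v) for v in ground]
    total = 0.0
    for i, o in enumerate(overhead):
        # Similarity of this overhead embedding to every ground-level embedding.
        logits = [dot(o, g) / temperature for g in ground]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        # Cross-entropy with the matched pair as the correct class.
        total += log_denominator - logits[i]
    return total / len(overhead)


# Toy embeddings (hypothetical): matched pairs point in the same direction.
overhead_batch = [[1.0, 0.0], [0.0, 1.0]]
aligned_ground = [[1.0, 0.0], [0.0, 1.0]]
shuffled_ground = [[0.0, 1.0], [1.0, 0.0]]

aligned_loss = info_nce(overhead_batch, aligned_ground)
shuffled_loss = info_nce(overhead_batch, shuffled_ground)
```

The loss is small when each overhead embedding is closest to its own ground-level partner and large when the pairing is shuffled, which is the signal that pulls matched pairs together during training.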
Findings
Successfully captures ground-level concepts from satellite imagery.
Enables large-scale, fine-grained textual mapping.
Models temporally varying concepts over locations.
Abstract
We propose a weakly supervised approach for creating maps using free-form textual descriptions. We refer to this work of creating textual maps as zero-shot mapping. Prior works have approached mapping tasks by developing models that predict a fixed set of attributes using overhead imagery. However, these models are very restrictive as they can only solve highly specific tasks for which they were trained. Mapping text, on the other hand, allows us to solve a large variety of mapping problems with minimal restrictions. To achieve this, we train a contrastive learning framework called Sat2Cap on a new large-scale dataset with 6.1M pairs of overhead and ground-level images. For a given location and overhead image, our model predicts the expected CLIP embeddings of the ground-level scenery. The predicted CLIP embeddings are then used to learn about the textual space associated with that…
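As a rough illustration of how predicted embeddings could turn a text prompt into a map, the sketch below scores each location by the cosine similarity between its predicted embedding and a text embedding. This is an assumption about the scoring step, not the authors' stated procedure; the 3-dimensional vectors stand in for real CLIP embeddings, and `zero_shot_map`, the location names, and the query are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return num / den


def zero_shot_map(location_embeddings, text_embedding):
    """Score every location against one text query.

    location_embeddings: dict mapping a location key to the embedding
    predicted from its overhead image (toy values here).
    """
    return {
        loc: cosine(emb, text_embedding)
        for loc, emb in location_embeddings.items()
    }


# Toy stand-ins for embeddings predicted from overhead images.
predicted = {
    "coast": [0.8, 0.2, 0.1],
    "downtown": [0.1, 0.9, 0.2],
}
# Toy stand-in for the text embedding of a prompt such as "people at the beach".
query = [0.9, 0.1, 0.0]

scores = zero_shot_map(predicted, query)
best_location = max(scores, key=scores.get)
```

Because the scoring only needs a text embedding at query time, the same predicted embeddings can be reused for any new prompt without retraining, which is what makes the mapping zero-shot.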
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning
Methods: Contrastive Learning · Contrastive Language-Image Pre-training
