From LAION-5B to LAION-EO: Filtering Billions of Images Using Anchor Datasets for Satellite Image Extraction
Mikolaj Czerkawski, Alistair Francis

TL;DR
This paper introduces LAION-EO, a dataset of high-resolution satellite images paired with text, extracted from LAION-5B using an anchor-based filtering method. The approach shows how domain-specific datasets can be built from large web-sourced image collections.
Contribution
The paper presents a novel anchor-based filtering approach for extracting satellite images from large-scale datasets such as LAION-5B, and uses it to build the LAION-EO dataset.
Findings
LAION-EO contains satellite images at high pixel resolution, each paired with associated text.
The filtering method effectively isolates domain-specific satellite imagery.
The dataset facilitates research in satellite image analysis.
Abstract
Large datasets such as LAION-5B contain a diverse distribution of images shared online. However, extracting domain-specific subsets from such large image corpora is challenging. An extraction approach based on an anchor dataset, combined with further filtering, is proposed here and demonstrated for the domain of satellite imagery. The result is the release of LAION-EO, a web-sourced dataset of text paired with satellite images at high (pixel-wise) resolution. The paper outlines the acquisition procedure as well as some of the features of the dataset.
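The idea of anchor-based filtering can be illustrated with a minimal sketch. It assumes every candidate image already has an embedding vector (for example, a CLIP-style image embedding) and that a candidate is kept when its cosine similarity to at least one anchor image reaches a threshold; a second, hypothetical stage then drops images below a minimum pixel size or with empty captions. The names `Candidate` and `anchor_filter`, the 0.8 threshold, and the 256-pixel minimum are illustrative assumptions, not the embedding model, values, or code used by the authors.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class Candidate:
    """One web image-text pair (hypothetical record layout)."""
    url: str
    caption: str
    width: int
    height: int
    embedding: list[float]  # assumed precomputed image embedding, e.g. CLIP-style


def _normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit L2 norm so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def _dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def anchor_filter(
    candidates: list[Candidate],
    anchors: list[list[float]],
    sim_threshold: float = 0.8,  # hypothetical similarity threshold
    min_side: int = 256,         # hypothetical minimum image side in pixels
) -> list[tuple[Candidate, float]]:
    """Keep candidates that resemble any anchor image, then apply extra checks."""
    if not anchors:
        raise ValueError("at least one anchor embedding is required")
    unit_anchors = [_normalize(a) for a in anchors]
    kept = []
    for c in candidates:
        e = _normalize(c.embedding)
        # Stage 1: best cosine similarity to any anchor image.
        score = max(_dot(e, a) for a in unit_anchors)
        if score < sim_threshold:
            continue
        # Stage 2 (hypothetical): require enough pixels and a non-empty caption.
        if min(c.width, c.height) < min_side or not c.caption.strip():
            continue
        kept.append((c, score))
    # Most anchor-like images first.
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept


# Toy 3-dimensional embeddings standing in for real image embeddings.
anchors = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]]
pool = [
    Candidate("a.jpg", "aerial view of a harbour", 1024, 1024, [0.95, 0.05, 0.0]),
    Candidate("b.jpg", "a cat on a sofa", 800, 600, [0.0, 1.0, 0.0]),
    Candidate("c.jpg", "satellite image", 64, 64, [1.0, 0.0, 0.0]),
]
result = anchor_filter(pool, anchors)
print([c.url for c, _ in result])  # ['a.jpg']
```

Here "b.jpg" fails the similarity stage and "c.jpg" passes it but fails the size check. At the scale of LAION-5B, the linear scan over anchors would likely be replaced by an approximate nearest-neighbour index over the precomputed embeddings; the sketch only shows the selection logic.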
Taxonomy
Topics: Image Retrieval and Classification Techniques
