Contrastive ground-level image and remote sensing pre-training improves representation learning for natural world imagery

Andy V. Huynh; Lauren E. Gillespie; Jael Lopez-Saucedo; Claire Tang; Rohan Sikand; Moisés Expósito-Alonso

arXiv:2409.19439·cs.CV·October 1, 2024

Open Access

TL;DR

This paper introduces CRISP, a contrastive pre-training method utilizing ground-level and aerial images, which enhances species recognition performance by leveraging multiple views in natural world imagery.

Contribution

The paper presents a novel contrastive pre-training task for ground-level and aerial images, along with a large multi-view dataset for ecological species recognition.

Findings

1. Improved fine-grained classification accuracy for species recognition.

2. Effective use of multi-view contrastive learning with limited view availability.

3. Introduction of a new large-scale natural world imagery dataset.

Abstract

Multimodal image-text contrastive learning has shown that joint representations can be learned across modalities. Here, we show how leveraging multiple views of image data with contrastive learning can improve downstream fine-grained classification performance for species recognition, even when one view is absent. We propose ContRastive Image-remote Sensing Pre-training (CRISP), a new pre-training task for ground-level and aerial image representation learning of the natural world, and introduce Nature Multi-View (NMV), a dataset of natural world imagery including more than 3 million ground-level and aerial image pairs for over 6,000 plant taxa across the ecologically diverse state of California. The NMV dataset and accompanying material are available at hf.co/datasets/andyvhuynh/NatureMultiView.
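To make the idea of contrasting paired ground-level and aerial views concrete, here is a minimal sketch of a symmetric contrastive (InfoNCE-style) objective of the kind used in image-text contrastive learning. It is an illustration only: the abstract does not specify the loss CRISP uses, and the function names, the temperature of 0.07, and the random toy embeddings are all assumptions introduced for this example.

```python
import math
import random


def normalize(vector):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vector)) or 1.0
    return [x / norm for x in vector]


def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct column for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_z = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        total += log_z - row[i]
    return total / len(logits)


def symmetric_contrastive_loss(ground, aerial, temperature=0.07):
    """Hypothetical helper: average of ground-to-aerial and aerial-to-ground losses.

    ground[i] and aerial[i] are assumed to be embeddings of the same location;
    every other pairing in the batch is treated as a negative.
    """
    g = [normalize(v) for v in ground]
    a = [normalize(v) for v in aerial]
    logits = [
        [sum(x * y for x, y in zip(gi, aj)) / temperature for aj in a]
        for gi in g
    ]
    transposed = [list(column) for column in zip(*logits)]
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(transposed))


# Toy data (assumed): 8 pairs of 16-dimensional embeddings.
rng = random.Random(0)


def random_batch(n=8, dim=16):
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]


ground = random_batch()
# Aerial embeddings that nearly match their ground-level partners.
aligned_aerial = [[x + 0.01 * rng.gauss(0.0, 1.0) for x in v] for v in ground]
# Aerial embeddings with no relationship to the ground-level ones.
unrelated_aerial = random_batch()

aligned_loss = symmetric_contrastive_loss(ground, aligned_aerial)
unrelated_loss = symmetric_contrastive_loss(ground, unrelated_aerial)
print(f"aligned pairs:   {aligned_loss:.4f}")
print(f"unrelated pairs: {unrelated_loss:.4f}")
```

The loss is lower when each ground-level embedding is closest to its own aerial partner, which is the pressure that pulls the two views toward a shared representation. In a real setup the embeddings would come from two trained image encoders rather than random vectors.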

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques

Methods: Contrastive Learning