Self-supervised Audiovisual Representation Learning for Remote Sensing Data

Konrad Heidler; Lichao Mou; Di Hu; Pu Jin; Guangyao Li; Chuang Gan; Ji-Rong Wen; Xiao Xiang Zhu

arXiv:2108.00688 · cs.CV · August 22, 2024 · 6 cites

TL;DR

This paper introduces a self-supervised method for pre-training neural networks on remote sensing data by leveraging co-located audio and imagery, resulting in improved transfer learning performance without manual annotations.

Contribution

It presents a novel self-supervised approach using audiovisual correspondence for pre-training remote sensing models, along with the new SoundingEarth dataset.

Findings

01 Models pre-trained with this approach outperform existing pre-training strategies on remote sensing tasks.

02 The approach enables label-free pre-training using co-located audio and imagery.

03 The models learn meaningful scene representations across both modalities.

Abstract

Many current deep learning approaches make extensive use of backbone networks pre-trained on large datasets like ImageNet, which are then fine-tuned to perform a certain task. In remote sensing, the lack of comparable large annotated datasets and the wide diversity of sensing platforms impede similar developments. In order to contribute towards the availability of pre-trained backbone networks in remote sensing, we devise a self-supervised approach for pre-training deep neural networks. By exploiting the correspondence between geo-tagged audio recordings and remote sensing imagery, this is done in a completely label-free manner, eliminating the need for laborious manual annotation. For this purpose, we introduce the SoundingEarth dataset, which consists of co-located aerial imagery and audio samples from all around the world. Using this dataset, we then pre-train ResNet models to map…
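To make the idea of audiovisual correspondence concrete, the following is a minimal sketch of one way such a label-free training signal can be computed. It is not taken from the paper: the abstract shown here is truncated and does not state the loss, so the symmetric batch-contrastive loss, the function names (`correspondence_loss` and its helpers), and the `temperature` parameter are all illustrative assumptions. The inputs stand in for embeddings that an image encoder and an audio encoder would produce for a batch of co-located pairs.

```python
import math

# Illustrative sketch only: a symmetric batch-contrastive loss over paired
# image/audio embeddings. The loss form, names, and temperature value are
# assumptions, not the method described in the paper.

def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def similarity_matrix(image_emb, audio_emb):
    """Cosine similarity between every image and every audio embedding."""
    images = [l2_normalize(v) for v in image_emb]
    audios = [l2_normalize(v) for v in audio_emb]
    return [[sum(a * b for a, b in zip(img, aud)) for aud in audios]
            for img in images]

def cross_entropy_diagonal(logits):
    """Mean cross-entropy where row k's correct class is column k."""
    total = 0.0
    for k, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_sum_exp = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        total += log_sum_exp - row[k]
    return total / len(logits)

def correspondence_loss(image_emb, audio_emb, temperature=0.1):
    """Low when the k-th image is most similar to the k-th audio clip.

    Pair k (image k, audio k) is treated as co-located; all other
    combinations in the batch act as negatives. No labels are needed.
    """
    sim = similarity_matrix(image_emb, audio_emb)
    logits = [[s / temperature for s in row] for row in sim]
    transposed = [list(col) for col in zip(*logits)]
    # Average the image-to-audio and audio-to-image directions.
    return 0.5 * (cross_entropy_diagonal(logits)
                  + cross_entropy_diagonal(transposed))
```

In this sketch, a batch whose audio embeddings line up with their own images yields a lower loss than the same batch with the audio clips swapped, which is the signal an encoder pair could be trained on without manual annotation.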

Code & Models

Repositories

khdlr/SoundingEarth (PyTorch, official)

Taxonomy

Topics: Remote-Sensing Image Classification · Speech and Audio Processing · Underwater Acoustics Research

Methods: Convolution · Batch Normalization · Residual Connection · Average Pooling · Kaiming Initialization · 1x1 Convolution · Global Average Pooling · Residual Block · Bottleneck Residual Block