Cross-Task Transfer for Geotagged Audiovisual Aerial Scene Recognition

Di Hu; Xuhong Li; Lichao Mou; Pu Jin; Dong Chen; Liping Jing; Xiaoxiang Zhu; Dejing Dou

arXiv:2005.08449 · cs.CV · July 17, 2020 · 5 cites

TL;DR

This paper introduces a novel audiovisual approach to aerial scene recognition, leveraging sound event knowledge to enhance visual classification accuracy, supported by a new dataset and three transfer learning methods.

Contribution

The paper proposes a new multimodal framework that transfers sound event knowledge to improve aerial scene recognition, along with a new dataset for evaluation.

Findings

01 Audio information improves scene recognition accuracy

02 Three transfer methods demonstrate effective knowledge transfer

03 The new ADVANCE dataset supports multimodal aerial scene analysis

Abstract

Aerial scene recognition is a fundamental task in remote sensing and has recently received increased interest. While visual information from overhead images, combined with powerful models and efficient algorithms, yields considerable performance on scene recognition, it still suffers from variation in ground objects, lighting conditions, etc. Inspired by the multi-channel perception theory in cognitive science, in this paper we explore a novel audiovisual aerial scene recognition task that uses both images and sounds as input to improve recognition performance. Based on the observation that some specific sound events are more likely to be heard at a given geographic location, we propose to exploit knowledge from sound events to improve performance on aerial scene recognition. For this purpose, we have constructed a new dataset named AuDio Visual…

Code & Models

Repositories

DTaoo/Multimodal-Aerial-Scene-Recognition
PyTorch · Official

Datasets

blanchon/ADVANCE
dataset · 120 dl

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Advanced Image and Video Retrieval Techniques