Semantics-Consistent Representation Learning for Remote Sensing Image-Voice Retrieval

Hailong Ning, Bin Zhao, and Yuan Yuan

arXiv:2103.05302 · cs.MM · January 5, 2022

TL;DR

This paper introduces a semantics-consistent representation learning method for remote sensing image-voice retrieval, effectively integrating intra- and inter-modality relationships to enhance cross-modal semantic matching.

Contribution

The novel SCRL method considers pairwise, intra-modality, and non-paired inter-modality relationships simultaneously, improving semantic consistency in RS image-voice retrieval.
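
The three relationship types named above can be illustrated with a toy objective. The sketch below is not the paper's formulation: the function names, the contrastive form of each term, the unweighted sum, and the `margin` parameter are all assumptions chosen only to show how pairwise, intra-modality, and non-paired inter-modality terms might sit side by side.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive(d, same, margin):
    """Assumed term: pull same-label pairs together, push others past `margin`."""
    return d ** 2 if same else max(0.0, margin - d) ** 2


def toy_consistency_loss(images, voices, labels, margin=1.0):
    """Hypothetical three-part loss; images[i] and voices[i] form a pair."""
    n = len(images)
    # Pairwise term: each image should sit close to its own paired voice.
    pairwise = sum(dist(images[i], voices[i]) ** 2 for i in range(n))
    intra = 0.0
    inter = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            same = labels[i] == labels[j]
            # Non-paired inter-modality term: image i against voice j (i != j).
            inter += contrastive(dist(images[i], voices[j]), same, margin)
            if i < j:
                # Intra-modality terms: image-image and voice-voice, counted once.
                intra += contrastive(dist(images[i], images[j]), same, margin)
                intra += contrastive(dist(voices[i], voices[j]), same, margin)
    return {
        "pairwise": pairwise,
        "intra": intra,
        "inter": inter,
        "total": pairwise + intra + inter,
    }
```

Calling `toy_consistency_loss(images, voices, labels)` with lists of embedding vectors and one semantic label per pair returns each term separately, which makes it easy to see which relationship dominates for a given set of embeddings.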

Findings

1. Outperforms existing methods on three RS datasets
2. Effectively narrows the semantic gap between images and voices
3. Enhances retrieval accuracy through comprehensive relationship modeling

Abstract

With the development of earth observation technology, massive amounts of remote sensing (RS) images are being acquired. Cross-modal RS image-voice retrieval offers a new way to find useful information in these images. This paper studies the RS image-voice retrieval task in order to retrieve useful information from massive amounts of RS data. Existing methods for RS image-voice retrieval rely primarily on the pairwise relationship to narrow the heterogeneous semantic gap between images and voices. However, apart from the pairwise relationship included in the datasets, the intra-modality and non-paired inter-modality relationships should also be taken into account, since the semantic consistency among non-paired representations plays an important role in the RS image-voice retrieval task. Inspired by this, a semantics-consistent representation learning (SCRL)…

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques · Multimodal Machine Learning Applications