# CMIR-NET : A Deep Learning Based Model For Cross-Modal Retrieval In Remote Sensing

**Authors:** Ushasi Chaudhuri, Biplab Banerjee, Avik Bhattacharya, and Mihai Datcu

arXiv: 1904.04794 · 2021-04-22

## TL;DR

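This paper introduces CMIR-NET, a deep neural network for cross-modal retrieval in remote sensing. It learns a shared feature space across modalities and is evaluated in two scenarios: retrieval between panchromatic (PAN) and multi-spectral images, and multi-label retrieval between very high resolution (VHR) images and speech-based label annotations.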

## Contribution

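The paper presents a novel deep neural network architecture that learns a discriminative shared feature space for all input remote sensing modalities, so that items from different modalities can be compared directly for semantically coherent retrieval. The authors also generate a corpus of speech signals corresponding to the UC-Merced labels.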
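To make the idea of a shared feature space concrete, here is a minimal sketch of retrieval across two modalities once both have been projected into a common space. It is not the CMIR-NET architecture: the linear projections, the feature dimensions, the random weights, and the names `project` and `retrieve` are all assumptions chosen for illustration, and cosine similarity is used here only as one plausible ranking rule. With untrained random weights the ranking carries no meaning; the sketch only shows the data flow.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products equal cosine similarity."""
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norm, eps)


def project(features, weights):
    """Map modality-specific features into the shared space.

    A single linear map stands in for a learned encoder (assumption).
    """
    return l2_normalize(features @ weights)


def retrieve(query_emb, gallery_emb, k=3):
    """Return, for each query, the indices of the k most similar gallery items."""
    sims = query_emb @ gallery_emb.T          # cosine similarity matrix
    return np.argsort(-sims, axis=1)[:, :k]   # descending order, top k


rng = np.random.default_rng(0)

# Hypothetical feature sizes for the two modalities and the shared space.
pan_dim, ms_dim, shared_dim = 16, 32, 8
w_pan = rng.normal(size=(pan_dim, shared_dim))  # untrained placeholder weights
w_ms = rng.normal(size=(ms_dim, shared_dim))    # untrained placeholder weights

pan_queries = rng.normal(size=(2, pan_dim))     # two PAN query feature vectors
ms_gallery = rng.normal(size=(5, ms_dim))       # five multi-spectral gallery items

q = project(pan_queries, w_pan)
g = project(ms_gallery, w_ms)
top = retrieve(q, g, k=3)
print(top.shape)
```

Because both modalities end up as unit vectors of the same dimension, the same `retrieve` call works in either direction (PAN to multi-spectral or the reverse) by swapping the query and gallery arguments.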

## Key findings

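- Reports superior performance over the current state of the art on both benchmarks: the large-scale PAN/multi-spectral DSRSID dataset and the multi-label UC-Merced dataset.
- Handles cross-modal retrieval between PAN and multi-spectral images.
- Handles multi-label retrieval between VHR images and speech-based label annotations.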

## Abstract

We address the problem of cross-modal information retrieval in the domain of remote sensing. In particular, we are interested in two application scenarios: i) cross-modal retrieval between panchromatic (PAN) and multi-spectral imagery, and ii) multi-label image retrieval between very high resolution (VHR) images and speech based label annotations. Notice that these multi-modal retrieval scenarios are more challenging than the traditional uni-modal retrieval approaches given the inherent differences in distributions between the modalities. However, with the growing availability of multi-source remote sensing data and the scarcity of enough semantic annotations, the task of multi-modal retrieval has recently become extremely important. In this regard, we propose a novel deep neural network based architecture which is considered to learn a discriminative shared feature space for all the input modalities, suitable for semantically coherent information retrieval. Extensive experiments are carried out on the benchmark large-scale PAN - multi-spectral DSRSID dataset and the multi-label UC-Merced dataset. Together with the Merced dataset, we generate a corpus of speech signals corresponding to the labels. Superior performance with respect to the current state-of-the-art is observed in all the cases.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1904.04794/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/1904.04794/full.md

## References

17 references — full list in the complete paper: https://tomesphere.com/paper/1904.04794/full.md

---
Source: https://tomesphere.com/paper/1904.04794