Visual descriptors for content-based retrieval of remote sensing images
Paolo Napoletano

TL;DR
This paper evaluates global hand-crafted, local hand-crafted, and CNN-based visual descriptors for content-based retrieval of remote sensing images, and shows that CNN features outperform the hand-crafted descriptors on two publicly available datasets.
Contribution
It provides an extensive comparison of global, local, and CNN features for remote sensing image retrieval, highlighting the superior performance of CNN-based features, especially SatResNet-50 and NetVLAD.
Findings
CNN features outperform hand-crafted features in retrieval tasks
SatResNet-50 fine-tuned on RS data performs best among CNNs
NetVLAD excels with images containing fine textures and objects
Abstract
In this paper we present an extensive evaluation of visual descriptors for the content-based retrieval of remote sensing (RS) images. The evaluation includes global hand-crafted, local hand-crafted, and Convolutional Neural Network (CNN) features coupled with four different Content-Based Image Retrieval schemes. We conducted all the experiments on two publicly available datasets: the 21-class UC Merced Land Use/Land Cover (LandUse) dataset and the 19-class High-resolution Satellite Scene dataset (SceneSat). The content of RS images might be quite heterogeneous, ranging from images containing fine-grained textures, to coarse-grained ones, to images containing objects. It is therefore not obvious, in this domain, which descriptor should be employed to describe images with such variability. Results demonstrate that CNN-based features perform better than both global and local…
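As a rough illustration of what content-based retrieval with visual descriptors involves, the sketch below ranks a small database of feature vectors by their distance to a query vector and scores the ranking with average precision. It is not the paper's method: the Euclidean distance, the average-precision measure, the two-dimensional toy descriptors, the class labels, and the function names `retrieve` and `average_precision` are all assumptions made for this example. In practice the descriptors would come from a hand-crafted extractor or a CNN.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length descriptors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def retrieve(query, database):
    """Return database indices ordered from most to least similar to the query."""
    return sorted(range(len(database)), key=lambda i: euclidean(query, database[i]))


def average_precision(ranked_labels, query_label):
    """Mean of the precision values measured at each relevant item in the ranking."""
    hits = 0
    precisions = []
    for rank, label in enumerate(ranked_labels, start=1):
        if label == query_label:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


# Toy descriptors and labels (hypothetical, for illustration only).
database = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
labels = ["forest", "forest", "harbor", "harbor"]
query = [0.88, 0.12]  # descriptor of a hypothetical "forest" query image

ranking = retrieve(query, database)
ap = average_precision([labels[i] for i in ranking], "forest")
print(ranking, ap)
```

With these toy values both "forest" images are ranked ahead of the "harbor" images, so the average precision for the query is 1.0. Swapping in a different descriptor only changes how the vectors in `database` and `query` are produced; the ranking step stays the same.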
