Visual Instance Retrieval with Deep Convolutional Networks
Ali Sharif Razavian, Josephine Sullivan, Stefan Carlsson, Atsuto Maki

TL;DR
This paper studies how convolutional network (ConvNet) features should be extracted for visual instance retrieval. It shows that multi-scale local features and explicit handling of geometric invariance let generic ConvNet representations outperform other state-of-the-art methods.
Contribution
The paper introduces an efficient pipeline for extracting multi-scale local features from ConvNets that explicitly accounts for geometric invariance (positions, scales, and spatial consistency), improving retrieval performance.
Findings
Generic ConvNet features can outperform other state-of-the-art methods when they are extracted appropriately
Multi-scale schemes and explicit handling of geometric invariance improve retrieval accuracy
Experiments on five standard image retrieval datasets support the approach
Abstract
This paper provides an extensive study on the availability of image representations based on convolutional networks (ConvNets) for the task of visual instance retrieval. Besides the choice of convolutional layers, we present an efficient pipeline exploiting multi-scale schemes to extract local features, in particular, by taking geometric invariance into explicit account, i.e. positions, scales and spatial consistency. In our experiments using five standard image retrieval datasets, we demonstrate that generic ConvNet image representations can outperform other state-of-the-art methods if they are extracted appropriately.
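The multi-scale local feature idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a convolutional feature map is already available as a NumPy array of shape (H, W, C), pools it over grids of sub-regions at several scales, and compares two images by averaging, over the query's local features, the distance to the nearest local feature of the reference. The function names, the grid levels, and the choice of max pooling with L2 normalization are all illustrative assumptions.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row vector to unit Euclidean length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def multiscale_local_features(fmap, levels=(1, 2, 3)):
    """Pool a (H, W, C) feature map into local descriptors.

    For each level n (hypothetical parameter), the map is split into an
    n x n grid and every cell is max-pooled into one C-dimensional vector.
    Assumes H and W are at least max(levels) so that no cell is empty.
    """
    h, w, _ = fmap.shape
    feats = []
    for n in levels:
        ys = np.linspace(0, h, n + 1).astype(int)  # row boundaries
        xs = np.linspace(0, w, n + 1).astype(int)  # column boundaries
        for i in range(n):
            for j in range(n):
                region = fmap[ys[i]:ys[i + 1], xs[j]:xs[j + 1]]
                feats.append(region.max(axis=(0, 1)))
    return l2_normalize(np.stack(feats))


def image_distance(query_feats, ref_feats):
    """Mean over query descriptors of the distance to the nearest reference descriptor."""
    diff = query_feats[:, None, :] - ref_feats[None, :, :]
    pairwise = np.linalg.norm(diff, axis=-1)  # shape (n_query, n_ref)
    return pairwise.min(axis=1).mean()


# Usage with random arrays standing in for real ConvNet activations.
rng = np.random.default_rng(0)
query = multiscale_local_features(rng.random((6, 6, 8)))
reference = multiscale_local_features(rng.random((6, 6, 8)))
score = image_distance(query, reference)  # smaller means more similar
```

Matching each query sub-region to its closest reference sub-region is one simple way to tolerate changes in an object's position and scale. The spatial consistency check mentioned in the abstract is not covered by this sketch.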
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques · Multimodal Machine Learning Applications
