Aggregating Binary Local Descriptors for Image Retrieval
Giuseppe Amato, Fabrizio Falchi, Lucia Vadicamo

TL;DR
This paper compares aggregation methods for binary local descriptors in image retrieval, formalizes Fisher Kernels for Bernoulli Mixture Models, and explores combining aggregated binary features with CNN features, which improves retrieval accuracy over CNN features alone.
Contribution
It provides an extensive comparison of aggregation methods for binary features, formalizes Fisher Kernels for Bernoulli Mixture Models, and demonstrates the benefits of combining binary features with CNNs.
Findings
Aggregation methods are effective for binary features.
Fisher Vector on binary features improves retrieval performance.
Binary features combined with CNNs outperform standalone CNN features.
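As a rough illustration of the Fisher Vector idea for binary features, the sketch below computes the average gradient of a Bernoulli Mixture Model log-likelihood with respect to each Bernoulli mean, then L2-normalizes the result. It is a minimal sketch, not the paper's exact formulation: the function names (bmm_posteriors, bmm_fisher_score, l2_normalize) and the toy parameters are hypothetical, the mixture is assumed to be already trained, and the Fisher-information normalization derived in the paper is deliberately left out.

```python
import math

def bmm_posteriors(x, weights, mus):
    """Posterior probability of each Bernoulli component given a binary vector x.

    Assumes every mean in `mus` lies strictly between 0 and 1.
    """
    log_joint = []
    for w, mu in zip(weights, mus):
        ll = math.log(w)
        for bit, m in zip(x, mu):
            ll += math.log(m) if bit else math.log(1.0 - m)
        log_joint.append(ll)
    top = max(log_joint)  # subtract the max for numerical stability
    unnorm = [math.exp(v - top) for v in log_joint]
    total = sum(unnorm)
    return [u / total for u in unnorm]

def bmm_fisher_score(descriptors, weights, mus):
    """Average gradient of log p(x) w.r.t. each Bernoulli mean mu[k][d].

    Uses d/dmu log p(x) = gamma_k(x) * (x_d - mu) / (mu * (1 - mu)).
    This is the raw score only; no Fisher-information scaling is applied.
    """
    K, D = len(mus), len(mus[0])
    score = [0.0] * (K * D)
    for x in descriptors:
        gamma = bmm_posteriors(x, weights, mus)
        for k in range(K):
            for d in range(D):
                m = mus[k][d]
                score[k * D + d] += gamma[k] * (x[d] - m) / (m * (1.0 - m))
    n = len(descriptors)
    return [s / n for s in score]

def l2_normalize(v):
    """Scale a vector to unit Euclidean length (left unchanged if all zeros)."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v] if norm > 0 else v

# Toy example: two 2-bit descriptors and a hypothetical 2-component mixture.
weights = [0.5, 0.5]
mus = [[0.9, 0.1], [0.1, 0.9]]
descriptors = [[1, 0], [0, 1]]
image_vector = l2_normalize(bmm_fisher_score(descriptors, weights, mus))
```

The resulting image_vector has one entry per (component, bit) pair, so an image with any number of binary local descriptors is summarized by a single fixed-length vector that can be compared with a dot product.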
Abstract
Content-Based Image Retrieval based on local features is computationally expensive because of the complexity of both extracting and matching local features. On one hand, the cost of extracting, representing, and comparing local visual descriptors has been dramatically reduced by recently proposed binary local features. On the other hand, aggregation techniques provide a meaningful summarization of all the extracted features of an image into a single descriptor, allowing us to speed up and scale up the image search. Only a few recent works have combined these two research directions, defining aggregation methods for binary local features in order to leverage the advantages of both approaches. In this paper, we report an extensive comparison among state-of-the-art aggregation methods applied to binary features. Then, we mathematically formalize the application of Fisher…
