TL;DR
This paper introduces a scalable instance retrieval method using CNN local features encoded with a bag of words scheme, enabling efficient spatial reranking and query expansion for improved performance.
Contribution
It presents an approach that encodes local CNN features with BoW for instance search, reaching competitive accuracy on the Oxford and Paris buildings benchmarks and outperforming sum pooling techniques on a subset of TRECVid INS.
Findings
Achieves competitive results on Oxford and Paris benchmarks.
Outperforms state-of-the-art sum pooling techniques on a subset of TRECVid INS.
Enables fast spatial reranking, yielding object localizations that are used for query expansion.
Abstract
This work proposes a simple instance retrieval pipeline based on encoding the convolutional features of a CNN using the bag of words (BoW) aggregation scheme. Assigning each local array of activations in a convolutional layer to a visual word produces an assignment map, a compact representation that relates regions of an image to visual words. We use the assignment map for fast spatial reranking, obtaining object localizations that are used for query expansion. We demonstrate the suitability of the BoW representation based on local CNN features for instance retrieval, achieving competitive performance on the Oxford and Paris buildings benchmarks. We show that our proposed system for CNN feature aggregation with BoW outperforms state-of-the-art techniques using sum pooling on a subset of the challenging TRECVid INS benchmark.
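The assignment map described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes a convolutional feature map of shape (H, W, D) and a codebook of K visual words that has already been learned (for example with k-means), and it uses nearest-neighbour assignment under Euclidean distance, L2-normalised histograms, and cosine scoring. The function names `assignment_map`, `bow_histogram`, and `cosine_score`, and the box convention, are hypothetical choices made for this example.

```python
import numpy as np


def assignment_map(features, codebook):
    """Map each spatial location of a conv feature map to its nearest visual word.

    features: (H, W, D) array of local activations (assumed layout).
    codebook: (K, D) array of visual word centroids (assumed precomputed).
    Returns an (H, W) integer array of visual word indices.
    """
    h, w, d = features.shape
    flat = features.reshape(-1, d)
    # Squared Euclidean distance between every location and every centroid,
    # expanded as |x|^2 - 2 x.c + |c|^2 to avoid an explicit loop.
    dist2 = (
        (flat ** 2).sum(axis=1, keepdims=True)
        - 2.0 * flat @ codebook.T
        + (codebook ** 2).sum(axis=1)
    )
    return dist2.argmin(axis=1).reshape(h, w)


def bow_histogram(amap, num_words, box=None):
    """L2-normalised BoW histogram of an assignment map.

    box: optional (top, left, bottom, right) window in assignment-map
    coordinates, half-open, used to describe a sub-region of the image.
    """
    if box is not None:
        top, left, bottom, right = box
        amap = amap[top:bottom, left:right]
    hist = np.bincount(amap.ravel(), minlength=num_words).astype(np.float64)
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist


def cosine_score(query_hist, target_hist):
    """Cosine similarity between two L2-normalised histograms."""
    return float(query_hist @ target_hist)
```

Because the assignment map stores one integer per location, the histogram of any candidate window can be computed directly from it without touching the original activations again. Under these assumptions, this is what makes scoring many windows for spatial reranking cheap.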
