TL;DR
This paper presents a fast, robust image-to-video search method that combines local and global descriptors, achieving state-of-the-art accuracy and efficiency on large-scale datasets by enhancing indexing and visual representation techniques.
Contribution
It introduces critical enhancements to indexing and visual representations, exploiting local and global descriptor decisions at query time for improved large-scale image-to-video retrieval.
Findings
Achieves state-of-the-art mean average precision (MAP) scores on the Stanford I2V dataset.
Reduces complexity and query time for large-scale retrieval.
Effectively combines local and global descriptors for diverse visual challenges.
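The MAP metric cited in the findings can be illustrated with a minimal sketch. This is the standard textbook definition, not the paper's evaluation code: the function names and the toy data are hypothetical, each ranked list is assumed to contain no duplicate items, and the exact protocol used for the Stanford I2V dataset (for example, any cutoff on ranking depth) may differ.

```python
from typing import Hashable, Sequence, Set, Tuple


def average_precision(ranked: Sequence[Hashable], relevant: Set[Hashable]) -> float:
    """AP for one query: sum of precision@k at each rank k holding a relevant
    item, divided by the total number of relevant items."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for k, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / k
    # Dividing by len(relevant) penalizes relevant items that were never retrieved.
    return precision_sum / len(relevant)


def mean_average_precision(
    runs: Sequence[Tuple[Sequence[Hashable], Set[Hashable]]]
) -> float:
    """MAP: unweighted mean of AP over all queries."""
    if not runs:
        return 0.0
    return sum(average_precision(ranked, rel) for ranked, rel in runs) / len(runs)


# Toy example (hypothetical data): two image queries, each with a ranked list
# of returned video clips and the set of clips that are truly relevant.
toy_runs = [
    (["a", "b", "c"], {"a", "c"}),
    (["x", "y"], {"y"}),
]
toy_map = mean_average_precision(toy_runs)
```

A higher MAP means relevant videos tend to appear nearer the top of each ranked list, averaged over all queries.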
Abstract
The cost-effective visual representation and fast query-by-example search are two challenging goals that should be maintained for web-scale visual retrieval tasks on moderate hardware. This paper introduces a fast and robust method that ensures both of these goals by obtaining state-of-the-art performance for an image-to-video search scenario. Hence, we present critical enhancements to well-known indexing and visual representation techniques by promoting faster, better and moderate retrieval performance. We also boost the superiority of our method for some visual challenges by exploiting individual decisions of local and global descriptors at query time. For instance, local content descriptors represent copied/duplicated scenes with large geometric deformations such as scale, orientation and affine transformation. In contrast, the use of global content descriptors is more practical for…
