VRFP: On-the-fly Video Retrieval using Web Images and Fast Fisher Vector Products
Xintong Han, Bharat Singh, Vlad I. Morariu, Larry S. Davis

TL;DR
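This paper introduces VRFP, a real-time video retrieval system that takes a short text query, gathers weakly labeled web images for it, and matches a Fisher Vector of those images against Fisher Vectors of the database videos. It outperforms state-of-the-art methods on TRECVID MED13, MED14, and CCV.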
Contribution
The paper presents a real-time retrieval framework that represents web images and database videos as Fisher Vectors built on CNN features, together with a lossless algorithm that accelerates inner product computation between high-dimensional Fisher Vectors.
Findings
Fisher Vectors are robust to noise in web images.
The proposed matching algorithm is lossless and accelerates the inner product computation between high-dimensional Fisher Vectors, making real-time matching feasible.
VRFP outperforms state-of-the-art methods on TRECVID MED13, MED14, and CCV datasets.
Abstract
VRFP is a real-time video retrieval framework based on short text input queries, which obtains weakly labeled training images from the web after the query is known. The retrieved web images representing the query and each database video are treated as unordered collections of images, and each collection is represented using a single Fisher Vector built on CNN features. Our experiments show that a Fisher Vector is robust to noise present in web images and compares favorably in terms of accuracy to other standard representations. While a Fisher Vector can be constructed efficiently for a new query, matching against the test set is slow due to its high dimensionality. To perform matching in real-time, we present a lossless algorithm that accelerates the inner product computation between high dimensional Fisher Vectors. We prove that the expected number of multiplications required decreases…
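The representation and matching steps described in the abstract can be illustrated with a minimal sketch. It assumes a simplified first-order Fisher Vector (gradient with respect to the GMM means only, diagonal covariances, followed by the standard power and L2 normalization), toy 2-D descriptors standing in for CNN features, and a hand-set two-component GMM. All function names and data are hypothetical, and the scoring is a plain inner product, not the paper's accelerated algorithm.

```python
import math

def posteriors(x, weights, means, sigmas):
    """Soft-assign one descriptor to each diagonal-covariance Gaussian."""
    log_p = []
    for w, mu, sd in zip(weights, means, sigmas):
        ll = math.log(w)
        for xd, md, s in zip(x, mu, sd):
            ll += -0.5 * ((xd - md) / s) ** 2 - math.log(s) - 0.5 * math.log(2 * math.pi)
        log_p.append(ll)
    m = max(log_p)  # subtract the max for numerical stability
    exp_p = [math.exp(v - m) for v in log_p]
    total = sum(exp_p)
    return [v / total for v in exp_p]

def fisher_vector(descriptors, weights, means, sigmas):
    """Simplified first-order Fisher Vector of an unordered set of descriptors."""
    K, D, T = len(weights), len(means[0]), len(descriptors)
    fv = [0.0] * (K * D)
    for x in descriptors:
        gamma = posteriors(x, weights, means, sigmas)
        for k in range(K):
            for d in range(D):
                fv[k * D + d] += gamma[k] * (x[d] - means[k][d]) / sigmas[k][d]
    for k in range(K):
        scale = 1.0 / (T * math.sqrt(weights[k]))
        for d in range(D):
            fv[k * D + d] *= scale
    # Signed square root (power normalization), then L2 normalization.
    fv = [math.copysign(math.sqrt(abs(v)), v) for v in fv]
    norm = math.sqrt(sum(v * v for v in fv))
    return [v / norm for v in fv] if norm > 0 else fv

def rank_videos(query_fv, video_fvs):
    """Rank videos by plain inner product with the query Fisher Vector."""
    scores = {name: sum(q * v for q, v in zip(query_fv, fv))
              for name, fv in video_fvs.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Toy GMM (assumed values): two components in 2-D with unit variances.
weights = [0.5, 0.5]
means = [[0.0, 0.0], [5.0, 5.0]]
sigmas = [[1.0, 1.0], [1.0, 1.0]]

# Each collection is an unordered set of per-image (or per-frame) descriptors.
query_images = [[0.5, 0.2], [0.8, -0.1], [5.5, 5.2]]
videos = {
    "video_a": [[0.6, 0.1], [0.7, 0.0], [5.4, 5.3]],
    "video_b": [[-0.6, -0.1], [-0.8, 0.1], [4.5, 4.7]],
}

query_fv = fisher_vector(query_images, weights, means, sigmas)
video_fvs = {name: fisher_vector(desc, weights, means, sigmas)
             for name, desc in videos.items()}
ranking = rank_videos(query_fv, video_fvs)
print(ranking)
```

In this toy setup the video Fisher Vectors can be computed offline, so only the query vector and the inner products are needed at query time. With realistic dimensionality that inner product step is the bottleneck the abstract refers to.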
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
