Vectors of Locally Aggregated Centers for Compact Video Representation
Alhabib Abbas, Nikos Deligiannis, Yiannis Andreopoulos

TL;DR
This paper introduces VLAC, a new vector aggregation method that enhances compact video representations for more accurate similarity detection, especially under visual distortions, outperforming VLAD and hyper-pooling in experiments.
Contribution
The paper presents VLAC, a novel coarser-level vector aggregation technique that improves robustness and accuracy of compact video representations for similarity detection.
Findings
VLAC outperforms VLAD and hyper-pooling in mean Average Precision.
VLAC provides more robust video similarity detection under visual distortions.
VLAC achieves significant gains with the same level of data compaction.
Abstract
We propose a novel vector aggregation technique for compact video representation, with application in accurate similarity detection within large video datasets. The current state of the art in visual search is formed by the vector of locally aggregated descriptors (VLAD) of Jegou et al. VLAD generates compact video representations based on scale-invariant feature transform (SIFT) vectors (extracted per frame) and local feature centers computed over a training set. With the aim of increasing robustness to visual distortions, we propose a new approach that operates at a coarser level in the feature representation. We create vectors of locally aggregated centers (VLAC) by first clustering SIFT features to obtain local feature centers (LFCs) and then encoding the latter with respect to given centers of local feature centers (CLFCs), extracted from a training set. The sum-of-differences…
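The two-stage idea in the abstract (cluster descriptors into LFCs, then encode the LFCs against CLFCs) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract is truncated before it specifies the encoding, so the sketch assumes a standard VLAD-style sum of residuals to the nearest center followed by L2 normalization. The function names `kmeans`, `aggregate_residuals`, and `vlac_sketch`, the parameter `num_lfcs`, and the random stand-in descriptors are all hypothetical.

```python
import numpy as np


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns a (k, d) array of cluster centers."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialise centers from k distinct input points.
    centers = points[rng.choice(len(points), size=k, replace=False)].copy()
    for _ in range(iters):
        # Squared distance from every point to every center: shape (n, k).
        d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = points[labels == j]
            if len(members):  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return centers


def aggregate_residuals(vectors, centers):
    """VLAD-style encoding (assumed here): for each center, sum the residuals
    of the vectors assigned to it, concatenate, then L2-normalize."""
    vectors = np.asarray(vectors, dtype=float)
    centers = np.asarray(centers, dtype=float)
    d2 = ((vectors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)
    out = np.zeros_like(centers)
    for j in range(len(centers)):
        assigned = vectors[nearest == j]
        if len(assigned):
            out[j] = (assigned - centers[j]).sum(axis=0)
    flat = out.ravel()
    norm = np.linalg.norm(flat)
    return flat / norm if norm > 0 else flat


def vlac_sketch(descriptors, clfcs, num_lfcs):
    """Stage 1: cluster a video's descriptors into local feature centers (LFCs).
    Stage 2: encode the LFCs against pre-trained centers of LFCs (CLFCs)."""
    lfcs = kmeans(descriptors, num_lfcs)
    return aggregate_residuals(lfcs, clfcs)


# Usage with random stand-ins for per-frame SIFT descriptors (8-D for brevity).
rng = np.random.default_rng(1)
descriptors = rng.normal(size=(200, 8))
clfcs = rng.normal(size=(4, 8))  # would come from a training set
signature = vlac_sketch(descriptors, clfcs, num_lfcs=16)
print(signature.shape)  # one d-dimensional block per CLFC
```

Under these assumptions, calling `aggregate_residuals(descriptors, centers)` directly on the raw descriptors gives a VLAD-style baseline, so the only difference between the two signatures is the intermediate clustering step.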
