Self-Supervised Video Hashing with Hierarchical Binary Auto-encoder
Jingkuan Song, Hanwang Zhang, Xiangpeng Li, Lianli Gao, Meng Wang, and Richang Hong

TL;DR
This paper introduces an unsupervised, end-to-end video hashing framework that captures temporal dependencies in videos using a hierarchical binary autoencoder, significantly improving retrieval performance.
Contribution
A hierarchical binary autoencoder models temporal dependencies in videos at multiple granularities. It jointly optimizes the binary codes for two goals: reconstructing the video content and preserving the neighborhood structure between videos.
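The following is a minimal sketch of what "jointly optimizing binary codes for reconstruction and neighborhood preservation" can look like as a single objective. It is not the paper's exact loss. The function names `binarize` and `joint_loss`, the weight `alpha`, and the choice of mean squared error for both terms are hypothetical, chosen only for illustration. The sketch also assumes a precomputed target similarity matrix with entries in [-1, 1].

```python
import numpy as np

def binarize(h):
    # Map real-valued encoder outputs to {-1, +1} codes; zeros map to +1.
    return np.where(h >= 0, 1.0, -1.0)

def joint_loss(features, reconstruction, codes, similarity, alpha=1.0):
    """Hypothetical joint objective: reconstruction error plus a
    neighborhood-preservation term on the binary codes.

    features, reconstruction: (n, d) original and decoded video features
    codes: (n, k) binary codes in {-1, +1}
    similarity: (n, n) assumed target neighborhood matrix in [-1, 1]
    alpha: assumed weight on the neighborhood term
    """
    k = codes.shape[1]
    # Mean squared reconstruction error over all feature entries.
    recon = np.mean((features - reconstruction) ** 2)
    # Normalized code inner products lie in [-1, 1]; push them toward the targets.
    code_sim = codes @ codes.T / k
    neighbor = np.mean((code_sim - similarity) ** 2)
    return recon + alpha * neighbor
```

When the decoder reproduces the features exactly and the code inner products match the target similarities, both terms vanish and the loss is zero. In practice the sign function has zero gradient almost everywhere, so end-to-end training needs a relaxation or gradient estimator. The sketch leaves that step out.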
Findings
Outperforms state-of-the-art methods on the FCVID and YFCC datasets.
Achieves the best performance in unsupervised video retrieval.
Reduces computational complexity compared to stacked architectures.
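Binary codes make retrieval cheap because database videos can be ranked by Hamming distance to a query code. The sketch below illustrates that ranking step only. It does not reproduce the paper's evaluation protocol. The helper `hamming_rank` and the toy codes are hypothetical, and the sketch stores codes as {0, 1} bits.

```python
import numpy as np

def hamming_rank(query, database):
    """Rank database codes by Hamming distance to a query code.

    query: (k,) array of {0, 1} bits; database: (n, k) array of {0, 1} bits.
    Returns (indices sorted by distance, distances).
    """
    # Count the bit positions where each database code differs from the query.
    distances = np.count_nonzero(database != query, axis=1)
    # A stable sort keeps the original order among equally distant items.
    order = np.argsort(distances, kind="stable")
    return order, distances

db = np.array([[1, 0, 1, 1],
               [0, 0, 0, 0],
               [1, 0, 1, 0]])
q = np.array([1, 0, 1, 1])
order, dist = hamming_rank(q, db)
print(order.tolist(), dist.tolist())  # [0, 2, 1] [0, 3, 1]
```

For codes in {-1, +1} of length k, the Hamming distance equals (k - b_i · b_j) / 2. Ranking by Hamming distance is therefore equivalent to ranking by code inner product.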
Abstract
Existing video hash functions are built on three isolated stages: frame pooling, relaxed learning, and binarization. These pipelines do not adequately exploit the temporal order of video frames within a joint binary optimization model, which leads to severe information loss. In this paper, we propose a novel unsupervised video hashing framework, Self-Supervised Video Hashing (SSVH), that captures the temporal nature of videos in an end-to-end learning-to-hash fashion. We specifically address two central problems: 1) how to design an encoder-decoder architecture that generates binary codes for videos; and 2) how to equip the binary codes with the ability to support accurate video retrieval. We design a hierarchical binary autoencoder to model the temporal dependencies in videos with multiple granularities, and embed the videos into binary codes with fewer computations than the stacked…
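The sketch below illustrates one way a two-level temporal encoder can use fewer recurrent steps than a stacked one. It makes several assumptions, none of which are the authors' architecture. The paper uses LSTM-based layers with a binary layer, while this sketch substitutes plain tanh RNNs. The upper layer is assumed to read only every `stride`-th state of the lower layer. The code is taken as the sign of the final upper-layer state. The names `rnn` and `hierarchical_encode` are hypothetical.

```python
import numpy as np

def rnn(inputs, W_x, W_h):
    """Plain tanh RNN; returns all hidden states with shape (steps, hidden)."""
    h = np.zeros(W_h.shape[0])
    states = []
    for x in inputs:
        h = np.tanh(W_x @ x + W_h @ h)
        states.append(h)
    return np.stack(states)

def hierarchical_encode(frames, params, stride):
    """Two-level encoder: the upper layer reads every `stride`-th lower state.

    Hypothetical simplification; assumes len(frames) >= stride.
    Returns a {-1, +1} code and the number of recurrent steps per layer.
    """
    W1x, W1h, W2x, W2h = params
    low = rnn(frames, W1x, W1h)                    # one step per frame (T steps)
    high = rnn(low[stride - 1::stride], W2x, W2h)  # roughly T / stride steps
    code = np.where(high[-1] >= 0, 1.0, -1.0)      # sign of the final upper state
    return code, len(low), len(high)

rng = np.random.default_rng(0)
d, h1, k, T, s = 8, 16, 4, 30, 5
params = (rng.standard_normal((h1, d)) * 0.1, rng.standard_normal((h1, h1)) * 0.1,
          rng.standard_normal((k, h1)) * 0.1, rng.standard_normal((k, k)) * 0.1)
code, n_low, n_high = hierarchical_encode(rng.standard_normal((T, d)), params, s)
print(n_low + n_high, 2 * T)  # 36 recurrent steps vs 60 for a stacked two-layer RNN
```

With 30 frames and a stride of 5, the upper layer runs 6 steps instead of 30. That is the kind of saving the "fewer computations than the stacked" comparison refers to, under the assumptions stated above.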
