3D-CSL: self-supervised 3D context similarity learning for Near-Duplicate Video Retrieval
Rui Deng, Qian Wu, Yuke Li

TL;DR
This paper presents 3D-CSL, a self-supervised learning framework using a 3D transformer for efficient and effective near-duplicate video retrieval by capturing global spatiotemporal dependencies and employing a two-stage training strategy.
Contribution
Introduces a novel self-supervised learning pipeline with a 3D transformer and a two-stage training strategy for improved near-duplicate video retrieval.
Findings
Achieves state-of-the-art performance on FIVR-200K and CC_WEB_VIDEO datasets.
Demonstrates the effectiveness of global spatiotemporal dependency modeling.
Validates the superiority of the proposed self-supervised approach.
Abstract
In this paper, we introduce 3D-CSL, a compact pipeline for Near-Duplicate Video Retrieval (NDVR), and explore a novel self-supervised learning strategy for video similarity learning. Most previous methods extract spatial features from each frame separately and then design various complex mechanisms to learn the temporal correlations among the frame features. However, part of the spatiotemporal dependencies has already been lost by that point. To address this, 3D-CSL extracts global spatiotemporal dependencies in videos end-to-end with a 3D transformer and finds a good balance between efficiency and effectiveness by matching at the clip level. Furthermore, we propose a two-stage self-supervised similarity learning strategy to optimize the entire network. First, we propose PredMAE to pretrain the 3D transformer with a video prediction task; second, ShotMix, a novel video-specific augmentation, and…
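To make clip-level matching concrete, here is a minimal sketch of how two videos could be compared once each has been reduced to a short list of clip embeddings. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say how clip similarities are aggregated, so the mean-of-best-matches rule below is a stand-in, and the names `cosine`, `video_similarity`, and `rank` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat an all-zero embedding as matching nothing
    return dot / (norm_u * norm_v)


def video_similarity(query_clips, ref_clips):
    """Score two videos from their clip embeddings.

    Assumption (not from the paper): each query clip is matched to its
    most similar reference clip, and the best scores are averaged.
    """
    if not query_clips or not ref_clips:
        return 0.0
    best = [max(cosine(q, r) for r in ref_clips) for q in query_clips]
    return sum(best) / len(best)


def rank(query_clips, database):
    """Return (video_id, score) pairs sorted from most to least similar."""
    scored = [
        (video_id, video_similarity(query_clips, clips))
        for video_id, clips in database.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy usage with hand-written 2-D "embeddings"; a real system would take
# these vectors from the video encoder instead.
query = [[1.0, 0.0], [0.0, 1.0]]
database = {
    "dup": [[1.0, 0.0], [0.0, 1.0]],
    "other": [[-1.0, 0.0], [0.0, -1.0]],
}
ranking = rank(query, database)
```

Because a video is represented by a handful of clip vectors rather than one vector per frame, the number of pairwise comparisons per video pair stays small, which is one way to read the efficiency claim in the abstract.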
Taxonomy
Topics: Multimodal Machine Learning Applications · Cancer-related molecular mechanisms research · Advanced Image and Video Retrieval Techniques
