3D-CSL: self-supervised 3D context similarity learning for Near-Duplicate Video Retrieval

Rui Deng; Qian Wu; Yuke Li

arXiv:2211.05352 · cs.CV · November 11, 2022 · 1 citation

TL;DR

This paper presents 3D-CSL, a self-supervised learning framework using a 3D transformer for efficient and effective near-duplicate video retrieval by capturing global spatiotemporal dependencies and employing a two-stage training strategy.

Contribution

Introduces a novel self-supervised learning pipeline with a 3D transformer and a two-stage training strategy for improved near-duplicate video retrieval.

Findings

1. Achieves state-of-the-art performance on FIVR-200K and CC_WEB_VIDEO datasets.
2. Demonstrates the effectiveness of global spatiotemporal dependency modeling.
3. Validates the superiority of the proposed self-supervised approach.

Abstract

In this paper, we introduce 3D-CSL, a compact pipeline for Near-Duplicate Video Retrieval (NDVR), and explore a novel self-supervised learning strategy for video similarity learning. Most previous methods extract spatial features from each frame separately and then design various complex mechanisms to learn the temporal correlations among the frame features. By that point, however, some spatiotemporal dependencies have already been lost. To address this, 3D-CSL extracts global spatiotemporal dependencies in videos end-to-end with a 3D transformer and finds a good balance between efficiency and effectiveness by matching at the clip level. Furthermore, we propose a two-stage self-supervised similarity learning strategy to optimize the entire network. First, we propose PredMAE to pretrain the 3D transformer with a video prediction task; second, ShotMix, a novel video-specific augmentation, and…
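
The abstract says retrieval works by matching at the clip level, but it is cut off before giving details. The sketch below only illustrates the general idea under stated assumptions: each video is already represented by a list of clip embeddings, and two videos are compared by taking, for every query clip, its best cosine match among the reference clips and averaging those scores. The names `cosine` and `video_similarity` and the max-then-mean aggregation are hypothetical choices for illustration, not the scoring rule confirmed by the paper or its repository.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def video_similarity(query_clips: Sequence[Vector],
                     ref_clips: Sequence[Vector]) -> float:
    """Assumed aggregation: best match per query clip, averaged over query clips."""
    if not query_clips or not ref_clips:
        raise ValueError("both videos need at least one clip embedding")
    best_per_clip = [max(cosine(q, r) for r in ref_clips) for q in query_clips]
    return sum(best_per_clip) / len(best_per_clip)


if __name__ == "__main__":
    # Toy 2-D embeddings standing in for the 3D transformer's clip features.
    query = [[1.0, 0.0], [0.0, 1.0]]
    near_duplicate = [[0.0, 2.0], [3.0, 0.0]]  # same directions, reordered and rescaled
    print(video_similarity(query, near_duplicate))
```

Comparing a handful of clip embeddings per video rather than every frame is what would keep such matching cheap, which is one plausible reading of the efficiency claim in the abstract.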

Code & Models

Repositories

dun-research/3D-CSL (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Cancer-related molecular mechanisms research · Advanced Image and Video Retrieval Techniques