Semantic Video Moments Retrieval at Scale: A New Task and a Baseline
Na Li

TL;DR
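This paper introduces a new task called Semantic Video Moments Retrieval at scale (SVMR), which involves retrieving relevant videos and localizing the relevant clips within them. The task is harder than chaining video retrieval with video re-localization: a relevant long video may contain many irrelevant but semantically meaningful clips, and the same video may be relevant to several unrelated queries.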
Contribution
The paper proposes a novel two-stage baseline solution with an attention-based alignment framework and creates benchmark datasets for SVMR evaluation.
Findings
The proposed method outperforms existing solutions on the new SVMR benchmarks.
The attention-based alignment improves clip localization accuracy.
The datasets facilitate comprehensive evaluation of SVMR models.
Abstract
Motivated by the increasing need to save search effort by obtaining relevant video clips instead of whole videos, we propose a new task, named Semantic Video Moments Retrieval at scale (SVMR), which aims at finding relevant videos coupled with re-localizing the video clips in them. Rather than being a simple combination of video retrieval and video re-localization, our task is more challenging in several essential aspects. In the first stage, SVMR must take into account that: 1) a positive candidate long video can contain plenty of irrelevant clips which are also semantically meaningful; 2) a long video can be positive for two totally different query clips if it contains clips relevant to both queries. The second, re-localization stage also makes different assumptions from existing video re-localization tasks, which hold an assumption that the reference video must contain…
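To make the two-stage structure concrete, here is a minimal sketch in Python. It is not the paper's baseline: the attention-based alignment framework is not described in this excerpt, so the sketch swaps in plain cosine similarity over toy clip embeddings. The function names, the max-over-clips video score, the similarity threshold, and the choice to return None when no clip matches are all assumptions made for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def video_score(query, clips):
    """Stage 1 score (assumed): best-matching clip decides relevance, so
    irrelevant clips elsewhere in a long video do not pull the score down."""
    return max(cosine(query, clip) for clip in clips)

def retrieve(query, videos, top_k=2):
    """Rank video ids by video_score and keep the top_k."""
    ranked = sorted(videos, key=lambda vid: video_score(query, videos[vid]), reverse=True)
    return ranked[:top_k]

def localize(query, clips, threshold=0.8):
    """Stage 2 (assumed): grow a contiguous window around the best clip while
    neighbours stay above the threshold. Returns (start, end) clip indices,
    or None when no clip clears the threshold (a choice made for this sketch)."""
    sims = [cosine(query, clip) for clip in clips]
    best = max(range(len(sims)), key=sims.__getitem__)
    if sims[best] < threshold:
        return None
    start = end = best
    while start > 0 and sims[start - 1] >= threshold:
        start -= 1
    while end + 1 < len(sims) and sims[end + 1] >= threshold:
        end += 1
    return (start, end)

# Toy 3-d clip embeddings (hypothetical data, one list of clips per video).
videos = {
    "cooking": [[1, 0, 0], [0.9, 0.1, 0], [0, 1, 0]],
    "sports": [[0, 0, 1], [0, 0.1, 0.9]],
    "mixed": [[0, 1, 0], [0, 0, 1], [1, 0, 0]],
}
query_a = [1, 0, 0]
query_b = [0, 0, 1]

hits_a = retrieve(query_a, videos)
hits_b = retrieve(query_b, videos)
span_a = localize(query_a, videos["cooking"])
span_none = localize(query_b, videos["cooking"])
```

In this toy setup the "mixed" video is retrieved for both queries even though the queries are unrelated, which mirrors point 2) of the abstract, and scoring by the best clip keeps the irrelevant clips in a long video from hiding a relevant one, which mirrors point 1).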
Taxonomy
Topics: Multimodal Machine Learning Applications · Video Analysis and Summarization · Advanced Image and Video Retrieval Techniques
