TVR-Ranking: A Dataset for Ranked Video Moment Retrieval with Imprecise Queries
Renjie Liang, Li Li, Chongzhi Zhang, Jing Wang, Xizhou Zhu, Aixin Sun

TL;DR
This paper introduces Ranked Video Moment Retrieval (RVMR), a new task with a dedicated dataset, TVR-Ranking, to improve the retrieval of relevant video segments based on natural language queries, addressing practical search challenges.
Contribution
The paper presents the TVR-Ranking dataset with relevance annotations for RVMR, along with a new evaluation metric and baseline experiments, advancing research in multi-modal video search.
Findings
The three baseline models evaluated struggle with the new challenges that RVMR poses.
The dataset reveals the complexity of imprecise natural language queries.
A new evaluation metric, developed for this task, quantifies ranked retrieval performance.
Abstract
In this paper, we propose the task of Ranked Video Moment Retrieval (RVMR) to locate a ranked list of matching moments from a collection of videos, through queries in natural language. Although a few related tasks have been proposed and studied by CV, NLP, and IR communities, RVMR is the task that best reflects the practical setting of moment search. To facilitate research in RVMR, we develop the TVR-Ranking dataset, based on the raw videos and existing moment annotations provided in the TVR dataset. Our key contribution is the manual annotation of relevance levels for 94,442 query-moment pairs. We then develop the evaluation metric for this new task and conduct experiments to evaluate three baseline models. Our experiments show that the new RVMR task brings new challenges to existing models and we believe this new dataset contributes to the research on…
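The abstract does not name the evaluation metric, so the following is only an illustrative sketch of how a ranked list of moments with graded relevance could be scored. It assumes NDCG@K combined with a temporal IoU threshold, a common choice for graded ranking that is not confirmed by this page. The function names, the tuple layouts, and the default values of `k` and `iou_threshold` are all hypothetical.

```python
import math


def temporal_iou(a, b):
    """Intersection over union of two (start, end) intervals in seconds."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def dcg(gains):
    """Discounted cumulative gain of a list of gains in rank order."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains))


def ndcg_at_k(predictions, ground_truth, k=10, iou_threshold=0.5):
    """Assumed metric: NDCG@K over moments matched by temporal IoU.

    predictions:  ranked list of (video_id, start, end)
    ground_truth: list of (video_id, start, end, relevance) for one query
    """
    matched = set()  # each annotated moment may be credited only once
    gains = []
    for vid, start, end in predictions[:k]:
        best_gain, best_idx = 0, None
        for idx, (g_vid, g_start, g_end, rel) in enumerate(ground_truth):
            if idx in matched or g_vid != vid:
                continue
            overlap = temporal_iou((start, end), (g_start, g_end))
            if overlap >= iou_threshold and rel > best_gain:
                best_gain, best_idx = rel, idx
        if best_idx is not None:
            matched.add(best_idx)
        gains.append(best_gain)
    # Ideal ranking: annotated moments sorted by relevance, highest first.
    ideal = sorted((g[3] for g in ground_truth), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


if __name__ == "__main__":
    truth = [("v1", 0.0, 10.0, 3), ("v2", 5.0, 15.0, 1)]
    ranked = [("v2", 5.0, 15.0), ("v1", 0.0, 10.0)]
    print(ndcg_at_k(ranked, truth, k=2))
```

Under these assumptions, a list that places the most relevant moment first scores 1.0, while the same moments in reverse order score lower, which is the behaviour a ranked-retrieval metric needs in order to separate RVMR from single-moment retrieval.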
Taxonomy
Topics: Multimodal Machine Learning Applications · Video Analysis and Summarization · Advanced Image and Video Retrieval Techniques
