Multi-Query Video Retrieval
Zeyu Wang, Yu Wu, Karthik Narasimhan, Olga Russakovsky

TL;DR
This paper explores multi-query video retrieval, showing that using multiple descriptions per video makes model evaluation more reliable, reduces the impact of dataset noise, and improves retrieval performance and generalization.
Contribution
The paper focuses on the less-studied multi-query retrieval setting and shows that it mitigates dataset noise and improves both model training and evaluation.
Findings
Multi-query retrieval correlates better with human judgment.
Multi-query training improves model performance.
Multi-query retrieval mitigates dataset annotation noise.
Abstract
Retrieving target videos based on text descriptions is a task of great practical value and has received increasing attention over the past few years. Despite recent progress, imperfect annotations in existing video retrieval datasets have posed significant challenges for model evaluation and development. In this paper, we tackle this issue by focusing on the less-studied setting of multi-query video retrieval, where multiple descriptions are provided to the model for searching over the video archive. We first show that the multi-query retrieval task effectively mitigates the dataset noise introduced by imperfect annotations and correlates better with human judgement when evaluating the retrieval abilities of current models. We then investigate several methods that leverage multiple queries at training time, and demonstrate that multi-query inspired training can lead to superior performance…
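The multi-query setting described in the abstract can be illustrated with a minimal sketch. It assumes that queries and videos are already embedded in a shared vector space and that the per-query cosine similarities are simply averaged before ranking. The function names and toy vectors are hypothetical, and mean aggregation is only one simple choice made for illustration, not necessarily the method used in the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def multi_query_rank(query_embs, video_embs):
    """Rank video indices by the mean similarity over all queries (best first).

    Assumption: mean aggregation of per-query scores; other aggregation
    schemes are possible and are not covered here.
    """
    scored = []
    for vid, video in enumerate(video_embs):
        mean_sim = sum(cosine(q, video) for q in query_embs) / len(query_embs)
        scored.append((mean_sim, vid))
    # Sort by descending score; break ties by video index for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [vid for _, vid in scored]


# Toy data (hypothetical): video 0 is the target, video 1 is a distractor.
videos = [[1.0, 0.0], [0.0, 1.0]]
# Two accurate descriptions of video 0 and one noisy description.
queries = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0]]

ranking_all = multi_query_rank(queries, videos)        # all three queries
ranking_noisy = multi_query_rank(queries[2:], videos)  # noisy query alone
print(ranking_all, ranking_noisy)
```

In this toy case the noisy description on its own ranks the distractor first, while averaging over all three descriptions still places the target video at the top, which is the intuition behind using multiple queries to dampen annotation noise.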
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning
