Zero-shot Video Moment Retrieval With Off-the-Shelf Models

Anuj Diwan; Puyuan Peng; Raymond J. Mooney

arXiv:2211.02178 · cs.CV · November 7, 2022 · 1 citation

TL;DR

This paper introduces a zero-shot method for Video Moment Retrieval (VMR) that repurposes off-the-shelf models without any additional training. On the QVHighlights benchmark it outperforms previous zero-shot approaches by at least 2.5x on all metrics and closes over 74% of the gap to state-of-the-art supervised models.

Contribution

The paper presents a simple, three-step zero-shot approach for VMR using only off-the-shelf models, eliminating the need for finetuning or annotated data.
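The three steps (named in the abstract as moment proposal, moment-query matching and postprocessing) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: frames and the query are taken to be already embedded in a shared vector space by some off-the-shelf model, and the cut heuristic, mean-similarity scoring, merge rule, thresholds, and all function names are hypothetical rather than the paper's actual components.

```python
import math
from typing import List, Tuple

Vector = List[float]
Moment = Tuple[int, int]  # half-open frame range [start, end)


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def propose_moments(frames: List[Vector], cut_threshold: float) -> List[Moment]:
    """Step 1 (assumed heuristic): cut wherever adjacent frames are dissimilar."""
    if not frames:
        return []
    moments, start = [], 0
    for i in range(1, len(frames)):
        if cosine(frames[i - 1], frames[i]) < cut_threshold:
            moments.append((start, i))
            start = i
    moments.append((start, len(frames)))
    return moments


def score_moments(frames: List[Vector], moments: List[Moment], query: Vector) -> List[float]:
    """Step 2 (assumed scoring): mean frame-query similarity within each moment."""
    return [
        sum(cosine(frames[i], query) for i in range(s, e)) / (e - s)
        for s, e in moments
    ]


def postprocess(moments: List[Moment], scores: List[float], merge_threshold: float):
    """Step 3 (assumed rule): merge adjacent high-scoring moments, then rank."""
    merged: List[Tuple[Moment, float]] = []
    for (s, e), sc in zip(moments, scores):
        if merged and merged[-1][0][1] == s and merged[-1][1] >= merge_threshold and sc >= merge_threshold:
            (ps, pe), psc = merged[-1]
            # Length-weighted average keeps the merged score a per-frame mean.
            merged[-1] = ((ps, e), (psc * (pe - ps) + sc * (e - s)) / (e - ps))
        else:
            merged.append(((s, e), sc))
    return sorted(merged, key=lambda m: m[1], reverse=True)


def retrieve(frames: List[Vector], query: Vector,
             cut_threshold: float = 0.5, merge_threshold: float = 0.5):
    """Run the three steps in order; no model weights are updated anywhere."""
    moments = propose_moments(frames, cut_threshold)
    scores = score_moments(frames, moments, query)
    return postprocess(moments, scores, merge_threshold)


# Toy 2-D "embeddings": frames 2 and 3 match the query direction.
toy_frames = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
toy_query = [0.0, 1.0]
ranked = retrieve(toy_frames, toy_query)
```

In this toy run the top-ranked moment is the frame range (2, 4). In a real setting the vectors would come from a pretrained vision-language encoder, and only inference would be needed, which is the point of the zero-shot framing.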

Findings

01. Outperforms previous zero-shot methods by at least 2.5x on all metrics

02. Reduces the gap between zero-shot and supervised models by over 74%

03. Outperforms non-pretrained supervised models on recall metrics and performs well on shorter moments
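To make the first two findings concrete, here is one plausible way such figures could be computed. This is a sketch under assumptions: the page does not state the exact formula, so the definition of gap reduction below (the share of the old zero-shot-to-supervised gap that the new method closes) is an assumed reading, and the scores in the example are hypothetical placeholders, not results from the paper.

```python
def improvement_factor(prev_zero_shot: float, new_zero_shot: float) -> float:
    """Ratio of the new zero-shot score to the previous one (e.g. 2.5 means 2.5x)."""
    if prev_zero_shot <= 0:
        raise ValueError("previous score must be positive")
    return new_zero_shot / prev_zero_shot


def gap_reduction(prev_zero_shot: float, new_zero_shot: float, supervised: float) -> float:
    """Fraction of the old gap to the supervised score that the new method closes.

    Assumed definition: (new - prev) / (supervised - prev).
    """
    old_gap = supervised - prev_zero_shot
    if old_gap <= 0:
        raise ValueError("supervised score must exceed the previous zero-shot score")
    return (new_zero_shot - prev_zero_shot) / old_gap


# Hypothetical scores on a single metric, chosen only to illustrate the arithmetic.
prev_zs, new_zs, sup = 10.0, 40.0, 50.0
print(improvement_factor(prev_zs, new_zs))  # 4.0
print(gap_reduction(prev_zs, new_zs, sup))  # 0.75
```

Under this reading, a claim of "over 74%" means the remaining gap to the supervised model is less than about a quarter of what it was for the earlier zero-shot approach.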

Abstract

For the majority of the machine learning community, the expensive nature of collecting high-quality human-annotated data and the inability to efficiently finetune very large state-of-the-art pretrained models on limited compute are major bottlenecks for building models for new tasks. We propose a simple zero-shot approach for one such task, Video Moment Retrieval (VMR), that does not perform any additional finetuning and simply repurposes off-the-shelf models trained on other tasks. Our three-step approach consists of moment proposal, moment-query matching and postprocessing, all using only off-the-shelf models. On the QVHighlights benchmark for VMR, we vastly improve on the performance of previous zero-shot approaches, by at least 2.5x on all metrics, and reduce the gap between zero-shot and state-of-the-art supervised models by over 74%. Further, we also show that our zero-shot approach beats…

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques