Video Moment Retrieval from Text Queries via Single Frame Annotation

Ran Cui; Tianwen Qian; Pai Peng; Elena Daskalaki; Jingjing Chen; Xiaowei Guo; Huyang Sun; Yu-Gang Jiang

arXiv:2204.09409·cs.CV·June 22, 2022

TL;DR

This paper introduces a new annotation paradigm called 'glance annotation' for video moment retrieval, requiring only a single frame timestamp, which improves performance over weak supervision and approaches fully supervised results.

Contribution

The paper proposes the 'glance annotation' paradigm and ViGA, a contrastive learning method that uses this minimal annotation to improve video moment retrieval performance.

Findings

01

ViGA significantly outperforms existing weakly supervised methods.

02

Glance annotation reduces annotation cost while maintaining high retrieval accuracy.

03

ViGA achieves results comparable to fully supervised methods in some cases.

Abstract

Video moment retrieval aims at finding the start and end timestamps of a moment (part of a video) described by a given natural language query. Fully supervised methods need complete temporal boundary annotations to achieve promising results, which is costly since the annotator needs to watch the whole moment. Weakly supervised methods rely only on the paired video and query, but their performance is relatively poor. In this paper, we look closer into the annotation process and propose a new paradigm called "glance annotation". This paradigm requires the timestamp of only a single random frame, which we refer to as a "glance", within the temporal boundary of the fully supervised counterpart. We argue this is beneficial because, compared to weak supervision, it adds trivial cost yet offers more potential in performance. Under the glance annotation setting, we propose a method named…
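To make the annotation format concrete, the following is a minimal sketch of how a glance label could be simulated from an existing fully supervised annotation. It is not taken from the paper or the repository: the function name `sample_glance`, the dictionary keys, and the example values are hypothetical, and uniform sampling is an assumption, since the abstract only states that the frame is "random" within the temporal boundary.

```python
import random


def sample_glance(start, end, rng=None):
    """Draw one timestamp from within [start, end].

    Assumption: uniform sampling. The abstract only says the glance is a
    random frame inside the fully supervised temporal boundary.
    """
    if end < start:
        raise ValueError("end must not precede start")
    rng = rng or random
    return rng.uniform(start, end)


# Hypothetical fully supervised annotation (values are illustrative only).
full = {"query": "a person opens the door", "start": 12.4, "end": 18.9}

# The glance annotation keeps the query but replaces the two boundaries
# with a single timestamp that lies between them.
rng = random.Random(0)  # seeded so the sketch is reproducible
glance = {
    "query": full["query"],
    "glance": sample_glance(full["start"], full["end"], rng),
}
```

The resulting record carries one timestamp instead of a start and end pair, which is the reduction in annotation effort the abstract describes.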

Code & Models

Repositories

r-cui/ViGA
PyTorch · Official
