Frame-wise Cross-modal Matching for Video Moment Retrieval

Haoyu Tang, Jihua Zhu, Meng Liu, Zan Gao, and Zhiyong Cheng

arXiv:2009.10434 · cs.CV · April 6, 2023

TL;DR

This paper introduces an attentive cross-modal relevance matching (ACRM) model for video moment retrieval. It improves localization accuracy by modeling the interactions between the query and the video frames and by emphasizing the important query words.

Contribution

The paper proposes a novel ACRM model that enhances cross-modal interaction modeling and incorporates an internal frame predictor for better localization accuracy.

Findings

01. Outperforms state-of-the-art methods on the TACoS and Charades-STA datasets.

02. The attention module effectively emphasizes semantically rich query words.

03. An additional internal frame predictor improves localization precision.

Abstract

Video moment retrieval aims to retrieve a moment in a video for a given language query. The challenges of this task include 1) localizing the relevant moment in an untrimmed video, and 2) bridging the semantic gap between the textual query and the video content. To tackle these problems, early approaches adopt sliding windows or uniform sampling to collect video clips first and then match each clip with the query. These strategies are time-consuming and often yield unsatisfactory localization accuracy because the length of the ground-truth moment is unpredictable. To avoid these limitations, researchers have recently attempted to predict the relevant moment boundaries directly, without first generating video clips. One mainstream approach is to generate a multimodal feature vector for the target query and video frames (e.g., by concatenation) and then use a…
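The sliding-window strategy that the abstract attributes to early approaches can be illustrated with a small sketch. This is not the paper's method or code: the function names, the window lengths, the 50% stride, and the use of temporal IoU to score a candidate against the ground-truth moment are all assumptions chosen for illustration. It shows why fixed window lengths struggle when the target moment has an arbitrary length.

```python
def sliding_windows(duration, window_lengths, stride_ratio=0.5):
    """Enumerate (start, end) candidate clips, in seconds, over a video.

    Hypothetical helper: window lengths and stride are illustrative choices,
    not values taken from the paper.
    """
    clips = []
    for length in window_lengths:
        stride = length * stride_ratio
        start = 0.0
        while start + length <= duration:
            clips.append((start, start + length))
            start += stride
    return clips


def temporal_iou(a, b):
    """Intersection over union of two (start, end) intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


# A 30-second video with candidate windows of 4 s and 8 s (assumed values).
clips = sliding_windows(30.0, window_lengths=[4.0, 8.0])

# Assumed ground-truth moment of 13 s, longer than any candidate window.
target = (5.0, 18.0)

best = max(clips, key=lambda c: temporal_iou(c, target))
best_iou = temporal_iou(best, target)
print(len(clips), best, round(best_iou, 3))
```

In this toy setting every candidate must be scored against the query, and even the best candidate overlaps the 13-second target only partially because no window has a matching length. Adding more window lengths narrows the gap but increases the number of clips to match, which is the cost and accuracy trade-off the abstract points to.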

Code & Models

Repositories

tanghaoyu258/ACRM-for-moment-retrieval
PyTorch · Official