Hybrid-Learning Video Moment Retrieval across Multi-Domain Labels

Weitong Cai; Jiabo Huang; Shaogang Gong

arXiv:2406.01791·cs.CV·June 5, 2024

Open Access

TL;DR

This paper proposes hybrid-learning video moment retrieval: a multi-branch model transfers video-text matching knowledge from a fully-supervised source domain to a weakly-labelled target domain, improving retrieval accuracy without requiring temporal boundary annotations in the target domain.

Contribution

It introduces EVA, a multi-branch video-text alignment model that enables knowledge transfer across domains with different label types for improved video moment retrieval.

Findings

01 EVA effectively leverages source domain annotations to enhance target domain retrieval.

02 The model achieves improved performance in weakly-labelled settings.

03 Cross-modal feature alignment enhances domain-invariant representations.

Abstract

Video moment retrieval (VMR) searches for a visual temporal moment in an untrimmed raw video given a text query description (sentence). Existing studies either start from collecting exhaustive frame-wise annotations on the temporal boundary of target moments (fully-supervised), or learn with only the video-level video-text pairing labels (weakly-supervised). The former generalises poorly to unknown concepts and/or novel scenes because expensive annotation costs restrict dataset scale and diversity; the latter is subject to visual-textual mis-correlations from incomplete labels. In this work, we introduce a new approach called hybrid-learning video moment retrieval to solve the problem by knowledge transfer through adapting the video-text matching relationships learned from a fully-supervised source domain to a weakly-labelled target domain when they do not share a…
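
To make the task setup in the abstract concrete, here is a minimal Python sketch of the two label types and of a naive retrieval baseline. It is an illustration only, not the EVA model: the names `VMRSample`, `retrieve_moment`, and `temporal_iou`, the mean-pooled cosine scoring, and the exhaustive window search are all assumptions introduced for this example. Clip and query features are assumed to be precomputed vectors in a shared embedding space.

```python
from dataclasses import dataclass
from math import sqrt
from typing import List, Optional, Tuple


@dataclass
class VMRSample:
    """One video-query pair (hypothetical structure, not from the paper)."""
    video_id: str
    query: str
    # Fully-supervised label: (start_sec, end_sec) of the target moment.
    # Weakly-supervised label: None, only the video-query pairing is known.
    moment: Optional[Tuple[float, float]] = None


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_moment(
    clip_feats: List[List[float]], query_feat: List[float], max_len: int
) -> Tuple[Tuple[int, int], float]:
    """Score every window of up to max_len clips against the query.

    Each window is mean-pooled and compared with cosine similarity.
    Returns the best (start, end) clip indices (end exclusive) and its score.
    """
    best, best_score = (0, 1), float("-inf")
    n = len(clip_feats)
    for start in range(n):
        for end in range(start + 1, min(n, start + max_len) + 1):
            window = clip_feats[start:end]
            pooled = [sum(col) / len(window) for col in zip(*window)]
            score = cosine(pooled, query_feat)
            if score > best_score:
                best, best_score = (start, end), score
    return best, best_score


def temporal_iou(pred: Tuple[float, float], gt: Tuple[float, float]) -> float:
    """Temporal intersection-over-union between two (start, end) spans."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = max(pred[1], gt[1]) - min(pred[0], gt[0])
    return inter / union if union > 0 else 0.0
```

In this sketch a fully-supervised sample carries a `moment` span that a prediction can be checked against with `temporal_iou`, whereas a weakly-supervised sample leaves `moment` as `None`, so training can rely only on whether the video and query are paired.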

Taxonomy

Topics: Video Analysis and Summarization · Advanced Image and Video Retrieval Techniques · Advanced Vision and Imaging