GMMFormer v2: An Uncertainty-aware Framework for Partially Relevant Video Retrieval

Yuting Wang; Jinpeng Wang; Bin Chen; Tao Dai; Ruisheng Luo; Shu-Tao Xia

arXiv:2405.13824·cs.CV·May 24, 2024

TL;DR

GMMFormer v2 introduces an uncertainty-aware framework for partially relevant video retrieval, enhancing clip modeling and text-clip matching to improve accuracy and efficiency in retrieving relevant video segments without moment annotations.

Contribution

It proposes a novel temporal consolidation module and new matching losses to address uncertainty and semantic collapse in PRVR, outperforming previous methods.

Findings

01. Significant performance improvements on three PRVR benchmarks.

02. Effective uncertainty modeling enhances clip perception and matching accuracy.

03. Versatile text-clip matching reduces semantic collapse.

Abstract

Given a text query, partially relevant video retrieval (PRVR) aims to retrieve untrimmed videos containing relevant moments. Because moment annotations are unavailable, the uncertainty in clip modeling and text-clip correspondence poses major challenges. Despite considerable progress, existing solutions sacrifice either efficiency or efficacy to capture varying and uncertain video moments. Worse, few methods have paid attention to the text-clip matching pattern under such uncertainty, which exposes them to the risk of semantic collapse. To address these issues, we present GMMFormer v2, an uncertainty-aware framework for PRVR. For clip modeling, we improve a strong baseline, GMMFormer, with a novel temporal consolidation module built upon multi-scale contextual features, which maintains efficiency and improves the perception of varying moments. To achieve uncertainty-aware text-clip matching, we…
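To make the task definition concrete, the following is a minimal sketch of how an untrimmed video might be scored against a text query when no moment annotations are available. It is not the GMMFormer v2 method: it assumes the query and each clip are already embedded as vectors, and it uses the common simplification of scoring a video by its single best-matching clip. The names `cosine`, `video_score`, `rank_videos`, and the toy embeddings are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def video_score(query_vec, clip_vecs):
    """Score a video by its best-matching clip (max over clip similarities).

    Assumption: a video counts as partially relevant if at least one of
    its clips matches the query, so the maximum is used as the score.
    """
    return max(cosine(query_vec, clip) for clip in clip_vecs)


def rank_videos(query_vec, videos):
    """Return video ids sorted from most to least relevant."""
    scores = {vid: video_score(query_vec, clips) for vid, clips in videos.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy embeddings (made up for illustration only).
query = [1.0, 0.0]
videos = {
    "video_a": [[0.0, 1.0], [0.9, 0.1]],   # second clip is close to the query
    "video_b": [[0.0, 1.0], [-1.0, 0.0]],  # no clip is close to the query
}
ranking = rank_videos(query, videos)
print(ranking)
```

In this toy setup only one clip of `video_a` resembles the query, yet the whole video is ranked first, which is the sense in which a video can be "partially relevant". Which clip is the relevant one is never labeled, and that missing label is the source of the uncertainty the abstract refers to.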

Code & Models

Repositories

huangmozhi9527/gmmformer_v2 (PyTorch, official)

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques · Multimodal Machine Learning Applications

Methods: Contrastive Language-Image Pre-training