Dual-task Mutual Reinforcing Embedded Joint Video Paragraph Retrieval and Grounding

Mengzhao Wang; Huafeng Li; Yafei Zhang; Jinxing Li; Minghong Xie; Dapeng Tao

arXiv:2411.17481·cs.CV·November 27, 2024

TL;DR

This paper introduces DMR-JRG, a dual-task framework that jointly improves video paragraph retrieval and grounding by mutual reinforcement, reducing reliance on large-scale temporal annotations.

Contribution

It proposes a novel dual-branch method that combines coarse-grained retrieval with fine-grained grounding, leveraging inter-video contrastive learning and multi-dimensional feature consistency.

Findings

1. Reduces dependence on annotated temporal labels.
2. Achieves more accurate cross-modal matching.
3. Enhances retrieval and grounding performance.

Abstract

Video Paragraph Grounding (VPG) aims to precisely locate the most appropriate moments within a video that are relevant to a given textual paragraph query. However, existing methods typically rely on large-scale annotated temporal labels and assume that the correspondence between videos and paragraphs is known. This is impractical in real-world applications, as constructing temporal labels requires significant labor costs, and the correspondence is often unknown. To address this issue, we propose a Dual-task Mutual Reinforcing Embedded Joint Video Paragraph Retrieval and Grounding method (DMR-JRG). In this method, retrieval and grounding tasks are mutually reinforced rather than being treated as separate issues. DMR-JRG mainly consists of two branches: a retrieval branch and a grounding branch. The retrieval branch uses inter-video contrastive learning to roughly align the global…
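To make the idea of inter-video contrastive learning in the retrieval branch more concrete, here is a minimal sketch of a symmetric InfoNCE-style loss over a batch of global video and paragraph embeddings. It is not the authors' implementation: the abstract only states that contrastive learning is used for coarse alignment, so the loss form, the function names (`symmetric_info_nce` and its helpers), and the temperature value of 0.07 are all illustrative assumptions. Matched video and paragraph pairs are assumed to share the same batch index.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def cosine_similarity_matrix(videos, paragraphs):
    """Entry [i][j] is the cosine similarity of video i and paragraph j."""
    v = [l2_normalize(x) for x in videos]
    p = [l2_normalize(x) for x in paragraphs]
    return [[sum(a * b for a, b in zip(vi, pj)) for pj in p] for vi in v]


def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def symmetric_info_nce(videos, paragraphs, temperature=0.07):
    """Hypothetical retrieval loss: average of video-to-paragraph and
    paragraph-to-video InfoNCE terms. The temperature is an assumed value."""
    sim = cosine_similarity_matrix(videos, paragraphs)
    logits = [[s / temperature for s in row] for row in sim]
    logits_t = [list(col) for col in zip(*logits)]  # transpose for the reverse direction
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(logits_t))


# Toy batch: two videos and their matching paragraphs (index-aligned).
videos = [[1.0, 0.0], [0.0, 1.0]]
matched = [[1.0, 0.0], [0.0, 1.0]]
swapped = [[0.0, 1.0], [1.0, 0.0]]

loss_matched = symmetric_info_nce(videos, matched)
loss_swapped = symmetric_info_nce(videos, swapped)
print(loss_matched < loss_swapped)
```

In this toy batch the loss is lower when each paragraph is paired with its own video than when the pairs are swapped, which is the property that coarse retrieval relies on. Other videos in the batch act as negatives, so no temporal labels are needed for this term.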

Code & Models

Repositories

X7J92/DMR-JRG (PyTorch, official)

Taxonomy

Topics: Video Analysis and Summarization · Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications

Methods: Contrastive Learning · ALIGN