Dual-task Mutual Reinforcing Embedded Joint Video Paragraph Retrieval and Grounding
Mengzhao Wang, Huafeng Li, Yafei Zhang, Jinxing Li, Minghong Xie, Dapeng Tao

TL;DR
This paper introduces DMR-JRG, a dual-task framework that jointly improves video paragraph retrieval and grounding by mutual reinforcement, reducing reliance on large-scale temporal annotations.
Contribution
It proposes a novel dual-branch method that combines coarse-grained retrieval with fine-grained grounding, leveraging inter-video contrastive learning and multi-dimensional feature consistency.
Findings
Reduces dependence on annotated temporal labels.
Achieves more accurate cross-modal matching.
Enhances retrieval and grounding performance.
Abstract
Video Paragraph Grounding (VPG) aims to precisely locate the most appropriate moments within a video that are relevant to a given textual paragraph query. However, existing methods typically rely on large-scale annotated temporal labels and assume that the correspondence between videos and paragraphs is known. This is impractical in real-world applications, as constructing temporal labels requires significant labor costs, and the correspondence is often unknown. To address this issue, we propose a Dual-task Mutual Reinforcing Embedded Joint Video Paragraph Retrieval and Grounding method (DMR-JRG). In this method, retrieval and grounding tasks are mutually reinforced rather than being treated as separate issues. DMR-JRG mainly consists of two branches: a retrieval branch and a grounding branch. The retrieval branch uses inter-video contrastive learning to roughly align the global…
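The abstract says the retrieval branch uses inter-video contrastive learning to roughly align global video and paragraph features. The excerpt does not give the loss, so the following is only a minimal sketch of one common choice, a symmetric InfoNCE objective over a batch of matched video/paragraph pairs. The function name `symmetric_info_nce`, the temperature value, and the use of NumPy are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def log_softmax(x, axis):
    # Numerically stable log-softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))


def symmetric_info_nce(video_emb, para_emb, temperature=0.07):
    """Generic symmetric InfoNCE loss (illustrative, not the paper's exact loss).

    video_emb, para_emb: arrays of shape (batch, dim); row i of each is
    assumed to be a matched video/paragraph pair, and every other row in
    the batch serves as a negative.
    """
    # L2-normalise so the dot product is cosine similarity.
    v = video_emb / np.linalg.norm(video_emb, axis=1, keepdims=True)
    p = para_emb / np.linalg.norm(para_emb, axis=1, keepdims=True)

    # Pairwise similarity matrix, scaled by the temperature.
    logits = v @ p.T / temperature
    idx = np.arange(len(v))

    # Video-to-paragraph: each row should put its mass on the diagonal.
    loss_v2p = -log_softmax(logits, axis=1)[idx, idx].mean()
    # Paragraph-to-video: same, but normalised over columns.
    loss_p2v = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_v2p + loss_p2v)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    videos = rng.normal(size=(4, 8))
    # Matched pairs should score a lower loss than deliberately mismatched ones.
    print(symmetric_info_nce(videos, videos))
    print(symmetric_info_nce(videos, videos[::-1]))
```

In this sketch the only supervision is which paragraph belongs to which video, which is consistent with the abstract's goal of reducing reliance on temporal annotations; how DMR-JRG combines this with the grounding branch is not recoverable from the truncated abstract.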
Taxonomy
Topics: Video Analysis and Summarization · Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications
Methods: Contrastive Learning · ALIGN
