When One Moment Isn't Enough: Multi-Moment Retrieval with Cross-Moment Interactions
Zhuo Cao, Heming Du, Bingqing Zhang, Xin Yu, Xue Li, Sen Wang

TL;DR
This paper introduces a new multi-moment retrieval dataset and a novel framework, FlashMMR, to better handle real-world video temporal grounding where multiple relevant moments exist per query.
Contribution
The paper presents QV-M$^2$, a high-quality dataset for multi-moment retrieval, and proposes FlashMMR, a framework with post-verification for improved multi-moment video grounding.
Findings
QV-M$^2$ effectively benchmarks multi-moment retrieval.
FlashMMR outperforms previous methods on QV-M$^2$.
Retraining existing methods improves their performance in multi-moment scenarios.
Abstract
Existing moment retrieval (MR) methods focus on Single-Moment Retrieval (SMR). However, one query can correspond to multiple relevant moments in real-world applications. This makes the existing datasets and methods insufficient for video temporal grounding. By revisiting the gap between current MR tasks and real-world applications, we introduce a high-quality dataset called QVHighlights Multi-Moment Dataset (QV-M$^2$), along with new evaluation metrics tailored for multi-moment retrieval (MMR). QV-M$^2$ consists of 2,212 annotations covering 6,384 video segments. Building on existing efforts in MMR, we propose a framework called FlashMMR. Specifically, we propose a Multi-moment Post-verification module to refine the moment boundaries. We introduce constrained temporal adjustment and subsequently leverage a verification module to re-evaluate the candidate segments. Through this…
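To make the multi-moment setting concrete, the sketch below scores a set of predicted segments against several ground-truth segments for one query. It is an illustration only and does not reproduce the paper's evaluation metrics, which the truncated abstract does not specify. The greedy one-to-one matching, the 0.5 IoU threshold, and the names `temporal_iou` and `match_moments` are all assumptions introduced here.

```python
from typing import List, Tuple

# A moment is a (start_sec, end_sec) pair; this representation is an assumption.
Segment = Tuple[float, float]


def temporal_iou(a: Segment, b: Segment) -> float:
    """Intersection over union of two 1-D time intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def match_moments(
    preds: List[Segment],
    gts: List[Segment],
    iou_threshold: float = 0.5,  # hypothetical threshold, not taken from the paper
) -> Tuple[float, float]:
    """Greedily match predictions to ground-truth moments, one-to-one.

    Returns (recall, precision) over the matched pairs.
    """
    # Score every prediction/ground-truth pair, best IoU first.
    pairs = sorted(
        (
            (temporal_iou(p, g), i, j)
            for i, p in enumerate(preds)
            for j, g in enumerate(gts)
        ),
        reverse=True,
    )
    used_preds, used_gts = set(), set()
    for iou, i, j in pairs:
        if iou < iou_threshold:
            break  # remaining pairs overlap even less
        if i in used_preds or j in used_gts:
            continue  # each segment may be matched at most once
        used_preds.add(i)
        used_gts.add(j)

    hits = len(used_gts)
    recall = hits / len(gts) if gts else 0.0
    precision = hits / len(preds) if preds else 0.0
    return recall, precision


# One query with three annotated moments and three predicted moments.
preds = [(0.0, 10.0), (20.0, 30.0), (50.0, 60.0)]
gts = [(1.0, 10.0), (22.0, 30.0), (80.0, 90.0)]
recall, precision = match_moments(preds, gts)
```

In this example two of the three ground-truth moments are recovered. A single-moment metric that only checks the top prediction would not register the missed third moment, which is the gap the multi-moment setting is meant to expose.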
Taxonomy
Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Video Analysis and Summarization
