MomentDiff: Generative Video Moment Retrieval from Random to Real
Pandeng Li, Chen-Wei Xie, Hongtao Xie, Liming Zhao, Lei Zhang, Yun Zheng, Deli Zhao, Yongdong Zhang

TL;DR
MomentDiff introduces a diffusion-based generative framework for video moment retrieval that refines random initial segments into accurate temporal boundaries, demonstrating superior performance and robustness against dataset biases.
Contribution
The paper presents a novel diffusion-based approach for video moment retrieval that resists dataset biases and improves generalization over existing methods.
Findings
Outperforms state-of-the-art methods on three public benchmarks.
Shows robustness on anti-bias datasets with location distribution shifts.
Efficiently refines random segments into accurate video moments.
Abstract
Video moment retrieval pursues an efficient and generalized solution to identify the specific temporal segments within an untrimmed video that correspond to a given language description. To achieve this goal, we provide a generative diffusion-based framework called MomentDiff, which simulates a typical human retrieval process from random browsing to gradual localization. Specifically, we first diffuse the real span to random noise, and learn to denoise the random noise to the original span with the guidance of similarity between text and video. This allows the model to learn a mapping from arbitrary random locations to real moments, enabling the ability to locate segments from random initialization. Once trained, MomentDiff could sample random temporal segments as initial guesses and iteratively refine them to generate an accurate temporal boundary. Different from discriminative works…
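The diffuse-then-denoise idea described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' implementation: spans are encoded as normalized (center, width) pairs, the noise schedule is cosine-style, the reverse update is a deterministic DDIM-style step, and `denoiser` is a hypothetical stand-in for the learned network that would be guided by text-video similarity.

```python
import math
import random


def alpha_bar(t, num_steps):
    """Cumulative signal rate at step t (assumed cosine-style schedule)."""
    s = 0.008

    def f(u):
        return math.cos((u / num_steps + s) / (1 + s) * math.pi / 2) ** 2

    return f(t) / f(0)


def diffuse_span(span, t, num_steps, rng):
    """Forward process: blend the real (center, width) span with Gaussian noise."""
    a = alpha_bar(t, num_steps)
    return tuple(
        math.sqrt(a) * x + math.sqrt(1 - a) * rng.gauss(0, 1) for x in span
    )


def refine(denoiser, num_steps, rng):
    """Reverse process: start from a random span and refine it step by step.

    `denoiser(x, t)` is hypothetical: it returns a prediction of the clean
    span given the noisy span x at step t.
    """
    x = (rng.gauss(0, 1), rng.gauss(0, 1))  # random initial guess
    for t in range(num_steps, 0, -1):
        a_t = alpha_bar(t, num_steps)
        a_prev = alpha_bar(t - 1, num_steps)
        x0 = denoiser(x, t)  # predicted clean span
        # Noise implied by the current sample and the clean-span prediction.
        eps = tuple(
            (xi - math.sqrt(a_t) * x0i) / math.sqrt(1 - a_t)
            for xi, x0i in zip(x, x0)
        )
        # Move to the previous (less noisy) step.
        x = tuple(
            math.sqrt(a_prev) * x0i + math.sqrt(1 - a_prev) * ei
            for x0i, ei in zip(x0, eps)
        )
    return x
```

In training, `diffuse_span` would produce the noisy input and the network would be supervised to recover the original span; at inference, `refine` would be run from one or more random initial spans. The sketch omits conditioning on video and text features entirely.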
Taxonomy
Topics: Multimodal Machine Learning Applications · Video Analysis and Summarization · Advanced Image and Video Retrieval Techniques
