Retrieving Any Relevant Moments: Benchmark and Models for Generalized Moment Retrieval

Yiming Ding; Siyu Cao; Luyuan Jiao; Yixuan Li; Zitong Wang; Zhiyong Liu; Lu Zhang

arXiv:2605.02623·cs.CV·May 5, 2026

TL;DR

This paper introduces Generalized Moment Retrieval (GMR), a task in which a query may match multiple video segments or none, along with a benchmark and baseline models that reflect real-world complexities in video-language understanding.

Contribution

It formulates GMR as a unified task, creates the Soccer-GMR benchmark, and develops baseline models, advancing the study of realistic video moment retrieval.

Findings

01

GMR models outperform traditional VMR models in complex scenarios, such as queries with multiple or no matching moments.

02

The semi-automated, human-verified pipeline behind Soccer-GMR enables scalable data generation while maintaining high annotation quality.

03

Extensive experiments reveal notable limitations in current methods.

Abstract

Video Moment Retrieval (VMR) aims to localize temporal segments in videos that correspond to a natural language query, but typically assumes only a single matching moment for each query. This assumption does not always hold in real-world scenarios, where queries may correspond to multiple or no moments. Thus, we formulate Generalized Moment Retrieval (GMR), a unified setting that requires retrieving the complete set of relevant moments or predicting an empty set. To enable systematic study of GMR, we introduce Soccer-GMR, a large-scale benchmark built on challenging soccer videos that reflect general GMR scenarios, with realistic negative and positive queries. The benchmark is constructed via a duration-flexible semi-automated pipeline with human verification, enabling scalable data generation while maintaining high annotation quality. We further design a unified evaluation protocol…
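To make the task formulation concrete, here is a minimal Python sketch of what a GMR prediction could look like: each query maps to a list of (start, end) moments, and an empty list means "no relevant moment". The names `Moment`, `temporal_iou`, and `set_f1`, the greedy matching, and the 0.5 IoU threshold are all illustrative assumptions. The abstract above is truncated before it describes the paper's evaluation protocol, so this is not that protocol.

```python
from typing import List, Tuple

# Hypothetical representation: a moment is (start_sec, end_sec).
Moment = Tuple[float, float]


def temporal_iou(a: Moment, b: Moment) -> float:
    """Intersection-over-union of two temporal segments."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def set_f1(pred: List[Moment], gold: List[Moment], thr: float = 0.5) -> float:
    """Illustrative set-level F1 (an assumption, not the paper's metric).

    A predicted moment counts as a hit if it overlaps a still-unmatched
    ground-truth moment with IoU >= thr. Empty-vs-empty scores 1.0.
    """
    if not pred and not gold:
        return 1.0  # correctly predicted that nothing matches the query
    if not pred or not gold:
        return 0.0  # missed every moment, or hallucinated one
    unmatched = list(gold)
    hits = 0
    for p in pred:
        # Greedy matching: take the best remaining ground-truth moment.
        best = max(unmatched, key=lambda g: temporal_iou(p, g), default=None)
        if best is not None and temporal_iou(p, best) >= thr:
            unmatched.remove(best)
            hits += 1
    if hits == 0:
        return 0.0
    precision = hits / len(pred)
    recall = hits / len(gold)
    return 2 * precision * recall / (precision + recall)
```

Under these assumptions, a single-moment VMR model is the special case where `pred` always has exactly one element, which is why it cannot score well on queries whose ground truth is empty or contains several moments.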

Code & Models

Datasets

diiiA22B9S/Soccer-GMR
dataset · 537 downloads
