Interactive Video Corpus Moment Retrieval using Reinforcement Learning
Zhixin Ma, Chong-Wah Ngo

TL;DR
This paper introduces a reinforcement learning approach for interactive video corpus moment retrieval, enabling efficient localization of deep-hidden moments in large video datasets through user feedback.
Contribution
It proposes a novel reinforcement learning framework that plans navigation paths and recommends targets to improve video moment retrieval accuracy.
Findings
Effective retrieval of deep-hidden moments in large datasets
Outperforms state-of-the-art auto-search engines on TVR and DiDeMo datasets
Reduces user interaction rounds needed for accurate search
Abstract
Known-item video search is effective with a human in the loop who interactively inspects the search results and refines the initial query. Nevertheless, when the first few pages of results are swamped with visually similar items, or the search target is hidden deep in the ranked list, finding the known-item target usually requires a long period of browsing and result inspection. This paper tackles the problem with reinforcement learning, aiming to reach a search target within a few rounds of interaction through long-term learning from user feedback. Specifically, the system interactively plans a navigation path based on feedback and recommends a potential target that maximizes the long-term reward for user comment. We conduct experiments on the challenging task of video corpus moment retrieval (VCMR), which localizes moments in a large video corpus. The experimental results on TVR and DiDeMo…
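To make the idea of "maximizing the long-term reward" concrete, here is a minimal sketch of how a discounted return favors reaching the target in fewer interaction rounds. It is an illustration only, not the paper's method: the reward values (-1 per round, +10 on reaching the target), the discount factor, and the names `discounted_return`, `best_path`, and `candidate_paths` are all assumptions, since this summary does not specify the actual reward design or policy.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum the per-round rewards, discounting each by gamma per round elapsed."""
    total = 0.0
    for step, reward in enumerate(rewards):
        total += (gamma ** step) * reward
    return total


# Hypothetical reward sequences (assumed values, not from the paper):
# -1 for every round spent interacting, +10 in the round the target is found.
candidate_paths = {
    "browse_ranked_list": [-1, -1, -1, -1, -1, 10],  # target found in round 6
    "follow_recommendation": [-1, -1, 10],           # target found in round 3
}


def best_path(paths, gamma=0.9):
    """Return the name of the path with the highest discounted return."""
    return max(paths, key=lambda name: discounted_return(paths[name], gamma))


if __name__ == "__main__":
    for name, rewards in candidate_paths.items():
        print(name, round(discounted_return(rewards), 4))
    print("preferred:", best_path(candidate_paths))
```

Under these assumed rewards, the shorter path scores higher because the terminal reward is discounted less and fewer per-round penalties accumulate, which is the intuition behind optimizing for few interaction rounds.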
