Interactive Video Corpus Moment Retrieval using Reinforcement Learning

Zhixin Ma; Chong-Wah Ngo

arXiv:2302.09522·cs.CV·February 21, 2023


TL;DR

This paper introduces a reinforcement learning approach to interactive video corpus moment retrieval. It uses user feedback to reach search targets that are buried deep in the ranked list of a large video corpus within a few rounds of interaction.

Contribution

The paper proposes a reinforcement learning framework that plans a navigation path from user feedback and, at each round, recommends the candidate target expected to maximize long-term reward.

Findings

01. Effective retrieval of deep-hidden moments in large datasets

02. Outperforms state-of-the-art auto-search engines on TVR and DiDeMo datasets

03. Reduces user interaction rounds needed for accurate search

Abstract

Known-item video search is effective with a human in the loop who interactively inspects the search results and refines the initial query. Nevertheless, when the first few pages of results are swamped with visually similar items, or the search target is hidden deep in the ranked list, finding the known-item target usually requires a long period of browsing and result inspection. This paper tackles the problem with reinforcement learning, aiming to reach a search target within a few rounds of interaction through long-term learning from user feedback. Specifically, the system interactively plans a navigation path based on feedback and recommends a potential target that maximizes the long-term reward for user comment. We conduct experiments on the challenging task of video corpus moment retrieval (VCMR), which localizes moments from a large video corpus. The experimental results on TVR and DiDeMo…
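The idea of recommending the item that maximizes long-term rather than immediate reward can be illustrated with a toy example. The sketch below is not the authors' model: it runs plain tabular Q-learning on a small hypothetical navigation graph, where each node stands for a candidate moment, each edge for a recommendation the system may make next, and a reward of 1 is given only when the simulated user confirms the target. The graph, the function names (`train_q`, `navigate`), and all hyperparameters are assumptions chosen for illustration.

```python
import random

# Hypothetical navigation graph: node -> moments that can be recommended next.
NEIGHBORS = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}

def train_q(neighbors, target, episodes=2000, alpha=0.5, gamma=0.9,
            eps=0.3, max_steps=20, seed=0):
    """Tabular Q-learning; the discount gamma favors short paths to the target."""
    rng = random.Random(seed)
    q = {s: {a: 0.0 for a in nbrs} for s, nbrs in neighbors.items()}
    states = list(neighbors)
    for _ in range(episodes):
        s = rng.choice(states)  # each episode starts from a random moment
        for _ in range(max_steps):
            if s == target:
                break
            # Epsilon-greedy choice of the next recommendation.
            if rng.random() < eps:
                a = rng.choice(neighbors[s])
            else:
                a = max(q[s], key=q[s].get)
            # Simulated feedback: reward only when the target is reached.
            reward = 1.0 if a == target else 0.0
            future = 0.0 if a == target else max(q[a].values())
            q[s][a] += alpha * (reward + gamma * future - q[s][a])
            s = a
    return q

def navigate(q, start, target, max_rounds=10):
    """Follow the greedy policy and return the sequence of visited moments."""
    path, s = [start], start
    while s != target and len(path) <= max_rounds:
        s = max(q[s], key=q[s].get)
        path.append(s)
    return path

q_table = train_q(NEIGHBORS, target=5)
path = navigate(q_table, start=0, target=5)
print(path)
```

Because future rewards are discounted, a learned policy of this kind prefers the shortcut through node 3 over the longer route through nodes 1 and 2, which mirrors the stated goal of reaching the target in few interaction rounds. A real system would replace the fixed graph with states built from the query, the ranked list, and the feedback history.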
