Learning to Retrieve Videos by Asking Questions

Avinash Madasu; Junier Oliva; Gedas Bertasius

arXiv:2205.05739 · cs.CV · July 19, 2022

TL;DR

This paper introduces ViReD, a novel interactive framework for video retrieval that uses dialog-based user feedback and a question generator guided by information theory to improve retrieval accuracy.

Contribution

The paper presents a multimodal question generator with information-guided supervision that improves video retrieval through interactive dialog and outperforms traditional static retrieval systems.
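To make "guided by information theory" concrete, here is a minimal sketch of one generic way to score candidate questions: expected information gain over which candidate video is the user's target. This is an illustration under stated assumptions, not necessarily the supervision signal the paper uses. It assumes each candidate question already has a hypothetical per-video answer (for example from a VideoQA model or a simulated user), and all function names here are hypothetical.

```python
import math
from collections import defaultdict

def entropy(probs):
    """Shannon entropy in nats, ignoring zero-probability entries."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def softmax(scores):
    """Turn raw retrieval scores into a distribution over candidate videos."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def expected_information_gain(prior, answers):
    """Expected drop in entropy over the target video after asking one question.

    prior[i]   -- probability that candidate video i is the user's target
    answers[i] -- (hypothetical) answer the question gets if video i is the target
    """
    answer_mass = defaultdict(float)   # P(answer)
    answer_groups = defaultdict(list)  # prior mass of videos consistent with each answer
    for p, a in zip(prior, answers):
        answer_mass[a] += p
        answer_groups[a].append(p)
    expected_posterior_entropy = 0.0
    for a, ps in answer_groups.items():
        if answer_mass[a] == 0:
            continue
        posterior = [p / answer_mass[a] for p in ps]
        expected_posterior_entropy += answer_mass[a] * entropy(posterior)
    return entropy(prior) - expected_posterior_entropy

def pick_question(retrieval_scores, candidate_questions):
    """Pick the question (text -> per-video answers) with the highest expected gain."""
    prior = softmax(retrieval_scores)
    return max(candidate_questions,
               key=lambda q: expected_information_gain(prior, candidate_questions[q]))

questions = {
    "Is it indoors?": ["yes", "yes", "no", "no"],
    "Is there a dog?": ["yes", "yes", "yes", "no"],
}
print(pick_question([2.0, 2.0, 2.0, 2.0], questions))  # Is it indoors?
```

With four equally likely candidates, a question that splits them 2/2 removes ln 2 nats of uncertainty, while a 3/1 split removes only about 0.56 nats, so the balanced question is preferred.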

Findings

1. Interactive dialog improves retrieval accuracy.
2. The question generator effectively incorporates visual and linguistic cues.
3. The approach generalizes to real-world human interactions.

Abstract

The majority of traditional text-to-video retrieval systems operate in static environments, i.e., there is no interaction between the user and the agent beyond the initial textual query provided by the user. This can be sub-optimal if the initial query has ambiguities, which would lead to many falsely retrieved videos. To overcome this limitation, we propose a novel framework for Video Retrieval using Dialog (ViReD), which enables the user to interact with an AI agent via multiple rounds of dialog, where the user refines retrieved results by answering questions generated by an AI agent. Our novel multimodal question generator learns to ask questions that maximize the subsequent video retrieval performance using (i) the video candidates retrieved during the last round of interaction with the user and (ii) the text-based dialog history documenting all previous interactions, to generate…
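The multi-round interaction described in the abstract can be sketched as a simple loop: retrieve with the initial query, let a question generator look at the top-k candidates and the dialog history, append the user's answer to the history, and re-rank. This is a toy sketch under loud assumptions: a bag-of-words cosine similarity stands in for the paper's learned multimodal encoders, videos are represented by text descriptions, and the `ask` and `answer` callables (and all other names) are hypothetical placeholders for the question generator and the user.

```python
import math
from collections import Counter

def embed(text):
    """Hypothetical stand-in for a learned encoder: a bag-of-words count vector."""
    cleaned = text.lower().replace("?", " ").replace(",", " ")
    return Counter(cleaned.split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, dialog, videos, k=3):
    """Rank videos by similarity to the initial query plus all dialog turns."""
    context = embed(" ".join([query] + dialog))
    ranked = sorted(videos, key=lambda vid: cosine(context, embed(videos[vid])),
                    reverse=True)
    return ranked[:k]

def interactive_retrieval(query, videos, ask, answer, rounds=2, k=3):
    """Alternate between asking a question and re-ranking with the grown history."""
    dialog = []
    top = retrieve(query, dialog, videos, k)
    for _ in range(rounds):
        question = ask(top, dialog)  # sees last round's candidates + dialog history
        dialog.append(f"{question} {answer(question)}")
        top = retrieve(query, dialog, videos, k)
    return top, dialog

videos = {
    "v1": "a man cooks pasta in a kitchen",
    "v2": "a man cooks steak outdoors on a grill",
    "v3": "a woman plays guitar",
}
print(retrieve("a man cooks", [], videos)[0])  # v1
top, dialog = interactive_retrieval(
    "a man cooks", videos,
    ask=lambda candidates, history: "Where is the cooking?",
    answer=lambda question: "outdoors on a grill",
    rounds=1,
)
print(top[0])  # v2
```

The ambiguous query "a man cooks" initially favors v1; one answered question adds enough context to move the intended video, v2, to the top, which is the failure mode of static retrieval that the abstract describes.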

Code & Models

Repositories

avinashsai/ViRED (official)