Describing Unseen Videos via Multi-Modal Cooperative Dialog Agents

Ye Zhu, Yu Wu, Yi Yang, and Yan Yan

arXiv:2008.07935 · cs.CV · August 25, 2020

TL;DR

This paper introduces a new multi-modal cooperative dialog task in which one agent (Q-BOT) describes an unseen video using only two static frames and a dialog with a second agent (A-BOT) that has watched the full video. The work focuses on transferring knowledge between the agents to improve the resulting video description.

Contribution

It proposes the task together with a QA-Cooperative Network that uses a dynamic dialog history update, enabling Q-BOT to describe unseen videos through cooperative learning with A-BOT.

Findings

1. Q-BOT effectively learns to describe unseen videos.

2. The model achieves promising performance when given the full dialog history.

3. Cooperative learning improves video description accuracy.

Abstract

With rising concerns about AI systems that have direct access to abundant sensitive information, researchers seek to develop more reliable AI that relies on implicit information sources. To this end, we introduce a new task called video description via two multi-modal cooperative dialog agents, whose ultimate goal is for one conversational agent to describe an unseen video based on the dialog and two static frames. Specifically, one of the agents, Q-BOT, is given two static frames from the beginning and the end of the video, as well as a finite number of opportunities to ask relevant natural language questions before describing the unseen video. A-BOT, the other agent, who has already seen the entire video, assists Q-BOT in accomplishing the goal by answering those questions. We propose a QA-Cooperative Network with a dynamic dialog history update…
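The task protocol in the abstract can be read as a short interaction loop. The sketch below is a minimal illustration under stated assumptions, not the authors' QA-Cooperative Network: the names `run_dialog`, `DialogState`, `ToyQBot`, `ToyABot`, and the `num_rounds` parameter are hypothetical, and frames are stand-in strings rather than images. What it tries to show is the information asymmetry: Q-BOT only ever sees the first and last frames plus the accumulated question-answer history, while only A-BOT is handed the whole video.

```python
from dataclasses import dataclass, field
from typing import List, Protocol, Tuple

Frame = str  # hypothetical stand-in; a real system would use image tensors
QA = Tuple[str, str]  # one (question, answer) dialog round


@dataclass
class DialogState:
    """Everything Q-BOT is allowed to see: two frames plus the dialog so far."""
    first_frame: Frame
    last_frame: Frame
    history: List[QA] = field(default_factory=list)


class QBot(Protocol):
    def ask(self, state: DialogState) -> str: ...
    def describe(self, state: DialogState) -> str: ...


class ABot(Protocol):
    def answer(self, video: List[Frame], question: str, history: List[QA]) -> str: ...


def run_dialog(video: List[Frame], q_bot: QBot, a_bot: ABot,
               num_rounds: int) -> Tuple[str, List[QA]]:
    """Hypothetical driver: Q-BOT asks, A-BOT answers, then Q-BOT describes."""
    if len(video) < 2:
        raise ValueError("video needs at least a first and a last frame")
    # Q-BOT's view is restricted to the first and last frames.
    state = DialogState(first_frame=video[0], last_frame=video[-1])
    for _ in range(num_rounds):  # finite question budget
        question = q_bot.ask(state)
        # Only A-BOT receives the full video.
        answer = a_bot.answer(video, question, state.history)
        # The shared dialog history grows after every round.
        state.history.append((question, answer))
    return q_bot.describe(state), state.history


class ToyQBot:
    """Hypothetical rule-based Q-BOT, used only to trace the information flow."""

    def ask(self, state: DialogState) -> str:
        return f"What happens after step {len(state.history)}?"

    def describe(self, state: DialogState) -> str:
        answers = ",".join(a for _, a in state.history)
        return f"first={state.first_frame}; answers={answers}; last={state.last_frame}"


class ToyABot:
    """Hypothetical A-BOT that reveals one intermediate frame per question."""

    def answer(self, video: List[Frame], question: str, history: List[QA]) -> str:
        idx = min(len(history) + 1, len(video) - 1)
        return video[idx]


if __name__ == "__main__":
    description, dialog = run_dialog(["f0", "f1", "f2", "f3"], ToyQBot(), ToyABot(), num_rounds=2)
    print(description)  # first=f0; answers=f1,f2; last=f3
```

In the paper both agents are learned models; the toy agents here simply echo frame labels so it is easy to follow which agent sees what at each round.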

Code & Models

Repositories

L-YeZhu/Video-Description-via-Dialog-Agents-ECCV2020
PyTorch · Official

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Domain Adaptation and Few-Shot Learning