Audio Visual Scene-Aware Dialog (AVSD) Challenge at DSTC7

Huda Alamri; Vincent Cartillier; Raphael Gontijo Lopes; Abhishek Das; Jue Wang; Irfan Essa; Dhruv Batra; Devi Parikh; Anoop Cherian; Tim K. Marks; Chiori Hori

arXiv:1806.00525·cs.CL·June 5, 2018·25 cites

TL;DR

The paper introduces the AVSD challenge and dataset to advance scene-aware dialog systems that can discuss objects and events in videos by integrating work from multiple AI research areas.

Contribution

It presents a new challenge and dataset for developing dialog systems that understand and describe video content in conversational settings.

Findings

01. First AVSD challenge dataset released
02. Baseline models established for the AVSD task
03. Encourages multidisciplinary research in video-based dialog systems

Abstract

Scene-aware dialog systems will be able to have conversations with users about the objects and events around them. Progress on such systems can be made by integrating state-of-the-art technologies from multiple research areas, including end-to-end dialog systems, visual dialog, and video description. We introduce the Audio Visual Scene-Aware Dialog (AVSD) challenge and dataset. In this challenge, which is one track of the 7th Dialog System Technology Challenges (DSTC7) workshop, the task is to build a system that generates responses in a dialog about an input video.

Taxonomy

Topics: Multimodal Machine Learning Applications · Video Analysis and Summarization · Advanced Image and Video Retrieval Techniques