Video Dialog as Conversation about Objects Living in Space-Time

Hoang-Anh Pham; Thao Minh Le; Vuong Le; Tu Minh Phuong; Truyen Tran

arXiv:2207.03656 · cs.CV · July 11, 2022

TL;DR

This paper introduces COST, a novel object-centric framework for video dialog that enables high-level reasoning about space-time visual content, improving conversational understanding about videos.

Contribution

The paper presents COST, a new neural reasoning framework that tracks object trajectories and dialog states for enhanced video-based conversational AI.

Findings

1. COST achieves competitive results on DSTC7 and DSTC8 benchmarks.
2. Object trajectory parsing improves reasoning over video content.
3. Maintaining dialog and object states enhances answer relevance.
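The second finding credits part of COST's reasoning ability to parsing the video into object trajectories. The sketch below shows one simple, generic way to turn per-frame bounding boxes into trajectories: greedy frame-to-frame linking by intersection-over-union (IoU). This is a swapped-in textbook technique, not necessarily the detection and tracking pipeline COST itself uses; the names `link_detections`, `Trajectory`, `iou` and the 0.5 IoU threshold are hypothetical choices for illustration (Python 3.9+).

```python
from dataclasses import dataclass, field

# Hypothetical box format: (x1, y1, x2, y2) in pixel coordinates.
Box = tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


@dataclass
class Trajectory:
    """One object's track: frame index -> bounding box (hypothetical structure)."""
    track_id: int
    boxes: dict[int, Box] = field(default_factory=dict)

    def last_box(self) -> Box:
        # Box from the most recent frame in which this track was seen.
        return self.boxes[max(self.boxes)]


def link_detections(frames: list[list[Box]], iou_thresh: float = 0.5) -> list[Trajectory]:
    """Greedily link per-frame detections into trajectories by IoU with each track's latest box.

    Simplification: a track that goes unmatched for one frame is closed (no gap tolerance).
    """
    tracks: list[Trajectory] = []
    active: list[Trajectory] = []  # tracks extended in the previous frame
    for t, dets in enumerate(frames):
        # Score every (active track, detection) pair, highest IoU first.
        pairs = sorted(
            ((iou(tr.last_box(), d), i, j) for i, tr in enumerate(active) for j, d in enumerate(dets)),
            reverse=True,
        )
        used_tracks: set[int] = set()
        used_dets: set[int] = set()
        next_active: list[Trajectory] = []
        for score, i, j in pairs:
            if score < iou_thresh:
                break  # remaining pairs overlap too little to be the same object
            if i in used_tracks or j in used_dets:
                continue
            active[i].boxes[t] = dets[j]
            used_tracks.add(i)
            used_dets.add(j)
            next_active.append(active[i])
        # Unmatched detections start new trajectories.
        for j, d in enumerate(dets):
            if j not in used_dets:
                tr = Trajectory(track_id=len(tracks), boxes={t: d})
                tracks.append(tr)
                next_active.append(tr)
        active = next_active
    return tracks


# Usage: two objects, the second one disappears in the last frame.
frames = [
    [(0, 0, 10, 10), (50, 50, 60, 60)],
    [(1, 0, 11, 10), (51, 50, 61, 60)],
    [(2, 0, 12, 10)],
]
tracks = link_detections(frames)
print([(tr.track_id, sorted(tr.boxes)) for tr in tracks])  # → [(0, [0, 1, 2]), (1, [0, 1])]
```

Greedy IoU matching is cheap and needs no learned appearance model, which makes it a reasonable illustration; practical trackers usually add motion prediction and tolerate short gaps.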

Abstract

It would be a technological feat to be able to create a system that can hold a meaningful conversation with humans about what they watch. A setup toward that goal is presented as a video dialog task, where the system is asked to generate natural utterances in response to a question in an ongoing dialog. The task poses great visual, linguistic, and reasoning challenges that cannot be easily overcome without an appropriate representation scheme over video and dialog that supports high-level reasoning. To tackle these challenges, we present COST, which stands for Conversation about Objects in Space-Time: a new object-centric framework for video dialog that supports neural reasoning. Here, dynamic space-time visual content in videos is first parsed into object trajectories. Given this video abstraction, COST maintains and tracks object-associated dialog states, which are updated upon…
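The abstract describes object-associated dialog states that COST maintains, tracks, and updates. A minimal sketch of that idea, assuming for illustration only that each object's state is a plain vector and that an update happens each time a new question embedding arrives: relevance is a dot-product softmax over objects, and each state is blended toward the question in proportion to its relevance. The class `ObjectDialogState` and this fixed update rule are hypothetical; COST's learned modules live in the official repository and are not reproduced here.

```python
import math


def dot(u: list[float], v: list[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def softmax(xs: list[float]) -> list[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


class ObjectDialogState:
    """Hypothetical per-object dialog memory: one state vector per tracked object."""

    def __init__(self, object_features: list[list[float]]):
        # Initialise each object's state from its (pooled) trajectory feature.
        self.states = [list(f) for f in object_features]

    def update(self, question: list[float]) -> list[float]:
        """Blend the question embedding into every object state, weighted by relevance."""
        gates = softmax([dot(s, question) for s in self.states])
        for k, g in enumerate(gates):
            # Convex combination: relevant objects move toward the question, others barely change.
            self.states[k] = [(1 - g) * s + g * q for s, q in zip(self.states[k], question)]
        return gates


# Usage: two objects; the question vector aligns with object 0.
mem = ObjectDialogState([[1.0, 0.0], [0.0, 1.0]])
gates = mem.update([2.0, 0.0])
print([round(g, 3) for g in gates])  # → [0.881, 0.119]
```

Because the gates sum to one, a question that matches one object strongly mostly rewrites that object's state while leaving the others nearly intact; a trained model would replace the fixed blend with learned gating.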

Code & Models

Repositories

hoanganhpham1006/cost (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Topic Modeling