Reasoning Visual Dialogs with Structural and Partial Observations
Zilong Zheng, Wenguan Wang, Siyuan Qi, Song-Chun Zhu

TL;DR
This paper introduces a graph-based model for Visual Dialog that infers the dialog structure and the answer jointly, and reports better results than comparative methods on the VisDial and VisDial-Q datasets.
Contribution
It formalizes Visual Dialog as inference in a graphical model with partial observations and proposes a differentiable GNN solution to infer dialog structures and answers.
Findings
Outperforms comparative methods on the VisDial and VisDial-Q datasets
Successfully infers underlying dialog structures
Enhances dialog reasoning accuracy
Abstract
We propose a novel model to address the task of Visual Dialog, which exhibits complex dialog structures. To obtain a reasonable answer based on the current question and the dialog history, the underlying semantic dependencies between dialog entities are essential. In this paper, we explicitly formalize this task as inference in a graphical model with partially observed nodes and unknown graph structures (relations in dialog). The given dialog entities are viewed as the observed nodes. The answer to a given question is represented by a node with a missing value. We first introduce an Expectation Maximization algorithm to infer both the underlying dialog structures and the missing node values (desired answers). Based on this, we proceed to propose a differentiable graph neural network (GNN) solution that approximates this process. Experimental results on the VisDial and VisDial-Q datasets show…
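To make the idea in the abstract concrete, here is a minimal toy sketch of alternating inference on a graph with one unobserved node. It is not the authors' model or code. The dot-product edge scores, the softmax normalization, the fixed 0.5 mixing weight, and all names (`infer_answer_node`, `observed`, `query`) are assumptions chosen for illustration. The loop alternates between estimating soft edge weights from the current node values (loosely analogous to an E-step) and re-estimating the missing node from its weighted neighbors (loosely analogous to an M-step).

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def infer_answer_node(observed, query, n_iters=3):
    """Toy alternating inference (hypothetical, for illustration only).

    observed: embeddings of the observed dialog nodes (history entries).
    query:    embedding of the current question, used to initialise
              the node whose value is missing (the answer).
    Returns the estimated answer embedding and the soft edge weights
    linking the answer node to each observed node.
    """
    answer = list(query)  # assumption: initialise the missing node from the question
    weights = []
    for _ in range(n_iters):
        # Structure step: soft edge weights between the answer node
        # and every observed node, from the current answer estimate.
        weights = softmax([dot(answer, h) for h in observed])
        # Value step: aggregate neighbour messages with those weights.
        message = [
            sum(w * h[d] for w, h in zip(weights, observed))
            for d in range(len(query))
        ]
        # Assumption: mix the question with the message using a fixed 0.5 weight.
        answer = [0.5 * q + 0.5 * m for q, m in zip(query, message)]
    return answer, weights


if __name__ == "__main__":
    history = [[1.0, 0.0], [0.0, 1.0]]  # two observed nodes
    question = [1.0, 0.0]               # closer to the first node
    answer, weights = infer_answer_node(history, question)
    print("edge weights:", weights)
    print("answer embedding:", answer)
```

In this toy setting the edge weights play the role of the inferred dialog structure: an observed node that is more similar to the current estimate receives a larger weight and contributes more to the answer node. A learned GNN would replace the hand-written scoring and mixing rules with trainable functions.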
Taxonomy
Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Advanced Image and Video Retrieval Techniques
Methods: Graph Neural Network
