Probabilistic framework for solving Visual Dialog

Badri N. Patro; Anupriy; Vinay P. Namboodiri

arXiv:1909.04800 · cs.CV · October 18, 2019

TL;DR

This paper introduces a probabilistic framework for Visual Dialog that estimates uncertainty, enhances answer diversity, and improves system explainability by integrating probabilistic representations and uncertainty minimization.

Contribution

It presents a novel probabilistic approach that models uncertainty in Visual Dialog, enabling more diverse and explainable answers compared to existing deep learning methods.
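To make "modeling uncertainty" over candidate answers concrete, here is a minimal, generic sketch. It is not the paper's method: it assumes a hypothetical model that scores a fixed list of candidate answers and can be run several times with stochastic sampling, and it uses predictive entropy of the averaged softmax distribution as the uncertainty measure. The function names and the example scores are illustrative assumptions.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predictive_entropy(sampled_logits):
    """Entropy (in nats) of the mean softmax distribution across samples.

    sampled_logits: one list of candidate-answer scores per stochastic pass.
    """
    probs = [softmax(logits) for logits in sampled_logits]
    n = len(probs)
    mean = [sum(p[i] for p in probs) / n for i in range(len(probs[0]))]
    return -sum(p * math.log(p) for p in mean if p > 0.0)

# Hypothetical scores for 3 candidate answers from 3 stochastic passes.
agree = [[4.0, 0.0, 0.0], [4.2, 0.1, 0.0], [3.9, 0.0, 0.2]]
disagree = [[4.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 4.0]]

# Passes that agree on one answer give lower entropy than passes that disagree.
print(predictive_entropy(agree) < predictive_entropy(disagree))
```

When the sampled passes rank the same candidate first, the averaged distribution is peaked and the entropy is low; when they disagree, the averaged distribution flattens and the entropy rises, which is one simple way to flag answers the model is unsure about.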

Findings

1. Improved accuracy over state-of-the-art models
2. Enhanced answer diversity and explainability
3. Effective uncertainty estimation and visualization

Abstract

In this paper, we propose a probabilistic framework for solving the task of "Visual Dialog". Solving this task requires reasoning about and understanding the visual modality, the language modality, and common-sense knowledge. Various architectures have been proposed for this task, based on variants of multi-modal deep learning techniques that combine visual and language representations. However, we believe that it is crucial to understand and analyze the sources of uncertainty in solving this task. Our approach allows for estimating uncertainty and also aids the generation of diverse answers. The proposed approach is obtained through a probabilistic representation module that provides us with representations for image, question and conversation history, a module that ensures that diverse latent representations for candidate answers are obtained given the probabilistic representations and an…
