Probabilistic framework for solving Visual Dialog
Badri N. Patro, Anupriy, Vinay P. Namboodiri

TL;DR
This paper introduces a probabilistic framework for Visual Dialog that estimates uncertainty, enhances answer diversity, and improves system explainability by integrating probabilistic representations and uncertainty minimization.
Contribution
It presents a novel probabilistic approach that models uncertainty in Visual Dialog, enabling more diverse and explainable answers compared to existing deep learning methods.
Findings
Improved accuracy over state-of-the-art models
Enhanced answer diversity and explainability
Effective uncertainty estimation and visualization
Abstract
In this paper, we propose a probabilistic framework for solving the task of `Visual Dialog'. Solving this task requires reasoning and understanding of visual modality, language modality, and common sense knowledge to answer. Various architectures have been proposed to solve this task by variants of multi-modal deep learning techniques that combine visual and language representations. However, we believe that it is crucial to understand and analyze the sources of uncertainty for solving this task. Our approach allows for estimating uncertainty and also aids a diverse generation of answers. The proposed approach is obtained through a probabilistic representation module that provides us with representations for image, question and conversation history, a module that ensures that diverse latent representations for candidate answers are obtained given the probabilistic representations and an…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
