Reasoning Visual Dialog with Sparse Graph Learning and Knowledge Transfer

Gi-Cheon Kang; Junseok Park; Hwaran Lee; Byoung-Tak Zhang; Jin-Hwa Kim

arXiv:2004.06698 · cs.CV · September 1, 2021 · 1 citation

TL;DR

This paper combines Sparse Graph Learning (SGL), which treats visual dialog as a graph structure learning task, with Knowledge Transfer (KT), which uses a teacher model's answer predictions as pseudo labels. Together they improve reasoning and answer diversity in visual dialog systems and outperform existing methods on the VisDial v1.0 dataset.

Contribution

It proposes a new Sparse Graph Learning framework for dialog reasoning and a Knowledge Transfer technique to improve answer diversity, addressing key challenges in visual dialog understanding.

Findings

01 Outperforms state-of-the-art methods on the VisDial v1.0 dataset

02 Enhances reasoning capabilities in visual dialog models

03 Increases the diversity of generated answers

Abstract

Visual dialog is a task of answering a sequence of questions grounded in an image using the previous dialog history as context. In this paper, we study how to address two fundamental challenges for this task: (1) reasoning over underlying semantic structures among dialog rounds and (2) identifying several appropriate answers to the given question. To address these challenges, we propose a Sparse Graph Learning (SGL) method to formulate visual dialog as a graph structure learning task. SGL infers inherently sparse dialog structures by incorporating binary and score edges and leveraging a new structural loss function. Next, we introduce a Knowledge Transfer (KT) method that extracts the answer predictions from the teacher model and uses them as pseudo labels. We propose KT to remedy the shortcomings of single ground-truth labels, which severely limit the ability of a model to obtain…
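
The two ideas in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the sigmoid threshold, the rule for combining binary and score edges (gate, then renormalise), and the use of soft teacher probabilities as pseudo labels are all assumptions made here for illustration. The abstract only states that SGL uses binary and score edges and that KT uses teacher predictions as pseudo labels.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sparse_edge_weights(binary_logits, score_logits, threshold=0.5):
    """Hypothetical combination of binary and score edges for one dialog round.

    binary_logits: one logit per earlier round; decides whether an edge exists.
    score_logits:  one logit per earlier round; decides how strong the edge is.
    Assumed rule: drop edges whose sigmoid(binary logit) <= threshold,
    then renormalise the surviving softmax scores.
    """
    gates = [1.0 if 1.0 / (1.0 + math.exp(-b)) > threshold else 0.0
             for b in binary_logits]
    scores = softmax(score_logits)
    gated = [g * s for g, s in zip(gates, scores)]
    total = sum(gated)
    # If every edge is dropped, return all zeros (no context from history).
    return [w / total for w in gated] if total > 0 else gated


def pseudo_label_loss(student_logits, teacher_logits):
    """Cross-entropy of the student against the teacher's softmax output.

    Assumption: the teacher's distribution over candidate answers is used
    as a soft pseudo label instead of a single one-hot ground truth.
    """
    teacher_probs = softmax(teacher_logits)
    student_probs = softmax(student_logits)
    return -sum(t * math.log(s) for t, s in zip(teacher_probs, student_probs))


# Example: the second earlier round is gated out despite its high score.
weights = sparse_edge_weights([2.0, -3.0, 0.5], [1.0, 4.0, 1.0])

# Example: a student that agrees with the teacher gets a lower loss.
loss_agree = pseudo_label_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
loss_disagree = pseudo_label_loss([3.0, 2.0, 1.0], [1.0, 2.0, 3.0])
```

In this sketch the binary gate produces exact zeros, which is one way to obtain a sparse structure, and the soft pseudo label lets several candidate answers carry probability mass at once.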

Code & Models

Repositories

gicheonkang/sglkt-visdial
PyTorch · Official

Taxonomy

Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Video Analysis and Summarization

Methods: Interpretability · Softmax