DMRM: A Dual-channel Multi-hop Reasoning Model for Visual Dialog

Feilong Chen; Fandong Meng; Jiaming Xu; Peng Li; Bo Xu; Jie Zhou

arXiv:1912.08360 · cs.CL · December 19, 2019 · 1 citation

TL;DR

The paper introduces DMRM, a dual-channel multi-hop reasoning model that improves visual dialog understanding by simultaneously reasoning over dialog history and images, leading to better response accuracy.

Contribution

It proposes a novel dual-channel multi-hop reasoning framework that captures richer multimodal information for visual dialog tasks, outperforming previous models.

Findings

1. Outperforms existing models on VisDial v0.9 and v1.0 datasets.
2. Effectively captures dialog history and image information simultaneously.
3. Enhances response accuracy through multimodal attention.

Abstract

Visual Dialog is a vision-language task that requires an AI agent to engage in a conversation with humans grounded in an image. It remains challenging because the agent must fully understand a given question, drawing on both the textual dialog history and the visually grounded information, before making an appropriate response. Previous models typically rely on single-hop or single-channel reasoning to handle this complex multimodal reasoning task, which is intuitively insufficient. In this paper, we thus propose a novel and more powerful Dual-channel Multi-hop Reasoning Model for Visual Dialog, named DMRM. DMRM synchronously captures information from the dialog history and the image to enrich the semantic representation of the question by exploiting dual-channel reasoning. Specifically, DMRM maintains a dual channel to obtain the question- and…
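
To make the idea of dual-channel, multi-hop reasoning concrete, here is a minimal sketch in plain Python. It is an illustration only, not the authors' architecture: the function names (`attend`, `dual_channel_multi_hop`), the dot-product attention, the additive query update, and the final concatenation are all assumptions chosen for brevity. One channel attends over image-region vectors, the other over dialog-history vectors, and at each hop the query of each channel is refreshed with the context found by the other.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, items):
    """Dot-product attention: return the attention-weighted average of items."""
    weights = softmax([sum(q * x for q, x in zip(query, item)) for item in items])
    return [
        sum(w * item[d] for w, item in zip(weights, items))
        for d in range(len(query))
    ]


def add(a, b):
    """Element-wise sum of two equal-length vectors."""
    return [x + y for x, y in zip(a, b)]


def dual_channel_multi_hop(question, history, image_regions, hops=3):
    """Hypothetical two-channel reasoning loop (an assumption, not DMRM itself).

    question      : list[float], the question vector
    history       : list[list[float]], one vector per dialog round
    image_regions : list[list[float]], one vector per image region
    """
    if hops < 1:
        raise ValueError("hops must be at least 1")
    # Both channels start from the same question vector.
    image_query = list(question)
    history_query = list(question)
    for _ in range(hops):
        image_ctx = attend(image_query, image_regions)   # image channel
        history_ctx = attend(history_query, history)     # history channel
        # Cross update: each channel's next query sees the other channel's context.
        image_query = add(question, history_ctx)
        history_query = add(question, image_ctx)
    # Assumed fusion: concatenate the question with both final contexts.
    return list(question) + image_ctx + history_ctx


# Usage with toy 2-dimensional features.
question = [1.0, 0.0]
history = [[0.5, 0.5], [0.0, 1.0]]
image_regions = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
fused = dual_channel_multi_hop(question, history, image_regions, hops=2)
print(len(fused))
```

A real model would use learned projections and trained encoders for the question, history, and image; the loop above only shows how two attention channels can exchange information over several hops.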

Code & Models

Repositories

phellonchen/DMRM (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning