Reasoning Over History: Context Aware Visual Dialog

Muhammad A. Shah; Shikib Mehri; Tejas Srinivasan

arXiv:2011.00669·cs.CL·November 3, 2020

Open Access

TL;DR

This paper introduces Context-aware Attention and Memory (CAM), an extension to the MAC network that lets it reason over dialog history for multi-turn visual question answering, significantly improving coreference resolution and overall accuracy.

Contribution

The paper proposes CAM, a novel mechanism integrated into MAC networks, allowing effective reasoning over dialog history in visual dialog tasks.

Findings

01. Achieved up to 98.25% accuracy on the CLEVR-Dialog dataset.

02. Improved performance on coreference resolution questions.

03. Outperformed the previous state of the art by 30% in absolute accuracy.

Abstract

While neural models have been shown to exhibit strong performance on single-turn visual question answering (VQA) tasks, extending VQA to a multi-turn, conversational setting remains a challenge. One way to address this challenge is to augment existing strong neural VQA models with mechanisms that allow them to retain information from previous dialog turns. One strong VQA model is the MAC network, which decomposes a task into a series of attention-based reasoning steps. However, since the MAC network is designed for single-turn question answering, it cannot refer to past dialog turns. More specifically, it struggles with tasks that require reasoning over the dialog history, particularly coreference resolution. We extend the MAC network architecture with Context-aware Attention and Memory (CAM), which attends over control states in past dialog turns to determine the…
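The idea of attending over control states from past dialog turns can be illustrated with a minimal sketch. This is not the paper's formulation: it assumes plain scaled dot-product attention, represents control states as short Python lists, and the function name `attend_over_history` is hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend_over_history(query, past_controls):
    """Attend from the current control state over past control states.

    Assumption: scaled dot-product scoring. The paper's actual scoring
    function is not given in the text above.
    Returns (context_vector, attention_weights).
    """
    if not past_controls:
        # First dialog turn: there is no history to attend over.
        return [0.0] * len(query), []
    scale = math.sqrt(len(query))
    scores = [
        sum(q * c for q, c in zip(query, ctrl)) / scale
        for ctrl in past_controls
    ]
    weights = softmax(scores)
    # Weighted sum of the past control states, one dimension at a time.
    context = [
        sum(w * ctrl[i] for w, ctrl in zip(weights, past_controls))
        for i in range(len(query))
    ]
    return context, weights


# Toy history: three past control states, one per earlier reasoning step.
past = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
query = [0.0, 5.0, 0.0]  # current control state, most similar to past[1]
context, weights = attend_over_history(query, past)
```

In this toy setup the largest attention weight falls on the second past control state, which is the behavior a coreference such as "that object" would need: the current step retrieves what an earlier step attended to.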

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Speech and dialogue systems

Methods: Context-aware Attention and Memory (CAM)