DAM: Deliberation, Abandon and Memory Networks for Generating Detailed and Non-repetitive Responses in Visual Dialogue

Xiaoze Jiang; Jing Yu; Yajing Sun; Zengchang Qin; Zihao Zhu; Yue Hu; Qi Wu

arXiv:2007.03310 · cs.CV · July 8, 2020

TL;DR

This paper introduces DAM, a generative decoding architecture for visual dialogue that produces detailed, non-repetitive responses by decomposing word generation into a series of attention-based information selection steps, making the decoder more transparent and flexible.

Contribution

The paper proposes the recurrent DAM module, a flexible, attention-based decoder component that makes visual dialogue responses more detailed and less repetitive and can be paired with various encoders.

Findings

01

Achieves state-of-the-art performance on the VisDial v1.0 dataset.

02

Produces more detailed and non-repetitive responses.

03

Demonstrates flexibility with different encoder structures.

Abstract

The Visual Dialogue task requires an agent to engage in a conversation with a human about an image. The ability to generate detailed and non-repetitive responses is crucial for the agent to achieve human-like conversation. In this paper, we propose a novel generative decoding architecture to generate high-quality responses, which moves away from decoding the whole encoded semantics toward a design that advocates both transparency and flexibility. In this architecture, word generation is decomposed into a series of attention-based information selection steps, performed by a novel recurrent Deliberation, Abandon and Memory (DAM) module. Each DAM module performs an adaptive combination of the response-level semantics captured from the encoder and the word-level semantics specifically selected for generating each word. Therefore, the responses contain more detailed and non-repetitive…
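The abstract describes each DAM step as an adaptive combination of response-level semantics from the encoder with word-level semantics selected by attention for the current word. The sketch here illustrates only that idea, under stated assumptions: dot-product attention and a scalar sigmoid gate are our own choices, `hypothetical_dam_step` and its parameters are hypothetical names, and the paper's actual Deliberation, Abandon and Memory sub-modules are not reproduced.

```python
import math
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Sequence[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x: float) -> float:
    # Numerically stable logistic function for large |x|.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def attend(query: Sequence[float], keys: Sequence[Sequence[float]]) -> Vector:
    """Dot-product attention: weight each candidate feature by its relevance to the query."""
    weights = softmax([dot(query, k) for k in keys])
    dim = len(keys[0])
    return [sum(w * k[i] for w, k in zip(weights, keys)) for i in range(dim)]


def hypothetical_dam_step(
    decoder_state: Sequence[float],
    response_vec: Sequence[float],
    word_features: Sequence[Sequence[float]],
    gate_weights: Sequence[float],
    gate_bias: float = 0.0,
) -> Vector:
    """One per-word selection step (hypothetical simplification, not the paper's module).

    decoder_state: current decoder hidden state, used as the attention query
    response_vec:  response-level semantics from the encoder
    word_features: candidate features this step can select from for the current word
    gate_weights:  weights over the concatenation [decoder_state; response_vec; word_vec]
    """
    # Word-level semantics: select information relevant to the word being generated.
    word_vec = attend(decoder_state, word_features)
    # Adaptive gate decides how much response-level vs word-level semantics to keep.
    gate_input = list(decoder_state) + list(response_vec) + word_vec
    gate = sigmoid(dot(gate_weights, gate_input) + gate_bias)
    return [gate * r + (1.0 - gate) * w for r, w in zip(response_vec, word_vec)]
```

For example, with all gate weights and the bias at zero the gate equals 0.5, so the output is the midpoint between the response vector and the attended word vector; a large positive bias pushes the output toward the response-level semantics, a large negative bias toward the word-level semantics.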


Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Video Analysis and Summarization