DAM: Deliberation, Abandon and Memory Networks for Generating Detailed and Non-repetitive Responses in Visual Dialogue
Xiaoze Jiang, Jing Yu, Yajing Sun, Zengchang Qin, Zihao Zhu, Yue Hu,, Qi Wu

TL;DR
This paper introduces DAM, a novel network architecture for visual dialogue that generates detailed, non-repetitive responses by decomposing word generation into attention-based steps, improving response quality and flexibility.
Contribution
The paper proposes the DAM module, a flexible, attention-based architecture that enhances response detail and diversity in visual dialogue systems, compatible with various encoders.
Findings
Achieves state-of-the-art performance on VisDial v1.0 dataset.
Produces more detailed and non-repetitive responses.
Demonstrates flexibility with different encoder structures.
Abstract
Visual Dialogue task requires an agent to be engaged in a conversation with human about an image. The ability of generating detailed and non-repetitive responses is crucial for the agent to achieve human-like conversation. In this paper, we propose a novel generative decoding architecture to generate high-quality responses, which moves away from decoding the whole encoded semantics towards the design that advocates both transparency and flexibility. In this architecture, word generation is decomposed into a series of attention-based information selection steps, performed by the novel recurrent Deliberation, Abandon and Memory (DAM) module. Each DAM module performs an adaptive combination of the response-level semantics captured from the encoder and the word-level semantics specifically selected for generating each word. Therefore, the responses contain more detailed and non-repetitive…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsMultimodal Machine Learning Applications · Topic Modeling · Video Analysis and Summarization
