Multimodal Dialogue Response Generation
Qingfeng Sun, Yujing Wang, Can Xu, Kai Zheng, Yaming Yang, Huang Hu, Fei Xu, Jessica Zhang, Xiubo Geng, Daxin Jiang

TL;DR
This paper introduces Divter, a novel multimodal dialogue generation model that effectively produces both textual and image responses using limited training data, outperforming existing methods in quality and informativeness.
Contribution
The paper proposes a new low-resource multimodal dialogue generation approach that isolates multimodal parameters, enabling training with limited data and achieving state-of-the-art results.
Findings
Achieves state-of-the-art performance in automatic and human evaluations.
Generates informative text and high-resolution images as responses.
Effectively learns from limited multimodal dialogue data.
Abstract
Responding with images has been recognized as an important capability for an intelligent conversational agent. Yet existing works focus only on multimodal dialogue models that depend on retrieval-based methods, neglecting generation methods. To fill this gap, we first present a multimodal dialogue generation model, which takes the dialogue history as input and generates a textual sequence or an image as the response. Learning such a model often requires multimodal dialogues containing both texts and images, which are difficult to obtain. Motivated by this practical challenge, we consider multimodal dialogue generation under the natural assumption that only limited training examples are available. In such a low-resource setting, we devise a novel conversational agent, Divter, in order to isolate parameters that depend on multimodal dialogues from the entire generation…
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Natural Language Processing Techniques
