Multimodal Dialog Systems with Dual Knowledge-enhanced Generative Pretrained Language Model
Xiaolin Chen, Xuemeng Song, Liqiang Jing, Shuo Li, Linmei Hu, and Liqiang Nie

TL;DR
This paper introduces DKMD, a dual knowledge-enhanced generative pretrained model for multimodal dialog systems, which improves response quality by integrating textual and visual knowledge during response generation.
Contribution
It proposes a novel dual knowledge selection and integration framework with a modified BART decoder for enhanced multimodal dialog response generation.
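This page does not say how the BART decoder is modified. The sketch below illustrates only one common way to inject extra knowledge into a decoder: cross-attention runs over the selected knowledge vectors alongside the encoded context. This is an assumption, not a description of DKMD's actual layer. `knowledge_enhanced_step`, `context_states` and `knowledge_states` are hypothetical names. Learned projections, multiple heads, residual connections and layer normalization are omitted.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(query: Vector, memory: List[Vector]) -> Vector:
    """Scaled dot-product attention of one decoder query over memory vectors.
    The memory vectors serve as both keys and values; no learned projections."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in memory]
    weights = softmax(scores)
    # Weighted sum of the value vectors, dimension by dimension.
    return [sum(w * value[i] for w, value in zip(weights, memory)) for i in range(d)]


def knowledge_enhanced_step(query: Vector, context_states: List[Vector],
                            knowledge_states: List[Vector]) -> Vector:
    """Hypothetical integration step: the decoder query attends over the
    concatenation of encoded context vectors and selected-knowledge vectors."""
    return cross_attention(query, context_states + knowledge_states)
```

For example, `knowledge_enhanced_step([1.0, 0.0], [[1.0, 0.0]], [[0.0, 1.0]])` blends the context vector and the knowledge vector. The context gets the larger weight because it aligns better with the query. Putting knowledge into the attention memory lets the softmax balance context against knowledge separately for each generated token, so their mix is not fixed in advance.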
Findings
Outperforms state-of-the-art models on public datasets.
Effectively integrates multimodal knowledge for more accurate responses.
Demonstrates significant improvements in response relevance and coherence.
Abstract
Text response generation for multimodal task-oriented dialog systems, which aims to generate the proper text response given the multimodal context, is an essential yet challenging task. Although existing efforts have achieved compelling success, they still suffer from two pivotal limitations: 1) they overlook the benefit of generative pre-training, and 2) they ignore the knowledge related to the textual context. To address these limitations, we propose a novel dual knowledge-enhanced generative pretrained language model for multimodal task-oriented dialog systems (DKMD), consisting of three key components: dual knowledge selection, dual knowledge-enhanced context learning, and knowledge-enhanced response generation. To be specific, the dual knowledge selection component aims to select the related knowledge according to both textual and visual modalities of the given context. Thereafter, the dual…
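The abstract says knowledge is selected "according to both textual and visual modalities of the given context." A minimal sketch of that idea appears below. It assumes that the textual context, the visual context and each knowledge entry are already embedded in one shared vector space, and it uses cosine-similarity top-k retrieval. These are illustrative assumptions, not the paper's confirmed scoring method. `dual_select`, `top_k` and the example knowledge keys are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 0.0 if na == 0 or nb == 0 else dot / (na * nb)


def top_k(query: Vector, kb: Dict[str, Vector], k: int) -> List[str]:
    """IDs of the k knowledge entries most similar to `query`."""
    ranked = sorted(kb, key=lambda key: cosine(query, kb[key]), reverse=True)
    return ranked[:k]


def dual_select(text_ctx: Vector, visual_ctx: Vector,
                kb: Dict[str, Vector], k: int = 2) -> List[str]:
    """Hypothetical dual selection: retrieve the top-k entries for the textual
    context and the top-k for the visual context, then merge without duplicates."""
    selected: List[str] = []
    for key in top_k(text_ctx, kb, k) + top_k(visual_ctx, kb, k):
        if key not in selected:
            selected.append(key)
    return selected


# Toy knowledge base with made-up keys and embeddings (illustration only).
kb = {
    "color:red": [1.0, 0.0, 0.0],
    "style:casual": [0.0, 1.0, 0.0],
    "material:cotton": [0.0, 0.0, 1.0],
}
print(dual_select([1.0, 0.1, 0.0], [0.0, 0.2, 1.0], kb, k=1))
```

With `k=1`, the textual query retrieves `color:red` and the visual query retrieves `material:cotton`. The result therefore covers knowledge that neither modality would surface alone, which is the motivation for selecting along both modalities.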
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Speech and dialogue systems
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Dropout · Softmax · Byte Pair Encoding · Adam · Residual Connection · Layer Normalization
