Multimodal Dialog Systems with Dual Knowledge-enhanced Generative Pretrained Language Model

Xiaolin Chen; Xuemeng Song; Liqiang Jing; Shuo Li; Linmei Hu; Liqiang Nie

arXiv:2207.07934 · cs.CL · May 14, 2024 · 5 cites

TL;DR

This paper introduces DKMD, a dual knowledge-enhanced generative pretrained model for multimodal dialog systems, which improves response quality by integrating textual and visual knowledge during response generation.

Contribution

It proposes a novel dual knowledge selection and integration framework with a modified BART decoder for enhanced multimodal dialog response generation.

Findings

1. Outperforms state-of-the-art models on public datasets.

2. Effectively integrates multimodal knowledge for more accurate responses.

3. Demonstrates significant improvements in response relevance and coherence.

Abstract

Text response generation for multimodal task-oriented dialog systems, which aims to generate the proper text response given the multimodal context, is an essential yet challenging task. Although existing efforts have achieved compelling success, they still suffer from two pivotal limitations: 1) they overlook the benefit of generative pre-training, and 2) they ignore knowledge related to the textual context. To address these limitations, we propose a novel dual knowledge-enhanced generative pretrained language model for multimodal task-oriented dialog systems (DKMD), consisting of three key components: dual knowledge selection, dual knowledge-enhanced context learning, and knowledge-enhanced response generation. Specifically, the dual knowledge selection component aims to select the related knowledge according to both the textual and visual modalities of the given context. Thereafter, the dual…

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Speech and dialogue systems

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Dropout · Softmax · Byte Pair Encoding · Adam · Residual Connection · Layer Normalization