Scene-Aware Prompt for Multi-modal Dialogue Understanding and Generation
Bin Li, Yixuan Weng, Ziyu Ma, Bin Sun, Shutao Li

TL;DR
This paper presents a scene-aware prompt approach that leverages visual captions and multi-task learning to enhance multi-modal dialogue understanding and generation, achieving state-of-the-art results in the NLPCC-2022 Shared Task 4 competition.
Contribution
The paper introduces a novel scene-aware prompt method that jointly models scene and session understanding for improved multi-modal dialogue generation.
Findings
Achieved state-of-the-art performance in MDUG subtasks
Ranked 1st in all three subtasks of the competition
Demonstrated effectiveness of scene-aware prompts in multi-modal dialogue tasks
Abstract
This paper describes Team LingJing's experiments in NLPCC-2022-Shared-Task-4 Multi-modal Dialogue Understanding and Generation (MDUG). The MDUG task can be divided into two phases: multi-modal context understanding and response generation. To fully leverage the visual information for both scene understanding and dialogue generation, we propose the scene-aware prompt for the MDUG task. Specifically, we use a multi-task strategy to jointly model scene-level and session-level multi-modal understanding. Visual captions are adopted to capture the scene information, while a fixed-type templated prompt based on the scene- and session-aware labels is used to further improve dialogue generation performance. Extensive experimental results show that the proposed method has achieved state-of-the-art (SOTA) performance compared with other competitive methods, where…
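As a rough illustration of the idea in the abstract, the sketch below serializes a dialogue context into a fixed-template prompt that mixes visual captions with scene and session labels. It is not the authors' implementation: the `Turn` fields, the `build_scene_aware_prompt` function, and the `[SCENE]`, `[SESSION]`, `[UTTERANCE]`, and `[RESPONSE]` markers are all hypothetical, since this summary does not give the paper's actual template.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Turn:
    """One dialogue utterance with hypothetical understanding labels."""
    text: str
    caption: str          # visual caption of the paired frame (assumed input)
    scene_change: bool    # hypothetical scene-aware label: a new scene starts here
    session_change: bool  # hypothetical session-aware label: a new session starts here


def build_scene_aware_prompt(turns: List[Turn]) -> str:
    """Serialize the context into one fixed-template string for a text generator."""
    lines = []
    for turn in turns:
        if turn.scene_change:
            # Inject the caption only where the scene label fires.
            lines.append(f"[SCENE] {turn.caption}")
        if turn.session_change:
            lines.append("[SESSION]")
        lines.append(f"[UTTERANCE] {turn.text}")
    # The trailing marker is where a generator would continue with its reply.
    lines.append("[RESPONSE]")
    return "\n".join(lines)


example_turns = [
    Turn("Where are we?", "two people standing in a kitchen", True, True),
    Turn("At my place.", "two people standing in a kitchen", False, False),
]
example_prompt = build_scene_aware_prompt(example_turns)
```

In this sketch the labels would come from the understanding phase and the resulting string would be fed to the response generator; how the two phases are actually coupled in the paper is not specified here.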
Taxonomy
Topics: Multimodal Machine Learning Applications · Speech and dialogue systems · Topic Modeling
Methods: Attentive Walk-Aggregating Graph Neural Network
