On the Effectiveness of Integration Methods for Multimodal Dialogue Response Retrieval
Seongbo Jang, Seonghyeon Lee, Dongha Lee, Hwanjo Yu

TL;DR
This paper investigates how to integrate multiple response modalities in dialogue response retrieval. It proposes and compares three integration methods built on a two-step approach and an end-to-end approach; experiments show that the end-to-end approach is effective and that parameter sharing is beneficial.
Contribution
It introduces a multimodal dialogue response retrieval task and compares three integration methods, highlighting the advantages of an end-to-end approach with parameter sharing.
Findings
The end-to-end approach achieves comparable performance without the intermediate step required by the two-step approach.
Parameter sharing reduces parameters and improves performance.
Experimental results on two datasets support both findings.
Abstract
Multimodal chatbots have become one of the major topics for dialogue systems in both the research community and industry. Recently, researchers have shed light on the multimodality of responses as well as dialogue contexts. This work explores how a dialogue system can output responses in various modalities such as text and images. To this end, we first formulate a multimodal dialogue response retrieval task for retrieval-based systems as the combination of three subtasks. We then propose three integration methods based on a two-step approach and an end-to-end approach, and compare the merits and demerits of each method. Experimental results on two datasets demonstrate that the end-to-end approach achieves comparable performance without the intermediate step required by the two-step approach. In addition, a parameter sharing strategy not only reduces the number of parameters but also boosts performance…
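To make the contrast between the two approaches concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not spell out the three subtasks or the model architecture, so this sketch assumes that the intermediate step of the two-step approach is a prediction of the response modality, and that the end-to-end approach scores text and image candidates together in one embedding space. All names (two_step_retrieve, end_to_end_retrieve, modality_weights) and the toy vectors are hypothetical.

```python
from typing import Dict, List, Tuple

Vector = List[float]
# A candidate response: (modality, response id, embedding). Hypothetical layout.
Candidate = Tuple[str, str, Vector]


def dot(a: Vector, b: Vector) -> float:
    """Dot-product similarity between two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def two_step_retrieve(
    context: Vector,
    modality_weights: Dict[str, Vector],
    candidates: List[Candidate],
) -> Tuple[str, str]:
    """Assumed two-step approach: predict a modality, then retrieve within it."""
    # Step 1 (the intermediate step): pick the modality with the highest score.
    modality = max(modality_weights, key=lambda m: dot(context, modality_weights[m]))
    # Step 2: rank only the candidates of the predicted modality.
    # Assumes at least one candidate exists for that modality.
    pool = [c for c in candidates if c[0] == modality]
    best = max(pool, key=lambda c: dot(context, c[2]))
    return best[0], best[1]


def end_to_end_retrieve(context: Vector, candidates: List[Candidate]) -> Tuple[str, str]:
    """Assumed end-to-end approach: rank all candidates in one shared space."""
    best = max(candidates, key=lambda c: dot(context, c[2]))
    return best[0], best[1]


# Toy data, invented for illustration only.
context = [1.0, 0.5]
modality_weights = {"text": [1.0, 0.0], "image": [0.0, 1.0]}
candidates = [
    ("text", "t1", [0.9, 0.1]),
    ("text", "t2", [0.2, 0.3]),
    ("image", "i1", [0.4, 1.2]),
    ("image", "i2", [0.1, 0.1]),
]

two_step_result = two_step_retrieve(context, modality_weights, candidates)
end_to_end_result = end_to_end_retrieve(context, candidates)
print(two_step_result, end_to_end_result)
```

In this toy setup the two functions can return different responses, because the two-step variant commits to a modality before any candidate is scored, while the end-to-end variant lets the response modality fall out of the ranking itself. A real system would replace the hand-written vectors with learned context and response encoders.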
