Open Domain Dialogue Generation with Latent Images
Ze Yang, Wei Wu, Huang Hu, Can Xu, Wei Wang, Zhoujun Li

TL;DR
This paper introduces a method for open domain dialogue generation that grounds responses in latent images recovered from text, so that textual dialogues can augment image-grounded conversation and latent images can enrich responses in text-based conversation.
Contribution
It proposes a conditional variational auto-encoding framework to recover latent images from textual dialogues and integrate them into response generation.
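For orientation, the generic evidence lower bound optimized by a conditional variational auto-encoder can be sketched as follows. This is the standard textbook form, not necessarily the paper's exact objective; the symbols $C$ (dialogue context), $R$ (response), and $z$ (a latent variable standing in for the unobserved image information) are assumptions introduced here for illustration only.

```latex
\log p_\theta(R \mid C)
  \;\ge\;
  \mathbb{E}_{q_\phi(z \mid C, R)}\!\left[ \log p_\theta(R \mid C, z) \right]
  \;-\;
  \mathrm{KL}\!\left( q_\phi(z \mid C, R) \,\|\, p_\theta(z \mid C) \right)
```

The first term rewards reconstructing the response given the context and the latent variable; the second keeps the approximate posterior close to the context-conditioned prior.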
Findings
Latent images improve response relevance in text-based conversations.
Textual dialogues can augment image-grounded dialogues, especially in low-resource settings.
Latent images enrich response content while maintaining contextual relevance.
Abstract
We consider grounding open domain dialogues with images. Existing work assumes that both an image and a textual context are available, but image-grounded dialogues by nature are more difficult to obtain than textual dialogues. Thus, we propose learning a response generation model with both image-grounded dialogues and textual dialogues by assuming that the visual scene information at the time of a conversation can be represented by an image, and trying to recover the latent images of the textual dialogues through text-to-image generation techniques. The likelihood of the two types of dialogues is then formulated by a response generator and an image reconstructor that are learned within a conditional variational auto-encoding framework. Empirical studies are conducted in both image-grounded conversation and text-based conversation. In the first scenario, image-grounded dialogues,…
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Natural Language Processing Techniques
