Improving Context Modelling in Multimodal Dialogue Generation

Shubham Agarwal; Ondrej Dusek; Ioannis Konstas; Verena Rieser

arXiv:1810.11955 · cs.CL · November 22, 2018


TL;DR

This paper extends the Hierarchical Recurrent Encoder-Decoder (HRED) model to better incorporate visual and textual dialogue context, improving text-based similarity scores for generated responses in fashion-domain conversations.

Contribution

The paper introduces a multimodal extension to the HRED model and shows that it outperforms strong baselines on text-based similarity metrics for multimodal dialogue response generation.

Findings

01. The multimodal HRED extension improves scores on text-based similarity metrics.

02. An error analysis of the system's output reveals shortcomings of current vision and language models.

03. The multimodal extension outperforms strong baselines.
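The first finding concerns text-based similarity metrics, which this page does not name. As an illustration only, the Python sketch below assumes BLEU, a widely used n-gram overlap metric, and implements an unsmoothed, single-reference, sentence-level version. The function `sentence_bleu` and the example sentences are hypothetical and are not taken from the paper's evaluation.

```python
import math
from collections import Counter
from typing import List


def ngrams(tokens: List[str], n: int) -> Counter:
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(hypothesis: List[str], reference: List[str], max_n: int = 4) -> float:
    """Unsmoothed single-reference BLEU (illustrative assumption): geometric mean
    of clipped n-gram precisions for n = 1..max_n, times a brevity penalty."""
    if not hypothesis:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = ngrams(hypothesis, n)
        ref_counts = ngrams(reference, n)
        total = sum(hyp_counts.values())
        # Clip each hypothesis n-gram count by its count in the reference.
        clipped = sum(min(count, ref_counts[g]) for g, count in hyp_counts.items())
        if total == 0 or clipped == 0:
            return 0.0  # without smoothing, any zero precision makes BLEU zero
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty: penalise hypotheses shorter than the reference.
    if len(hypothesis) >= len(reference):
        bp = 1.0
    else:
        bp = math.exp(1 - len(reference) / len(hypothesis))
    return bp * math.exp(sum(log_precisions) / max_n)


reference = "the red dress has long sleeves".split()
hypothesis = "the red dress has short sleeves".split()
score = sentence_bleu(hypothesis, reference)
print(round(score, 3))  # 0.537
```

Here the clipped precisions are 5/6, 3/5, 2/4 and 1/3, so the score is (1/12)^(1/4). Corpus-level BLEU, as usually reported, pools clipped counts and lengths over all test responses before combining them, and sentence-level scores are often smoothed; both are omitted here for brevity.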

Abstract

In this work, we investigate the task of textual response generation in a multimodal task-oriented dialogue system. Our work is based on the recently released Multimodal Dialogue (MMD) dataset (Saha et al., 2017) in the fashion domain. We introduce a multimodal extension to the Hierarchical Recurrent Encoder-Decoder (HRED) model and show that this extension outperforms strong baselines in terms of text-based similarity metrics. We also showcase the shortcomings of current vision and language models by performing an error analysis on our system's output.
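The abstract does not detail how the multimodal extension is built. As a rough, hypothetical sketch (not the authors' code), the Python below assumes that each dialogue turn is encoded by an utterance-level RNN, that this vector is concatenated with an image feature vector (all zeros for text-only turns), and that a context-level RNN runs over the resulting turn vectors. The class `MultimodalHREDEncoder`, its dimensions, and the fixed sine-based placeholder weights are illustrative assumptions; a real model would use trained parameters, a pretrained image encoder, and a decoder RNN conditioned on the final context state, which is omitted here.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]
Turn = Tuple[Sequence[Sequence[float]], Optional[Sequence[float]]]


def init_matrix(rows: int, cols: int, seed: int) -> Matrix:
    """Deterministic placeholder weights standing in for trained parameters."""
    return [[0.5 * math.sin(seed + 0.1 * (r * cols + c)) for c in range(cols)]
            for r in range(rows)]


def matvec(w: Matrix, x: Sequence[float]) -> Vector:
    """Matrix-vector product for lists of floats."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def run_rnn(w_in: Matrix, w_h: Matrix, inputs: Sequence[Sequence[float]]) -> Vector:
    """Elman RNN, h_t = tanh(W_in x_t + W_h h_{t-1}); returns the final hidden state."""
    h = [0.0] * len(w_h)
    for x in inputs:
        h = [math.tanh(a + b) for a, b in zip(matvec(w_in, x), matvec(w_h, h))]
    return h


class MultimodalHREDEncoder:
    """Hypothetical two-level encoder: an utterance RNN per turn, a context RNN over turns."""

    def __init__(self, emb_dim: int, img_dim: int, utt_dim: int, ctx_dim: int):
        self.img_dim = img_dim
        self.utt_in = init_matrix(utt_dim, emb_dim, seed=1)
        self.utt_h = init_matrix(utt_dim, utt_dim, seed=2)
        # The context RNN reads [utterance encoding ; image features] for each turn.
        self.ctx_in = init_matrix(ctx_dim, utt_dim + img_dim, seed=3)
        self.ctx_h = init_matrix(ctx_dim, ctx_dim, seed=4)

    def encode_turn(self, token_embs: Sequence[Sequence[float]],
                    image_feats: Optional[Sequence[float]] = None) -> Vector:
        text_vec = run_rnn(self.utt_in, self.utt_h, token_embs)
        # Assumption: text-only turns get an all-zero image vector.
        img = list(image_feats) if image_feats is not None else [0.0] * self.img_dim
        if len(img) != self.img_dim:
            raise ValueError(f"expected {self.img_dim} image features, got {len(img)}")
        return text_vec + img

    def encode_dialogue(self, turns: Sequence[Turn]) -> Vector:
        """Return the context state a response decoder would be conditioned on."""
        turn_vecs = [self.encode_turn(tokens, image) for tokens, image in turns]
        return run_rnn(self.ctx_in, self.ctx_h, turn_vecs)


# Toy usage with made-up 3-d word embeddings and 2-d image features.
enc = MultimodalHREDEncoder(emb_dim=3, img_dim=2, utt_dim=4, ctx_dim=5)
dialogue = [
    ([[0.1, 0.2, 0.3], [0.0, 0.5, -0.1]], [0.7, 0.1]),  # turn with an image
    ([[0.3, -0.2, 0.1]], None),                         # text-only turn
]
context = enc.encode_dialogue(dialogue)
print(len(context))  # 5
```

In this sketch, fusing image features at the turn level, before the context RNN, lets the single context state summarise both modalities across the whole dialogue history.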


Code & Models

Repositories

shubhamagarwal92/mmd (PyTorch, official)
