End-to-end optimization of goal-driven and visually grounded dialogue systems

Florian Strub; Harm de Vries; Jeremie Mary; Bilal Piot; Aaron Courville; Olivier Pietquin

arXiv:1703.05423·cs.CL·March 17, 2017·32 cites

TL;DR

This paper introduces a deep reinforcement learning approach for optimizing goal-driven, visually grounded dialogue systems. It targets the planning and grounding problems that supervised next-utterance prediction does not capture.

Contribution

The paper presents an end-to-end reinforcement learning framework for visually grounded, task-oriented dialogue, aimed at generating more natural dialogues and at discovering objects in complex images.

Findings

01 Effective dialogue generation in complex visual environments

02 Improved object discovery accuracy

03 Encouraging results on a large dialogue dataset

Abstract

End-to-end design of dialogue systems has recently become a popular research topic thanks to powerful tools such as encoder-decoder architectures for sequence-to-sequence learning. Yet, most current approaches cast human-machine dialogue management as a supervised learning problem, aiming at predicting the next utterance of a participant given the full history of the dialogue. This vision is too simplistic to render the intrinsic planning problem inherent to dialogue as well as its grounded nature, making the context of a dialogue larger than the sole history. This is why only chit-chat and question answering tasks have been addressed so far using end-to-end architectures. In this paper, we introduce a Deep Reinforcement Learning method to optimize visually grounded task-oriented dialogues, based on the policy gradient algorithm. This approach is tested on a dataset of 120k dialogues…
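The abstract names the policy gradient algorithm as the basis of the method. As a rough illustration of that family of updates only, the sketch below runs REINFORCE with a moving-average baseline on a toy one-step problem. It is not the paper's model: the softmax-over-logits policy, the 0/1 reward, and every name (`train_reinforce`, `target`, `baseline`) are assumptions made for this example, and the real system conditions on an image and a dialogue history rather than on a fixed set of actions.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_reinforce(num_actions=4, target=2, episodes=2000, lr=0.1, seed=0):
    """Toy REINFORCE: learn to pick the rewarded action.

    Hypothetical setup: each episode is a single action; the reward is 1.0
    if the action equals `target` and 0.0 otherwise.
    """
    rng = random.Random(seed)
    logits = [0.0] * num_actions  # policy parameters (theta)
    baseline = 0.0                # moving average of observed rewards

    for _ in range(episodes):
        probs = softmax(logits)
        # Sample an action from the current policy.
        action = rng.choices(range(num_actions), weights=probs)[0]
        reward = 1.0 if action == target else 0.0
        advantage = reward - baseline

        # For a softmax policy, d log pi(action) / d logit_k
        # equals 1[k == action] - probs[k].
        for k in range(num_actions):
            grad_log_prob = (1.0 if k == action else 0.0) - probs[k]
            logits[k] += lr * advantage * grad_log_prob

        # Update the baseline, which reduces the variance of the estimate.
        baseline += 0.05 * (reward - baseline)

    return softmax(logits)


if __name__ == "__main__":
    final_probs = train_reinforce()
    print([round(p, 3) for p in final_probs])
```

In a dialogue setting the single action would be replaced by a sequence of generated tokens and the reward would typically arrive only at the end of the dialogue, but the update has the same shape: the gradient of the log-probability of the sampled choices, scaled by reward minus a baseline.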

Taxonomy

Topics: Speech and dialogue systems · Multimodal Machine Learning Applications · Topic Modeling