End-to-end optimization of goal-driven and visually grounded dialogue systems
Florian Strub, Harm de Vries, Jeremie Mary, Bilal Piot, Aaron Courville, Olivier Pietquin

TL;DR
This paper introduces a deep reinforcement learning approach for optimizing goal-driven, visually grounded dialogue systems, addressing planning and grounding challenges beyond traditional supervised methods.
Contribution
It presents a novel end-to-end reinforcement learning framework for visually grounded task-oriented dialogues, improving naturalness and object discovery in complex images.
Findings
Effective dialogue generation in complex visual environments
Improved object discovery accuracy
Encouraging results on a large dialogue dataset
Abstract
End-to-end design of dialogue systems has recently become a popular research topic thanks to powerful tools such as encoder-decoder architectures for sequence-to-sequence learning. Yet, most current approaches cast human-machine dialogue management as a supervised learning problem, aiming at predicting the next utterance of a participant given the full history of the dialogue. This vision is too simplistic to render the intrinsic planning problem inherent to dialogue as well as its grounded nature, making the context of a dialogue larger than the sole history. This is why only chit-chat and question answering tasks have been addressed so far using end-to-end architectures. In this paper, we introduce a Deep Reinforcement Learning method to optimize visually grounded task-oriented dialogues, based on the policy gradient algorithm. This approach is tested on a dataset of 120k dialogues…
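To make the policy gradient idea concrete, here is a minimal REINFORCE sketch on a toy problem. It is not the authors' implementation: the single-step setup, the function names (`softmax`, `train_reinforce`), the reward definition, and all hyperparameters are assumptions chosen for illustration. A softmax policy picks one of a few discrete actions, receives a reward of 1 only when it picks a hypothetical "correct" action, and nudges its logits along the reward-weighted gradient of the log-probability. No baseline is used here, to keep the sketch short.

```python
import math
import random


def softmax(logits):
    """Convert a list of logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_reinforce(num_actions=4, target=2, episodes=2000, lr=0.5, seed=0):
    """Train a softmax policy with REINFORCE on a one-step toy task.

    All names and hyperparameters are hypothetical; the reward is 1.0 when
    the sampled action equals `target` and 0.0 otherwise.
    """
    rng = random.Random(seed)
    logits = [0.0] * num_actions  # policy parameters, start uniform
    for _ in range(episodes):
        probs = softmax(logits)
        # Sample an action from the current policy.
        action = rng.choices(range(num_actions), weights=probs)[0]
        reward = 1.0 if action == target else 0.0
        # For a softmax policy, d log pi(action) / d logit_k
        # equals 1[k == action] - pi(k).
        for k in range(num_actions):
            grad_log_pi = (1.0 if k == action else 0.0) - probs[k]
            logits[k] += lr * reward * grad_log_pi
    return softmax(logits)


final_probs = train_reinforce()
print(final_probs)
```

In a dialogue setting the "action" would be a whole sequence of generated words and the reward would arrive only at the end of the dialogue, but the update has the same shape: the log-probability gradient of what was sampled, scaled by the return.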
Taxonomy
Topics: Speech and dialogue systems · Multimodal Machine Learning Applications · Topic Modeling
