TL;DR
This paper introduces Tempered Policy Gradients (TPGs), a class of temperature-based extensions for policy gradient methods. On the GuessWhat?! game, TPGs improve both the learned policy and the quality of the utterances produced by goal-oriented visual dialogue agents.
Contribution
The paper proposes Tempered Policy Gradients, a class of temperature-based extensions for policy gradient methods, to counter the tendency of policy-based dialogue agents to settle on simple utterances and suboptimal policies.
Findings
Extending state-of-the-art solutions with Seq2Seq and Memory Network structures yields a 7% improvement on GuessWhat?!
Applying TPG methods improves performance by around a further 5%
TPGs also help produce more convincing utterances
Abstract
Learning goal-oriented dialogues by means of deep reinforcement learning has recently become a popular research topic. However, commonly used policy-based dialogue agents often end up focusing on simple utterances and suboptimal policies. To mitigate this problem, we propose a class of novel temperature-based extensions for policy gradient methods, which are referred to as Tempered Policy Gradients (TPGs). On a recent AI-testbed, i.e., the GuessWhat?! game, we achieve significant improvements with two innovations. The first one is an extension of the state-of-the-art solutions with Seq2Seq and Memory Network structures that leads to an improvement of 7%. The second one is the application of our newly developed TPG methods, which improves the performance additionally by around 5% and, even more importantly, helps produce more convincing utterances.
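The abstract does not spell out how the temperature enters the policy gradient, so the following is only a minimal sketch under stated assumptions: actions (words) are sampled from a softmax whose logits are divided by a temperature `tau`, while the REINFORCE-style gradient is taken with respect to the untempered policy. The function names, the toy logits, and the choice of `tau` are hypothetical and are not taken from the paper.

```python
import math
import random


def tempered_softmax(logits, tau):
    """Softmax over logits / tau. tau > 1 flattens the distribution, tau < 1 sharpens it."""
    scaled = [x / tau for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(logits, tau, rng):
    """Sample an action index from the tempered distribution (assumed exploration step)."""
    probs = tempered_softmax(logits, tau)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


def reinforce_grad_logits(logits, action, reward, baseline=0.0):
    """Gradient of (reward - baseline) * log pi(action) w.r.t. the logits,
    where pi is the untempered softmax (tau = 1)."""
    probs = tempered_softmax(logits, 1.0)
    advantage = reward - baseline
    return [
        advantage * ((1.0 if i == action else 0.0) - p)
        for i, p in enumerate(probs)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    logits = [2.0, 0.5, 0.1]  # toy scores for three candidate words
    action = sample_action(logits, tau=1.5, rng=rng)
    grad = reinforce_grad_logits(logits, action, reward=1.0)
    print(action, grad)
```

With `tau > 1` the sampling distribution is flatter than the policy itself, so less likely words are tried more often. That is one plausible way a temperature could steer an agent away from repeating a few simple utterances, though the paper's actual TPG variants may differ from this sketch.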
Taxonomy
Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory · Sequence to Sequence
