Learning Goal-Oriented Visual Dialog via Tempered Policy Gradient

Rui Zhao; Volker Tresp

arXiv:1807.00737 · cs.LG · May 26, 2020

TL;DR

This paper introduces Tempered Policy Gradients, a novel reinforcement learning method that enhances goal-oriented visual dialogue agents by improving policy quality and utterance quality, demonstrated on the GuessWhat?! game.

Contribution

The paper proposes Tempered Policy Gradients, a new temperature-based extension for policy gradient methods, improving dialogue policy learning and utterance quality in visual dialogue tasks.
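The temperature idea named above can be illustrated with a minimal sketch. It assumes the dialogue policy outputs unnormalized word scores (logits) and that tempering means dividing those logits by a temperature T before the softmax. With T > 1 the distribution flattens and its entropy rises, which is the kind of extra exploration a tempered policy can provide. The function names and example logits are hypothetical and are not taken from the paper or its repository.

```python
import math


def tempered_softmax(logits, temperature=1.0):
    """Softmax of logits / temperature: T > 1 flattens, T < 1 sharpens."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


# Hypothetical scores for three candidate words.
logits = [2.0, 1.0, 0.1]
for t in (0.5, 1.0, 2.0):
    p = tempered_softmax(logits, t)
    print(t, [round(x, 3) for x in p], round(entropy(p), 3))
```

Printing the three rows shows the same ranking of words at every temperature but a higher entropy as T grows: dividing by a positive T never reorders the logits, it only changes how peaked the distribution is.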

Findings

01 · 7% improvement from extending state-of-the-art solutions with Seq2Seq and Memory Network structures

02 · Additional improvement of around 5% from applying TPG methods

03 · More convincing utterances produced with TPGs

Abstract

Learning goal-oriented dialogues by means of deep reinforcement learning has recently become a popular research topic. However, commonly used policy-based dialogue agents often end up focusing on simple utterances and suboptimal policies. To mitigate this problem, we propose a class of novel temperature-based extensions for policy gradient methods, which are referred to as Tempered Policy Gradients (TPGs). On a recent AI-testbed, i.e., the GuessWhat?! game, we achieve significant improvements with two innovations. The first one is an extension of the state-of-the-art solutions with Seq2Seq and Memory Network structures that leads to an improvement of 7%. The second one is the application of our newly developed TPG methods, which improves the performance additionally by around 5% and, even more importantly, helps produce more convincing utterances.
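The abstract does not spell out the TPG update rule, so the following is only a sketch under stated assumptions: a REINFORCE-style learner on a toy one-step task, where actions are sampled from the tempered distribution for exploration while the gradient uses the log-probability of the untempered (T = 1) policy, with a running-mean baseline. It applies no off-policy correction for the mismatch between the two distributions. This is not the authors' implementation; the TPG variants in the paper may differ, and `tempered_reinforce`, `reward_fn`, and all hyperparameters are hypothetical.

```python
import math
import random


def tempered_softmax(logits, temperature=1.0):
    """Softmax of logits / temperature (stable version)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample(probs, rng):
    """Draw one index from a discrete distribution."""
    r = rng.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1  # guard against floating-point round-off


def tempered_reinforce(reward_fn, n_actions, temperature=1.5, lr=0.1,
                       steps=2000, seed=0):
    """Hypothetical tempered REINFORCE on a single-step choice problem."""
    rng = random.Random(seed)
    logits = [0.0] * n_actions
    baseline = 0.0
    for _ in range(steps):
        # Explore: sample from the flattened (tempered) distribution.
        a = sample(tempered_softmax(logits, temperature), rng)
        r = reward_fn(a)
        advantage = r - baseline
        baseline += 0.05 * (r - baseline)  # running-mean baseline
        # Update: d log softmax_a / d logits at T = 1 is onehot(a) - p.
        p = tempered_softmax(logits, 1.0)
        for i in range(n_actions):
            grad = (1.0 if i == a else 0.0) - p[i]
            logits[i] += lr * advantage * grad
    return tempered_softmax(logits, 1.0)


# Toy task: only action 3 (e.g. a hypothetical "useful question") is rewarded.
policy = tempered_reinforce(lambda a: 1.0 if a == 3 else 0.0, n_actions=5)
print([round(p, 3) for p in policy])
```

With a reward of 1 only for action 3, the learned untempered policy should concentrate on that action. Raising `temperature` makes non-preferred actions appear more often during training, which is the exploration effect a tempered policy is meant to add, while the policy that is finally used stays the untempered one.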

Code & Models

Repositories

ruizhaogit/GuessWhat-TemperedPolicyGradient
pytorch · Official

Taxonomy

Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory · Sequence to Sequence