Variational Reward Estimator Bottleneck: Learning Robust Reward Estimator for Multi-Domain Task-Oriented Dialog

Jeiyoon Park; Chanhee Lee; Kuekyeng Kim; Heuiseok Lim

arXiv:2006.00417·cs.AI·June 2, 2020

Open Access

TL;DR

This paper introduces the Variational Reward Estimator Bottleneck (VRB), a regularization technique that improves reward estimation in multi-domain dialog systems by focusing on discriminative features and balancing policy and reward training.

Contribution

The paper proposes VRB, a novel regularization method that uses the information bottleneck to make the reward estimator more robust in adversarially trained dialog systems.

Findings

01 VRB significantly outperforms previous methods on a multi-domain task-oriented dialog dataset.

02 VRB effectively balances policy and reward training in adversarial learning.

03 Empirical results demonstrate improved reward estimation quality.

Abstract

Despite the notable success of adversarial learning approaches to multi-domain task-oriented dialog systems, training the dialog policy via adversarial inverse reinforcement learning often fails to balance the performance of the policy generator and the reward estimator. During optimization, the reward estimator often overwhelms the policy generator and produces excessively uninformative gradients. We propose the Variational Reward estimator Bottleneck (VRB), an effective regularization method that aims to constrain unproductive information flows between inputs and the reward estimator. The VRB focuses on capturing discriminative features by exploiting an information bottleneck on mutual information. Empirical results on a multi-domain task-oriented dialog dataset demonstrate that the VRB significantly outperforms previous methods.
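As a rough illustration of what an information bottleneck on a reward estimator can look like, the sketch below computes the regularization term used in many variational bottleneck formulations. Everything specific here is an assumption and is not stated in the abstract: a diagonal Gaussian encoder over the estimator's inputs, a standard normal prior, an information budget `i_c`, and a dual-ascent update for the multiplier `beta`. The function names are hypothetical and are not taken from the authors' code.

```python
import math


def gaussian_kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ) for one encoded input.

    Assumption: the reward estimator first encodes its input into a
    diagonal Gaussian latent; this KL term upper-bounds the mutual
    information between input and latent.
    """
    return 0.5 * sum(
        m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, log_var)
    )


def bottleneck_penalty(kl_values, beta, i_c):
    """Return (beta * (mean KL - i_c), mean KL) for a batch of KL values.

    The penalty would be added to the reward estimator's usual
    adversarial loss; i_c is a hypothetical information budget.
    """
    mean_kl = sum(kl_values) / len(kl_values)
    return beta * (mean_kl - i_c), mean_kl


def update_beta(beta, mean_kl, i_c, step_size):
    """Dual-ascent step (assumed): raise beta when the budget is exceeded,
    lower it otherwise, and never let it go below zero."""
    return max(0.0, beta + step_size * (mean_kl - i_c))


if __name__ == "__main__":
    # Two toy latent encodings: one matching the prior, one shifted.
    kls = [
        gaussian_kl_to_standard_normal([0.0, 0.0], [0.0, 0.0]),
        gaussian_kl_to_standard_normal([1.0, 0.0], [0.0, 0.0]),
    ]
    penalty, mean_kl = bottleneck_penalty(kls, beta=0.1, i_c=0.2)
    print(penalty, mean_kl, update_beta(0.1, mean_kl, 0.2, step_size=1.0))
```

Under these assumptions, a latent that carries more information about the input than the budget allows increases `beta`, which in turn strengthens the penalty on the reward estimator. This is one way the estimator could be kept from overwhelming the policy generator.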

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Speech and dialogue systems