Variational Reward Estimator Bottleneck: Learning Robust Reward Estimator for Multi-Domain Task-Oriented Dialog
Jeiyoon Park, Chanhee Lee, Kuekyeng Kim, Heuiseok Lim

TL;DR
This paper introduces the Variational Reward Estimator Bottleneck (VRB), a regularization technique that improves reward estimation in multi-domain dialog systems by focusing on discriminative features and balancing policy and reward training.
Contribution
The paper proposes VRB, a novel regularization method that applies an information bottleneck on mutual information to make the reward estimator more robust in adversarial dialog systems.
Findings
VRB significantly outperforms previous methods on a multi-domain task-oriented dialog dataset.
VRB effectively balances policy and reward training in adversarial learning.
Empirical results demonstrate improved reward estimation quality.
Abstract
Despite the notable success of adversarial learning approaches to multi-domain task-oriented dialog systems, training the dialog policy via adversarial inverse reinforcement learning often fails to balance the performance of the policy generator and the reward estimator. During optimization, the reward estimator often overwhelms the policy generator and produces excessively uninformative gradients. We propose the Variational Reward Estimator Bottleneck (VRB), an effective regularization method that aims to constrain unproductive information flows between inputs and the reward estimator. The VRB focuses on capturing discriminative features by exploiting an information bottleneck on mutual information. Empirical results on a multi-domain task-oriented dialog dataset demonstrate that the VRB significantly outperforms previous methods.
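The abstract does not spell out the bottleneck objective, so the following is only a minimal sketch of how a generic variational information bottleneck regularizer is commonly computed, not the authors' implementation. It assumes the reward estimator encodes its input into a diagonal Gaussian latent with a standard normal prior, and that the mutual-information constraint is enforced with a Lagrange multiplier updated by dual gradient ascent. All names (`gaussian_kl_to_standard_normal`, `bottleneck_penalty`, `update_beta`, `info_limit`, `step_size`) are hypothetical.

```python
import math


def gaussian_kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ) for one latent vector.

    Assumption: the encoder outputs a mean and a log-variance per latent
    dimension; this KL upper-bounds the mutual information between the
    input and the latent code.
    """
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0
                     for m, lv in zip(mu, log_var))


def bottleneck_penalty(kl_values, beta, info_limit):
    """Return (beta * (mean KL - info_limit), mean KL) for a batch.

    The first value is the Lagrangian term that would be added to the
    reward estimator's loss; info_limit is a hypothetical upper bound
    on the information allowed through the bottleneck.
    """
    mean_kl = sum(kl_values) / len(kl_values)
    return beta * (mean_kl - info_limit), mean_kl


def update_beta(beta, mean_kl, info_limit, step_size):
    """Dual gradient ascent on the multiplier, clipped at zero."""
    return max(0.0, beta + step_size * (mean_kl - info_limit))


if __name__ == "__main__":
    # Two toy latent codes: one matching the prior, one shifted away from it.
    kls = [
        gaussian_kl_to_standard_normal([0.0, 0.0], [0.0, 0.0]),
        gaussian_kl_to_standard_normal([1.0, 0.0], [0.0, 0.0]),
    ]
    penalty, mean_kl = bottleneck_penalty(kls, beta=1.0, info_limit=0.1)
    print(penalty, mean_kl, update_beta(1.0, mean_kl, 0.1, step_size=0.5))
```

In this sketch the multiplier grows when the batch carries more information than `info_limit` allows and shrinks toward zero otherwise, which is one way such a constraint can keep the reward estimator from overpowering the policy generator.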
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Speech and dialogue systems
