Rethinking Action Spaces for Reinforcement Learning in End-to-end Dialog Agents with Latent Variable Models
Tiancheng Zhao, Kaige Xie, Maxine Eskenazi

TL;DR
This paper introduces a latent action framework for end-to-end dialog agents that induces the action space from data without supervision and improves reinforcement learning performance over word-level policy gradient methods.
Contribution
The paper treats the action space of a dialog agent as a latent variable induced from data, moving beyond handcrafted dialog acts and output vocabularies. Experiments cover both continuous and discrete latent actions and two optimization methods based on stochastic variational inference.
Findings
Latent actions outperform word-level policy gradient methods on both DealOrNoDeal and MultiWoz.
Both continuous and discrete action types are effective.
Unsupervised induction of action spaces is feasible and beneficial.
Abstract
Defining action spaces for conversational agents and optimizing their decision-making process with reinforcement learning is an enduring challenge. Common practice has been to use handcrafted dialog acts, or the output vocabulary, e.g., in neural encoder-decoders, as the action spaces. Both have their own limitations. This paper proposes a novel latent action framework that treats the action spaces of an end-to-end dialog agent as latent variables and develops unsupervised methods in order to induce its own action space from the data. Comprehensive experiments are conducted examining both continuous and discrete action types and two different optimization methods based on stochastic variational inference. Results show that the proposed latent actions achieve superior empirical performance over previous word-level policy gradient methods on both DealOrNoDeal and MultiWoz…
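To make the contrast with word-level policy gradient concrete, the following is a toy sketch, not the authors' implementation. It assumes a discrete latent action z drawn from a categorical policy and applies a REINFORCE update to the policy logits only, so the reward signal never touches word-level generation. All names (`reinforce_latent_step`, `train`, `good_action`) and the stand-in reward function are hypothetical; the paper's encoder and decoder networks are abstracted away entirely.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_categorical(probs, rng):
    """Draw an index from a categorical distribution."""
    r = rng.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1


def reinforce_latent_step(logits, reward_fn, rng, lr=0.1, baseline=0.0):
    """One REINFORCE update on the logits of a latent-action policy.

    Assumption: reward_fn(z) stands in for "decode a response from z,
    run the dialog, observe the task reward".
    """
    probs = softmax(logits)
    z = sample_categorical(probs, rng)
    advantage = reward_fn(z) - baseline
    # Gradient of log p(z) w.r.t. the logits is onehot(z) - probs.
    new_logits = [
        l + lr * advantage * ((1.0 if k == z else 0.0) - probs[k])
        for k, l in enumerate(logits)
    ]
    return new_logits, z


def train(num_actions=4, good_action=2, steps=2000, seed=0):
    """Train the toy latent policy; only `good_action` earns reward."""
    rng = random.Random(seed)
    logits = [0.0] * num_actions

    def reward_fn(z):
        return 1.0 if z == good_action else 0.0

    for _ in range(steps):
        logits, _ = reinforce_latent_step(logits, reward_fn, rng)
    return softmax(logits)


if __name__ == "__main__":
    print(train())
```

In this sketch the policy has one parameter per latent action, so the gradient is taken over a handful of choices per turn rather than over a vocabulary-sized choice at every generated token.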
Taxonomy
Topics: Topic Modeling · Speech and dialogue systems · Multimodal Machine Learning Applications
