TL;DR
This paper introduces LAVA, a method that uses variational auto-encoding to create a meaningful latent action space for dialogue policy optimization, improving reinforcement learning efficiency and success rates in task-oriented dialogue systems.
Contribution
The paper explores three ways of leveraging an auxiliary response auto-encoding task to shape the latent action space, enabling more effective end-to-end dialogue policy training with state-of-the-art results.
Findings
Latent action spaces improve RL training in dialogue systems.
Auxiliary auto-encoding enhances the interpretability of latent representations.
LAVA achieves state-of-the-art success rates in dialogue policy optimization.
Abstract
Reinforcement learning (RL) can enable task-oriented dialogue systems to steer the conversation towards successful task completion. In an end-to-end setting, a response can be constructed in a word-level sequential decision making process with the entire system vocabulary as action space. Policies trained in such a fashion do not require expert-defined action spaces, but they have to deal with large action spaces and long trajectories, making RL impractical. Using the latent space of a variational model as action space alleviates this problem. However, current approaches use an uninformed prior for training and optimize the latent distribution solely on the context. It is therefore unclear whether the latent representation truly encodes the characteristics of different actions. In this paper, we explore three ways of leveraging an auxiliary task to shape the latent variable…
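To illustrate the latent-action idea described in the abstract, here is a minimal sketch, not the paper's implementation. It assumes a toy setting in which the policy picks one categorical latent action per turn instead of one word per step, so each turn contributes a single decision to the RL trajectory. The reward, the names `reinforce_latent`, `num_latent`, and `good_action`, and all hyperparameters are hypothetical. LAVA's latent space is learned with a variational model and decoded into a response, which this sketch omits.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_categorical(probs, rng):
    """Draw an index from a categorical distribution."""
    r = rng.random()
    cumulative = 0.0
    for i, p in enumerate(probs):
        cumulative += p
        if r < cumulative:
            return i
    return len(probs) - 1  # guard against floating-point round-off


def reinforce_latent(num_latent=8, good_action=3, episodes=2000, lr=0.5, seed=0):
    """REINFORCE over one categorical latent action per turn.

    Toy assumption (not from the paper): the turn succeeds, with reward 1.0,
    only when the sampled latent action equals `good_action`.
    """
    rng = random.Random(seed)
    logits = [0.0] * num_latent  # context-free policy parameters
    baseline = 0.0               # running average of the reward

    for _ in range(episodes):
        probs = softmax(logits)
        z = sample_categorical(probs, rng)       # a single decision per turn
        reward = 1.0 if z == good_action else 0.0
        advantage = reward - baseline
        baseline += 0.05 * (reward - baseline)

        # Gradient of log pi(z) w.r.t. the logits is onehot(z) - probs.
        for k in range(num_latent):
            grad = (1.0 if k == z else 0.0) - probs[k]
            logits[k] += lr * advantage * grad

    return softmax(logits)
```

Calling `reinforce_latent()` returns the learned distribution over latent actions. A word-level policy would instead make one such decision per generated token, each over the entire system vocabulary, which is the large-action-space, long-trajectory problem the abstract refers to.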
