Distributed Structured Actor-Critic Reinforcement Learning for Universal Dialogue Management

Zhi Chen, Lu Chen, Xiaoyuan Liu, and Kai Yu

arXiv:2009.10326·cs.CL·September 23, 2020

TL;DR

This paper proposes a distributed structured actor-critic reinforcement learning approach to improve dialogue management in task-oriented spoken dialogue systems, focusing on policy decision-making within a POMDP framework.

Contribution

It introduces a novel distributed structured actor-critic method tailored for dialogue policy optimization, advancing the application of DRL in dialogue systems.

Findings

01. Enhanced policy learning efficiency

02. Improved dialogue success rates

03. Better generalization in dialogue tasks

Abstract

The task-oriented spoken dialogue system (SDS) aims to assist a human user in accomplishing a specific task (e.g., hotel booking). Dialogue management is a core part of an SDS and has two main tasks: dialogue belief state tracking (summarising the conversation history) and dialogue decision-making (deciding how to reply to the user). In this work, we focus only on devising a policy that chooses which dialogue action to take in response to the user. The sequential system decision-making process can be abstracted as a partially observable Markov decision process (POMDP). Under this framework, reinforcement learning approaches can be used for automated policy optimization. In the past few years, many deep reinforcement learning (DRL) algorithms, which use neural networks (NN) as function approximators, have been investigated for dialogue policy.
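To make the policy-optimization setting in the abstract more concrete, the following is a minimal sketch of a generic one-step actor-critic update over a belief-state vector. It is not the distributed structured method the paper proposes, and it uses linear functions instead of neural networks to stay short. The class name `LinearActorCritic`, the learning rates, the discount factor, and the toy belief vector are all assumptions made for illustration.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class LinearActorCritic:
    """Toy one-step actor-critic (hypothetical, illustrative only).

    The actor is a linear softmax policy over dialogue actions and the
    critic is a linear state-value estimate, both fed a belief vector.
    """

    def __init__(self, belief_dim, num_actions,
                 actor_lr=0.1, critic_lr=0.1, gamma=0.99):
        self.actor_w = [[0.0] * belief_dim for _ in range(num_actions)]
        self.critic_w = [0.0] * belief_dim
        self.actor_lr = actor_lr
        self.critic_lr = critic_lr
        self.gamma = gamma

    def policy(self, belief):
        """Return action probabilities for the given belief state."""
        logits = [sum(w * b for w, b in zip(row, belief))
                  for row in self.actor_w]
        return softmax(logits)

    def value(self, belief):
        """Return the critic's value estimate for the belief state."""
        return sum(w * b for w, b in zip(self.critic_w, belief))

    def act(self, belief, rng):
        """Sample a dialogue action index from the current policy."""
        probs = self.policy(belief)
        return rng.choices(range(len(probs)), weights=probs)[0]

    def update(self, belief, action, reward, next_belief, done):
        """Apply one TD(0) critic step and one policy-gradient actor step."""
        bootstrap = 0.0 if done else self.gamma * self.value(next_belief)
        td_error = reward + bootstrap - self.value(belief)
        probs = self.policy(belief)
        # Critic: semi-gradient TD(0) update of the linear value weights.
        for i, b in enumerate(belief):
            self.critic_w[i] += self.critic_lr * td_error * b
        # Actor: gradient of log pi(action | belief), scaled by the TD
        # error, which serves as the advantage estimate.
        for a, row in enumerate(self.actor_w):
            grad_logit = (1.0 if a == action else 0.0) - probs[a]
            for i, b in enumerate(belief):
                row[i] += self.actor_lr * td_error * grad_logit * b
        return td_error


if __name__ == "__main__":
    rng = random.Random(0)
    agent = LinearActorCritic(belief_dim=3, num_actions=2)
    belief = [1.0, 0.0, 0.0]  # toy belief state, not real tracker output
    action = agent.act(belief, rng)
    # Pretend the dialogue ended successfully right after this action.
    agent.update(belief, action, reward=1.0, next_belief=belief, done=True)
    print(agent.policy(belief))
```

In this sketch a positive TD error raises the probability of the action that was taken and a negative one lowers it. A DRL dialogue policy would replace the linear maps with neural networks, as the abstract notes.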
