Semi-Supervised Dialogue Policy Learning via Stochastic Reward Estimation

Xinting Huang; Jianzhong Qi; Yu Sun; Rui Zhang

arXiv:2005.04379 · cs.CL · May 12, 2020 · 1 cites

Open Access

TL;DR

This paper introduces a semi-supervised reward learning method for dialogue policy optimization that models dialogue progress with a dynamics model, reducing the need for extensive expert annotations and improving performance on MultiWOZ.

Contribution

It proposes a novel semi-supervised reward learning approach using a dynamics model and action embeddings, enabling effective dialogue policy training with less supervision.

Findings

01. Outperforms baseline methods on the MultiWOZ dataset

02. Learns effectively from limited expert annotations

03. Improves dialogue policy generalization

Abstract

In task-oriented dialogue systems, dialogue policy optimization often receives feedback only upon task completion. This is insufficient for training intermediate dialogue turns, since supervision signals (or rewards) are provided only at the end of dialogues. To address this issue, reward learning has been introduced to learn from state-action pairs of an optimal policy and provide turn-by-turn rewards. This approach requires complete state-action annotations of human-to-human dialogues (i.e., expert demonstrations), which is labor-intensive. To overcome this limitation, we propose a novel reward learning approach for semi-supervised policy learning. The proposed approach learns a dynamics model as the reward function, which models dialogue progress (i.e., state-action sequences) based on expert demonstrations, either with or without annotations. The dynamics model computes rewards by predicting…
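The motivation in the abstract, that end-of-dialogue feedback says little about individual turns, can be illustrated with a small sketch. This is not the paper's reward estimator: the function name `discounted_returns`, the discount factor, and both reward sequences are hypothetical values chosen only for illustration.

```python
from typing import List


def discounted_returns(rewards: List[float], gamma: float = 0.9) -> List[float]:
    """Compute the return G_t = r_t + gamma * G_{t+1} for every turn t."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each turn can reuse the return of the following turn.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Hypothetical 4-turn dialogue where the only feedback is task success at the end.
sparse_rewards = [0.0, 0.0, 0.0, 1.0]

# Hypothetical turn-by-turn rewards, standing in for the output of some
# learned reward function (assumed values, not taken from the paper).
dense_rewards = [0.2, 0.5, 0.1, 1.0]

sparse_returns = discounted_returns(sparse_rewards)
dense_returns = discounted_returns(dense_rewards)

print("sparse:", [round(g, 3) for g in sparse_returns])
print("dense: ", [round(g, 3) for g in dense_returns])
```

With the sparse sequence, the return at each turn depends only on how far that turn is from the end of the dialogue, so it carries no information about the quality of the action taken there. With turn-level rewards, the returns differ according to what happened at each turn, which is the kind of signal reward learning aims to supply.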

Taxonomy

Topics: Speech and dialogue systems · Topic Modeling · Context-Aware Activity Recognition Systems