End-to-End Offline Goal-Oriented Dialog Policy Learning via Policy Gradient

Li Zhou; Kevin Small; Oleg Rokhlenko; Charles Elkan

arXiv:1712.02838 · cs.AI · December 11, 2017 · 31 cites

Open Access

TL;DR

This paper presents an offline reinforcement learning approach for goal-oriented dialog policy learning that optimizes at both utterance and dialog levels using a novel reward function and policy gradients, without online interaction.

Contribution

It introduces a novel offline RL method that learns from unannotated dialog data to improve goal-oriented dialog policies at both the utterance level and the dialog level.

Findings

01. Effective offline policy optimization demonstrated
02. Improved dialog-level decision making
03. No need for online interaction or explicit state space

Abstract

Learning a goal-oriented dialog policy is generally performed offline with supervised learning algorithms or online with reinforcement learning (RL). Additionally, as companies accumulate massive quantities of dialog transcripts between customers and trained human agents, encoder-decoder methods have gained popularity as agent utterances can be directly treated as supervision without the need for utterance-level annotations. However, one potential drawback of such approaches is that they myopically generate the next agent utterance without regard for dialog-level considerations. To resolve this concern, this paper describes an offline RL method for learning from unannotated corpora that can optimize a goal-oriented policy at both the utterance and dialog level. We introduce a novel reward function and use both on-policy and off-policy policy gradient to learn a policy offline without…

Taxonomy

Topics: Topic Modeling · Speech and dialogue systems · AI in Service Interactions