Integrating Pretrained Language Model for Dialogue Policy Learning
Hongru Wang, Huimin Wang, Zezhong Wang, Kam-Fai Wong

TL;DR
This paper proposes a novel method that integrates a pretrained language model as a discriminator in reinforcement learning to improve dialogue policy learning, addressing sparse rewards and enhancing dialogue success rates.
Contribution
It introduces a two-step adversarial training approach using a pretrained language model to provide dense rewards, improving dialogue policy learning efficiency.
Findings
Significantly improves dialogue success rate (~8%)
Increases dialogue completion rate (~4.4%)
Enhances exploration in reinforcement learning for dialogue systems
Abstract
Reinforcement Learning (RL) has shown its potential for training a dialogue policy agent to maximize the accumulated rewards given by users. However, the reward can be very sparse because it is usually provided only at the end of a dialog session, so an unaffordable number of interactions is needed to train an acceptable dialog agent. Many prior efforts alternate between optimizing the policy and recovering the reward, an approach that easily gets stuck in local optima and suffers from model collapse. In contrast, we decompose the adversarial training into two steps: 1) we integrate a pre-trained language model as a discriminator to judge whether the current system action is good enough for the last user action (i.e., next action prediction); 2) the discriminator gives an extra local dense reward to guide the agent's exploration. The experimental result demonstrates…
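The second step, adding a local dense reward from the discriminator on top of the sparse end-of-session reward, can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and does not come from the paper: the function `shaped_rewards`, the mixing coefficient `weight`, the additive combination rule, and `toy_discriminator`, which is a hand-written stand-in for the pretrained language model.

```python
from typing import Callable, List, Tuple


def shaped_rewards(
    turns: List[Tuple[str, str]],
    env_rewards: List[float],
    discriminator: Callable[[str, str], float],
    weight: float = 0.5,
) -> List[float]:
    """Add a per-turn discriminator score to the sparse environment reward.

    turns:         (last user action, current system action) for each turn.
    env_rewards:   environment reward per turn, typically zero until the end.
    discriminator: scores how well the system action answers the user action.
    weight:        hypothetical mixing coefficient for the dense reward.
    """
    if len(turns) != len(env_rewards):
        raise ValueError("turns and env_rewards must have the same length")
    return [
        reward + weight * discriminator(user_action, system_action)
        for (user_action, system_action), reward in zip(turns, env_rewards)
    ]


def toy_discriminator(user_action: str, system_action: str) -> float:
    # Hypothetical stand-in for the language-model discriminator:
    # scores 1.0 when a request is answered with an inform, else 0.0.
    user_intent = user_action.split("(")[0]
    system_intent = system_action.split("(")[0]
    return 1.0 if (user_intent, system_intent) == ("request", "inform") else 0.0


turns = [("request(price)", "inform(price)"), ("request(area)", "bye()")]
env_rewards = [0.0, 1.0]  # sparse: only the final turn carries a reward
print(shaped_rewards(turns, env_rewards, toy_discriminator))  # [0.5, 1.0]
```

In this toy run the first turn receives a nonzero reward even though the environment gives none until the session ends, which is the kind of per-turn signal the abstract describes as guiding exploration.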
Taxonomy
Topics: Topic Modeling · Speech and dialogue systems · Natural Language Processing Techniques
