Offline Reinforcement Learning for Mixture-of-Expert Dialogue Management
Dhawal Gupta, Yinlam Chow, Aza Tulepbergenov, Mohammad Ghavamzadeh, Craig Boutilier

TL;DR
This paper introduces RL algorithms tailored for dialogue management that leverage Mixture-of-Expert Language Models to reduce action space complexity and enhance multi-turn conversational effectiveness.
Contribution
It develops novel RL methods that utilize MoE-LMs to improve dialogue management by addressing large action spaces and increasing response diversity.
Findings
Enhanced dialogue diversity and intent coverage
Improved RL-based dialogue management performance
Effective handling of large action spaces
Abstract
Reinforcement learning (RL) has shown great promise for developing dialogue management (DM) agents that are non-myopic, conduct rich conversations, and maximize overall user satisfaction. Despite recent developments in RL and language models (LMs), using RL to power conversational chatbots remains challenging, in part because RL requires online exploration to learn effectively, whereas collecting novel human-bot interactions can be expensive and unsafe. This issue is exacerbated by the combinatorial action spaces facing these algorithms, as most LM agents generate responses at the word level. We develop a variety of RL algorithms, specialized to dialogue planning, that leverage recent Mixture-of-Expert Language Models (MoE-LMs) -- models that capture diverse semantics, generate utterances reflecting different intents, and are amenable for multi-turn DM. By exploiting MoE-LM structure,…
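To make the action-space argument concrete, here is a minimal sketch, not the paper's algorithm. It assumes each expert proposes one candidate utterance and a scoring function picks among them, so the agent chooses from a handful of candidates rather than from every possible word sequence. The names `word_level_action_count`, `select_expert_utterance`, `toy_experts`, and `toy_q` are hypothetical, and `toy_q` is a placeholder standing in for a learned value function.

```python
from typing import Callable, List, Sequence, Tuple

Expert = Callable[[Sequence[str]], str]
QFunction = Callable[[Sequence[str], str], float]


def word_level_action_count(vocab_size: int, length: int) -> int:
    """Count distinct utterances of exactly `length` tokens over a vocabulary."""
    return vocab_size ** length


def select_expert_utterance(
    history: Sequence[str],
    experts: Sequence[Expert],
    q_value: QFunction,
) -> Tuple[int, str]:
    """Greedy choice among one candidate per expert (illustrative only)."""
    # Each expert proposes a single utterance conditioned on the history.
    candidates: List[str] = [expert(history) for expert in experts]
    # Score every candidate and keep the index of the highest score.
    scores = [q_value(history, candidate) for candidate in candidates]
    best = max(range(len(candidates)), key=scores.__getitem__)
    return best, candidates[best]


# Hypothetical experts: fixed responses standing in for intent-specific LMs.
toy_experts: List[Expert] = [
    lambda history: "Tell me more about that.",
    lambda history: "That sounds great!",
    lambda history: "I see.",
]


def toy_q(history: Sequence[str], utterance: str) -> float:
    """Placeholder score (word count); a real agent would learn this value."""
    return float(len(utterance.split()))


if __name__ == "__main__":
    index, reply = select_expert_utterance(["Hi there"], toy_experts, toy_q)
    print(index, reply)
    print(word_level_action_count(10, 3), "word-level actions vs", len(toy_experts))
```

In this toy setup the per-turn decision is over three candidates, while even a 10-word vocabulary with 3-token utterances already yields 1000 word-level actions.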
Taxonomy
Topics: Topic Modeling · Speech and dialogue systems · Natural Language Processing Techniques
