Deep RL with Hierarchical Action Exploration for Dialogue Generation
Itsugun Cho, Ryota Takahashi, Yusaku Yanase, Hiroaki Saito

TL;DR
This paper introduces a hierarchical action exploration method in deep reinforcement learning for dialogue generation, improving efficiency and response quality by using a dual-granularity Q-function and offline RL.
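To make the "dual-granularity" idea concrete, here is a minimal sketch of greedy action selection over a two-level hierarchy. It assumes a coarse Q-function that scores response categories and a fine Q-function that scores individual responses. All names (`hierarchical_greedy_action`, `toy_sampler`, the category labels and value tables) are hypothetical stand-ins for learned networks and a generative policy, not the authors' implementation.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical signatures: each Q-function maps (state, item) to a scalar value.
CoarseQ = Callable[[str, str], float]   # (dialogue state, response category) -> value
FineQ = Callable[[str, str], float]     # (dialogue state, response text) -> value
Sampler = Callable[[str, str, int], List[str]]  # (state, category, n) -> candidates


def hierarchical_greedy_action(
    state: str,
    categories: Sequence[str],
    q_coarse: CoarseQ,
    q_fine: FineQ,
    sample_responses: Sampler,
    n_samples: int = 8,
) -> Tuple[str, str]:
    """Greedy action selection over a two-level (category -> response) hierarchy."""
    # Coarse step: score every category and keep the most promising one.
    best_category = max(categories, key=lambda c: q_coarse(state, c))
    # Fine step: draw candidates only from that category, then take the greedy one.
    candidates = sample_responses(state, best_category, n_samples)
    best_response = max(candidates, key=lambda a: q_fine(state, a))
    return best_category, best_response


# Toy, hypothetical value tables standing in for learned Q-networks.
COARSE_Q: Dict[str, float] = {"empathize": 0.8, "question": 0.5, "inform": 0.2}
RESPONSE_POOL: Dict[str, List[str]] = {
    "empathize": ["That sounds hard.", "I'm sorry to hear that.", "Oh no."],
    "question": ["What happened?", "When did it start?"],
    "inform": ["Stress is common.", "Sleep helps."],
}
FINE_Q: Dict[str, float] = {
    "That sounds hard.": 0.7,
    "I'm sorry to hear that.": 0.9,
    "Oh no.": 0.3,
    "What happened?": 0.6,
    "When did it start?": 0.4,
    "Stress is common.": 0.2,
    "Sleep helps.": 0.1,
}


def toy_sampler(state: str, category: str, n: int) -> List[str]:
    # Stand-in for a generative policy conditioned on the chosen category.
    rng = random.Random(0)
    pool = RESPONSE_POOL[category]
    return rng.sample(pool, k=min(n, len(pool)))


choice = hierarchical_greedy_action(
    "I failed my exam today.",
    list(COARSE_Q),
    q_coarse=lambda s, c: COARSE_Q[c],
    q_fine=lambda s, a: FINE_Q[a],
    sample_responses=toy_sampler,
)
print(choice)  # ('empathize', "I'm sorry to hear that.")
```

The design point the sketch tries to capture is that the sampling budget is spent only inside the category the coarse Q-function prefers, rather than across the whole response space.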
Contribution
It proposes a novel hierarchical exploration strategy with a dual-granularity Q-function and applies offline RL with multiple reward functions for better dialogue responses.
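Learning offline from several reward functions can be illustrated with a one-step Q-learning target computed purely from logged transitions. This is a sketch under explicit assumptions: the rewards are mixed by a fixed weighted sum, and the bootstrap maximizes over a finite set of candidate next responses. The reward names (`warmth`, `brevity`), weights, and helpers are hypothetical; the summary does not say how the paper combines its rewards.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence

RewardFn = Callable[[str, str], float]  # (state, action) -> scalar reward


@dataclass(frozen=True)
class Transition:
    """One logged dialogue turn from a fixed (offline) corpus."""
    state: str
    action: str
    next_state: str
    done: bool


def combined_reward(state: str, action: str,
                    reward_fns: Dict[str, RewardFn],
                    weights: Dict[str, float]) -> float:
    # Assumption: the separate rewards are mixed by a fixed weighted sum.
    return sum(weights[name] * fn(state, action) for name, fn in reward_fns.items())


def td_target(tr: Transition,
              reward_fns: Dict[str, RewardFn],
              weights: Dict[str, float],
              q_target: Callable[[str, str], float],
              next_candidates: Sequence[str],
              gamma: float = 0.9) -> float:
    """One-step Q-learning target computed from logged data only (no environment rollouts)."""
    r = combined_reward(tr.state, tr.action, reward_fns, weights)
    if tr.done:
        return r
    # Greedy bootstrap over a finite set of candidate next responses.
    return r + gamma * max(q_target(tr.next_state, a) for a in next_candidates)


# Hypothetical reward functions, for illustration only.
rewards: Dict[str, RewardFn] = {
    "warmth": lambda s, a: 1.0 if "glad" in a else 0.0,
    "brevity": lambda s, a: -0.01 * len(a.split()),
}
weights = {"warmth": 0.7, "brevity": 0.3}
next_q = {"You're welcome.": 0.5, "Bye.": 0.2}

tr = Transition("Can you help me?", "glad to help", "Thanks!", done=False)
y = td_target(tr, rewards, weights, lambda s, a: next_q[a], list(next_q))
print(round(y, 3))  # 1.141
```

Here the combined reward is 0.7 * 1.0 + 0.3 * (-0.03) = 0.691, and the bootstrap adds 0.9 * 0.5 = 0.45.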
Findings
Outperforms baseline models on automatic metrics
Generates responses with higher expected rewards
Demonstrates improved explainability and controllability
Abstract
Traditionally, approximate dynamic programming is employed in dialogue generation with greedy policy improvement through action sampling, as the natural language action space is vast. However, this practice is inefficient for reinforcement learning (RL): eligible responses with high action values are sparse, so policy improvement that relies on random sampling is weak. This paper presents theoretical analysis and experiments showing that the performance of the dialogue policy is positively correlated with the sampling size. To overcome this limitation, we introduce a novel dual-granularity Q-function that explores the most promising response category to intervene in the sampling process. Our approach extracts actions following a coarse-to-fine hierarchy (response category, then response), thereby achieving the optimum with fewer policy iterations. Additionally, we use offline RL and learn from multiple reward functions…
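A simplified probability model helps explain why sampling size matters when high-value responses are sparse. It is an illustration, not the paper's theoretical analysis. The sketch assumes a fraction `p` of all responses is high-value and that candidates are drawn i.i.d. In that case, the chance that at least one of `n` samples is high-value is 1 - (1 - p)^n. The fractions `p_flat` and `p_category` are made-up numbers chosen to show the effect of restricting sampling to a more promising category.

```python
from typing import List


def hit_probability(p: float, n: int) -> float:
    """P(at least one of n i.i.d. samples is high-value) when a fraction p of responses is."""
    return 1.0 - (1.0 - p) ** n


# Assumed numbers, for illustration only.
p_flat = 0.01      # share of high-value responses in the whole response space
p_category = 0.10  # assumed share inside the most promising category

sizes: List[int] = [1, 10, 100]
flat = [hit_probability(p_flat, n) for n in sizes]
hier = [hit_probability(p_category, n) for n in sizes]

for n, f, h in zip(sizes, flat, hier):
    print(f"N={n:>3}  flat={f:.3f}  within-category={h:.3f}")
```

Under these assumptions, ten samples hit a high-value response only about 10% of the time in the flat space (1 - 0.99^10 ≈ 0.096). Inside the category the rate rises to about 65% (1 - 0.9^10 ≈ 0.651). This shows both the positive dependence on sample size and why narrowing the search first can save samples.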
Taxonomy
Topics: Topic Modeling · Reinforcement Learning in Robotics · AI in Service Interactions
