Modelling Hierarchical Structure between Dialogue Policy and Natural Language Generator with Option Framework for Task-oriented Dialogue System
Jianhong Wang, Yuan Zhang, Tae-Kyun Kim, Yunjie Gu

TL;DR
This paper introduces HDNO, a hierarchical reinforcement learning framework that models the hierarchy between dialogue policy and natural language generator (NLG) with the option framework, aiming to improve how well a task-oriented dialogue system fulfils user requests while keeping its generated utterances comprehensible.
Contribution
The paper proposes a novel hierarchical reinforcement learning approach with an option framework for dialogue systems, incorporating a discriminator reward for better comprehensibility.
Findings
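HDNO outperforms baseline models on the MultiWoz 2.0 and MultiWoz 2.1 datasets.
Asynchronous updates between the dialogue policy and the NLG during hierarchical training theoretically guarantee convergence to a local maximizer.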
Latent dialogue acts enhance explainability.
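The two-level structure summarized above can be illustrated with a toy sketch. It is not the authors' implementation: the class names `DialoguePolicy` and `Generator`, the vocabulary, the latent size, and the random sampling rules are all hypothetical stand-ins for learned networks. It only shows the control flow the option framework implies, where the high level picks a latent dialogue act (the option) and the low level emits words conditioned on it until the utterance ends.

```python
import random

VOCAB = ["the", "hotel", "is", "booked", "<eos>"]  # hypothetical toy vocabulary
LATENT_DIM = 4  # hypothetical size of the latent dialogue act


class DialoguePolicy:
    """High level: maps a dialogue state to a latent dialogue act (the option)."""

    def __init__(self, rng):
        self.rng = rng

    def select_option(self, state):
        # Toy stand-in for a learned distribution over latent vectors.
        mean = sum(state) / len(state)
        return [self.rng.gauss(mean, 1.0) for _ in range(LATENT_DIM)]


class Generator:
    """Low level: emits words conditioned on the option until it terminates."""

    def __init__(self, rng, max_len=10):
        self.rng = rng
        self.max_len = max_len

    def generate(self, option):
        # Toy conditioning: the option reweights a fixed vocabulary.
        weights = [1.0 + abs(option[i % LATENT_DIM]) for i in range(len(VOCAB))]
        tokens = []
        for _ in range(self.max_len):
            token = self.rng.choices(VOCAB, weights=weights, k=1)[0]
            tokens.append(token)
            if token == "<eos>":  # the option terminates with the utterance
                break
        return tokens


def run_turn(policy, generator, state):
    """One system turn: pick a latent act, then realize it as an utterance."""
    option = policy.select_option(state)
    return option, generator.generate(option)


rng = random.Random(0)
option, utterance = run_turn(DialoguePolicy(rng), Generator(rng), state=[0.2, 0.5, 0.1])
print(len(option), utterance)
```

In a trained system each level would receive its own learning signal, for example a task-success reward for the dialogue policy and a discriminator-based reward for the generator; the sketch leaves training out entirely.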
Abstract
Designing task-oriented dialogue systems is a challenging research topic, since such a system must not only generate utterances that fulfil user requests but also guarantee their comprehensibility. Many previous works trained end-to-end (E2E) models with supervised learning (SL); however, the bias in annotated system utterances remains a bottleneck. Reinforcement learning (RL) addresses the problem by using non-differentiable evaluation metrics (e.g., the success rate) as rewards. Nonetheless, existing works with RL showed that the comprehensibility of generated system utterances could be corrupted when improving the performance on fulfilling user requests. In our work, we (1) propose modelling the hierarchical structure between dialogue policy and natural language generator (NLG) with the option framework, called HDNO, where the latent dialogue act is applied to avoid designing…
Taxonomy
Topics: Topic Modeling · Speech and dialogue systems · AI in Service Interactions
