Simulation-Free Hierarchical Latent Policy Planning for Proactive Dialogues

Tao He; Lizi Liao; Yixin Cao; Yuanxing Liu; Yiheng Sun; Zerui Chen; Ming Liu; Bing Qin

arXiv:2412.14584·cs.CL·December 20, 2024

TL;DR

This paper introduces LDPP, a novel framework for proactive dialogue policy planning that automatically discovers and learns policies from real dialogue data, outperforming existing methods and large language models.

Contribution

The paper presents a fully automated, data-driven approach for discovering and learning dialogue policies using latent space representations and hierarchical reinforcement learning.

Findings

1. LDPP outperforms existing methods on proactive dialogue scenarios.

2. LDPP surpasses ChatGPT with a smaller 1.8-billion-parameter model.

3. The approach effectively automates policy discovery from real dialogue records.

Abstract

Recent advancements in proactive dialogues have garnered significant attention, particularly for more complex objectives (e.g. emotion support and persuasion). Unlike traditional task-oriented dialogues, proactive dialogues demand advanced policy planning and adaptability, requiring rich scenarios and comprehensive policy repositories to develop such systems. However, existing approaches tend to rely on Large Language Models (LLMs) for user simulation and online learning, leading to biases that diverge from realistic scenarios and result in suboptimal efficiency. Moreover, these methods depend on manually defined, context-independent, coarse-grained policies, which not only incur high expert costs but also raise concerns regarding their completeness. In our work, we highlight the potential for automatically discovering policies directly from raw, real-world dialogue records. To this…

Taxonomy

Topics: Speech and dialogue systems · Topic Modeling