Investigating Reinforcement Learning for Communication Strategies in a Task-Initiative Setting
Baber Khalid, Matthew Stone

TL;DR
This paper explores reinforcement learning for optimizing communication strategies in interactive dialogue systems, focusing on the trade-off between initial information presentation and follow-up clarification. Its findings favor coherence-based representations of dialogue strategy for their minimal data requirements and explainability.
Contribution
It applies reinforcement learning to dialogue strategy optimization in a simulated referential communication task, compares the learned policies against several baseline strategies, and shows the advantages of coherence-based representations of dialogue strategy.
Findings
Coherence-based strategies incur little loss in predicted outcomes across a wide range of user models.
Reinforcement learning policies adapt well to different user clarification strategies.
Minimal data is needed for effective coherence-based dialogue management.
Abstract
Many conversational domains require the system to present nuanced information to users. Such systems must follow up what they say to address clarification questions and repair misunderstandings. In this work, we explore this interactive strategy in a referential communication task. Using simulation, we analyze the communication trade-offs between initial presentation and subsequent followup as a function of user clarification strategy, and compare the performance of several baseline strategies to policies derived by reinforcement learning. We find surprising advantages to coherence-based representations of dialogue strategy, which bring minimal data requirements, explainable choices, and strong audit capabilities, but incur little loss in predicted outcomes across a wide range of user models.
Taxonomy
Topics: Speech and dialogue systems · Topic Modeling · Personal Information Management and User Behavior
Methods: Repair
