Constrained Policy Optimization for Controlled Self-Learning in Conversational AI Systems

Mohammad Kachuee; Sungjin Lee

arXiv:2209.08429 · cs.LG · May 16, 2023

TL;DR

This paper introduces a scalable constrained policy optimization framework for conversational AI that balances user satisfaction improvements with domain-specific safety constraints, using a novel meta-gradient approach.

Contribution

It proposes a new meta-gradient learning method for adaptive constraint satisfaction in domain-specific conversational AI policy optimization.

Findings

01. Achieves a better balance between policy value and constraint satisfaction.

02. Demonstrates effectiveness on real-world conversational AI data.

03. Outperforms existing methods in constraint adherence and user satisfaction.

Abstract

Recently, self-learning methods based on user satisfaction metrics and contextual bandits have shown promising results in enabling consistent improvements in conversational AI systems. However, directly targeting such metrics with off-policy bandit learning objectives often increases the risk of abrupt policy changes that break the current user experience. In this study, we introduce a scalable framework that supports fine-grained exploration targets for individual domains via user-defined constraints. For example, we may want to ensure fewer policy deviations in business-critical domains such as shopping, while allocating more exploration budget to domains such as music. Furthermore, we present a novel meta-gradient learning approach that is scalable and practical for addressing this problem. The proposed method adjusts constraint violation penalty terms adaptively through a meta…
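The per-domain constraints described in the abstract can be illustrated with a minimal sketch. This is not the authors' method: it assumes a constraint is a cap on the fraction of requests where a new policy's action differs from the logged action, and it uses fixed penalty weights, whereas the paper adapts its penalty terms during training. All names, budgets, and weights here (`DEVIATION_BUDGET`, `deviation_rates`, `constraint_violations`, `penalized_objective`) are hypothetical.

```python
from collections import defaultdict

# Hypothetical per-domain budgets (assumption, not from the paper): the
# maximum fraction of requests on which the new policy may deviate from
# the action taken by the current (logged) policy.
DEVIATION_BUDGET = {"shopping": 0.01, "music": 0.10}


def deviation_rates(records):
    """Return the per-domain fraction of records where the actions differ.

    records: iterable of (domain, logged_action, new_action) tuples.
    """
    totals = defaultdict(int)
    deviations = defaultdict(int)
    for domain, logged_action, new_action in records:
        totals[domain] += 1
        if new_action != logged_action:
            deviations[domain] += 1
    return {d: deviations[d] / totals[d] for d in totals}


def constraint_violations(rates, budgets):
    """Amount by which each domain exceeds its budget (0.0 if within it)."""
    return {d: max(0.0, rates[d] - budgets[d]) for d in rates if d in budgets}


def penalized_objective(policy_value, violations, penalty_weights):
    """Policy value minus a weighted sum of per-domain violations.

    The weights are fixed inputs in this sketch (an assumption); it does
    not implement any adaptive adjustment of them.
    """
    penalty = sum(penalty_weights[d] * v for d, v in violations.items())
    return policy_value - penalty
```

With a tight budget for shopping and a looser one for music, the same absolute number of deviations is penalized in the first domain and tolerated in the second, which is the trade-off the abstract's example describes.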

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Bandit Algorithms Research · Recommender Systems and Techniques

Methods: Self-Learning