Offline Reinforcement Learning for Human-Guided Human-Machine Interaction with Private Information

Zuyue Fu; Zhengling Qi; Zhuoran Yang; Zhaoran Wang; Lan Wang

arXiv:2212.12167·stat.ML·December 26, 2022

Open Access

TL;DR

This paper develops an offline reinforcement learning framework for human-guided human-machine interactions involving private information, addressing confounding bias and distribution mismatch to optimize policies in a two-player game setting.

Contribution

The paper introduces an instrumental variable approach to adjust for confounding bias and proposes a pessimism-based off-policy evaluation and learning algorithm for two-player offline RL with private information.

Findings

01

The proposed method effectively adjusts for unmeasured confounding bias.

02

The off-policy learning algorithm converges to the optimal policy pair under mild conditions.

03

The approach demonstrates promising potential for privacy-preserving human-machine interaction optimization.

Abstract

Motivated by human-machine interactions such as training chatbots to improve customer satisfaction, we study human-guided human-machine interaction involving private information. We model this interaction as a two-player turn-based game, where one player (Alice, a human) guides the other player (Bob, a machine) towards a common goal. Specifically, we focus on offline reinforcement learning (RL) in this game, where the goal is to find a policy pair for Alice and Bob that maximizes their expected total rewards based on an offline dataset collected a priori. The offline setting presents two challenges: (i) we cannot collect Bob's private information, which leads to confounding bias when using standard RL methods, and (ii) there is a distributional mismatch between the behavior policy used to collect data and the desired policy we aim to learn. To tackle the confounding bias, we treat Bob's…

Taxonomy

Topics: Reinforcement Learning in Robotics · Mobile Crowdsensing and Crowdsourcing · Ethics and Social Impacts of AI