Offline Reinforcement Learning for Human-Guided Human-Machine Interaction with Private Information
Zuyue Fu, Zhengling Qi, Zhuoran Yang, Zhaoran Wang, Lan Wang

TL;DR
This paper develops an offline reinforcement learning framework for human-guided human-machine interactions involving private information, addressing confounding bias and distribution mismatch to optimize policies in a two-player game setting.
Contribution
It introduces a novel instrumental variable approach to handle confounding bias, and proposes an off-policy evaluation and learning algorithm that leverages pessimism for two-player offline RL with private information.
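To make the confounding issue concrete, here is a minimal sketch of how an instrumental variable can remove the bias that an unobserved variable introduces. It is a generic linear toy example, not the paper's estimator: the data-generating process, the coefficients, and the names `simulate`, `naive_estimate`, and `iv_estimate` are all assumptions made for illustration. The unobserved `u` plays the role of private information that affects both the action `x` and the outcome `y`, while the instrument `z` affects `y` only through `x`.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / len(a)


def simulate(n, true_effect=2.0, seed=0):
    """Draw (z, x, y) from a hypothetical linear model with a hidden confounder."""
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)   # unobserved confounder (never returned)
        zi = rng.gauss(0.0, 1.0)  # instrument, independent of u
        xi = zi + u + rng.gauss(0.0, 0.5)                      # action
        yi = true_effect * xi + 3.0 * u + rng.gauss(0.0, 0.5)  # outcome
        z.append(zi)
        x.append(xi)
        y.append(yi)
    return z, x, y


def naive_estimate(x, y):
    """Regression slope of y on x; biased because u drives both x and y."""
    return cov(x, y) / cov(x, x)


def iv_estimate(z, x, y):
    """Ratio (Wald) instrumental variable estimate of the effect of x on y."""
    return cov(z, y) / cov(z, x)


if __name__ == "__main__":
    z, x, y = simulate(20000)
    print("naive slope:", naive_estimate(x, y))
    print("IV estimate:", iv_estimate(z, x, y))
```

In this toy model the naive slope overstates the true effect of 2.0 because the hidden `u` raises both `x` and `y`, while the ratio estimator stays close to 2.0 since `z` is independent of `u`.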
Findings
The proposed method effectively adjusts for unmeasured confounding bias.
The off-policy learning algorithm converges to the optimal policy pair under mild conditions.
The approach shows promise for optimizing human-machine interaction when one player's private information is unobserved.
Abstract
Motivated by human-machine interactions such as training chatbots to improve customer satisfaction, we study human-guided human-machine interaction involving private information. We model this interaction as a two-player turn-based game, where one player (Alice, a human) guides the other player (Bob, a machine) towards a common goal. Specifically, we focus on offline reinforcement learning (RL) in this game, where the goal is to find a policy pair for Alice and Bob that maximizes their expected total rewards based on an offline dataset collected a priori. The offline setting presents two challenges: (i) we cannot collect Bob's private information, leading to a confounding bias when using standard RL methods, and (ii) there is a distributional mismatch between the behavior policy used to collect data and the desired policy we aim to learn. To tackle the confounding bias, we treat Bob's…
Taxonomy
Topics: Reinforcement Learning in Robotics · Mobile Crowdsensing and Crowdsourcing · Ethics and Social Impacts of AI
