Off-Policy Evaluation for Human Feedback

Qitong Gao; Ge Gao; Juncheng Dong; Vahid Tarokh; Min Chi; Miroslav Pajic

arXiv:2310.07123·cs.LG·October 17, 2023·2 cites

Open Access

TL;DR

This paper introduces OPEHF, an off-policy evaluation (OPE) framework for estimating human feedback (HF) signals in reinforcement learning, and evaluates it on real-world tasks such as neurostimulation and tutoring.

Contribution

It develops an immediate human reward (IHR) reconstruction method, regularized by environmental knowledge, to improve off-policy evaluation of human feedback signals in RL.

Findings

01. Significant improvement in HF signal estimation accuracy.
02. Effective in real-world neurostimulation and tutoring tasks.
03. Outperforms existing OPE methods in experiments.

Abstract

Off-policy evaluation (OPE) is important for closing the gap between offline training and evaluation of reinforcement learning (RL), by estimating performance and/or rank of target (evaluation) policies using offline trajectories only. It can improve the safety and efficiency of data collection and policy testing procedures in situations where online deployments are expensive, such as healthcare. However, existing OPE methods fall short in estimating human feedback (HF) signals, as HF may be conditioned on multiple underlying factors and is only sparsely available, as opposed to the agent-defined environmental rewards (used in policy optimization), which are usually determined by parametric functions or distributions. Consequently, the nature of HF signals makes it challenging to extrapolate accurate OPE estimates. To resolve this, we introduce an OPE for HF (OPEHF) framework…
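
To make "estimating performance of target policies using offline trajectories only" concrete, here is a minimal sketch of a generic trajectory-wise importance sampling estimator, one of the classical OPE baselines. It is not the OPEHF method described in this paper. All names (`is_estimate`, `uniform_behavior`, `skewed_target`, `logged`) and the toy data are assumptions made up for illustration.

```python
from typing import Callable, List, Tuple

# One logged step: (state, action, reward). Hypothetical toy representation.
Step = Tuple[int, int, float]
# A policy returns the probability of taking `action` in `state`.
Policy = Callable[[int, int], float]


def is_estimate(
    trajectories: List[List[Step]],
    target: Policy,
    behavior: Policy,
    gamma: float = 1.0,
) -> float:
    """Trajectory-wise importance sampling estimate of the target policy's return."""
    total = 0.0
    for traj in trajectories:
        weight = 1.0  # cumulative ratio of target to behavior action probabilities
        ret = 0.0     # discounted return observed in the logged trajectory
        for t, (state, action, reward) in enumerate(traj):
            weight *= target(state, action) / behavior(state, action)
            ret += (gamma ** t) * reward
        total += weight * ret
    return total / len(trajectories)


def uniform_behavior(state: int, action: int) -> float:
    # Assumed data-collection policy: two actions, chosen uniformly.
    return 0.5


def skewed_target(state: int, action: int) -> float:
    # Assumed evaluation policy: prefers action 1.
    return 0.9 if action == 1 else 0.1


# Two one-step logged trajectories: action 1 earned reward 1, action 0 earned 0.
logged = [[(0, 1, 1.0)], [(0, 0, 0.0)]]
estimate = is_estimate(logged, skewed_target, uniform_behavior)
print(estimate)
```

In this toy setting the reweighted average recovers the expected reward of the target policy (about 0.9) without ever deploying it. Estimators of this kind rely on a reward being logged at every step, which is the assumption the abstract says breaks down when HF is only sparsely available.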

Taxonomy

Topics: EEG and Brain-Computer Interfaces