Hybrid Value Estimation for Off-policy Evaluation and Offline Reinforcement Learning

Xue-Kun Jin; Xu-Hui Liu; Shengyi Jiang; Yang Yu

arXiv:2206.02000 · cs.LG · June 7, 2022 · 1 cites

TL;DR

This paper introduces Hybrid Value Estimation (HVE), a method that improves value estimation accuracy in offline reinforcement learning by balancing bias and variance, leading to better evaluation and learning performance.

Contribution

The paper proposes HVE, a novel approach that combines value estimates from offline data and from a learned model to improve value estimation, along with two algorithms: OPHVE for off-policy evaluation and MOHVE for offline reinforcement learning.

Findings

01. OPHVE outperforms existing off-policy evaluation methods.

02. MOHVE achieves competitive results with state-of-the-art offline RL algorithms.

03. Empirical results on MuJoCo validate the theoretical advantages of HVE.

Abstract

Value function estimation is an indispensable subroutine in reinforcement learning, and it becomes more challenging in the offline setting. In this paper, we propose Hybrid Value Estimation (HVE) to reduce value estimation error; it trades off bias and variance by balancing the value estimate from offline data against the one from the learned model. Theoretical analysis shows that HVE enjoys a better error bound than the direct methods. HVE can be leveraged in both off-policy evaluation and offline reinforcement learning settings. We therefore provide two concrete algorithms, Off-policy HVE (OPHVE) and Model-based Offline HVE (MOHVE), one for each setting. Empirical evaluations on MuJoCo tasks corroborate the theoretical claim. OPHVE outperforms other off-policy evaluation methods in all three metrics measuring the estimation effectiveness, while MOHVE achieves better or comparable performance…
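
The bias-variance trade-off mentioned in the abstract can be illustrated with a toy calculation. The sketch below is not the paper's HVE estimator, whose exact form is not given on this page; it only assumes a generic convex combination of an unbiased but noisy data-based estimate and a biased but low-variance model-based estimate with independent errors. The function names, the weight `lam`, and all numbers are hypothetical.

```python
def hybrid_mse(lam, model_bias, model_var, data_var):
    """MSE of lam * data_estimate + (1 - lam) * model_estimate.

    Illustrative assumptions (not taken from the paper): the data-based
    estimate is unbiased, the model-based estimate has a fixed bias, and
    the two errors are independent.
    """
    bias = (1.0 - lam) * model_bias
    variance = lam ** 2 * data_var + (1.0 - lam) ** 2 * model_var
    return bias ** 2 + variance


def best_weight(model_bias, model_var, data_var):
    """Weight on the data-based estimate that minimizes hybrid_mse."""
    model_mse = model_bias ** 2 + model_var
    return model_mse / (model_mse + data_var)


# Hypothetical numbers: a low-variance but biased model estimate and an
# unbiased but noisy data estimate.
MODEL_BIAS, MODEL_VAR, DATA_VAR = 0.5, 0.05, 1.0

lam_star = best_weight(MODEL_BIAS, MODEL_VAR, DATA_VAR)
for lam in (0.0, lam_star, 1.0):
    mse = hybrid_mse(lam, MODEL_BIAS, MODEL_VAR, DATA_VAR)
    print(f"lam={lam:.3f}  mse={mse:.3f}")
```

Under these assumed numbers, relying on the model alone (`lam=0`) gives an MSE of about 0.30 and relying on the data alone (`lam=1`) gives 1.0, while the intermediate weight yields a lower error than either endpoint. This is the general intuition behind mixing the two sources, not a reproduction of the paper's error bound.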

Taxonomy

Topics: Reinforcement Learning in Robotics · Muscle activation and electromyography studies