Stable and Efficient Policy Evaluation

Daoming Lyu; Bo Liu; Matthieu Geist; Wen Dong; Saad Biaz; Qi Wang

arXiv:2006.03978·cs.LG·December 30, 2021

TL;DR

This paper introduces new policy evaluation algorithms that are both off-policy stable and on-policy efficient, addressing longstanding issues in reinforcement learning prediction tasks.

Contribution

The paper proposes novel algorithms based on oblique projection that simultaneously achieve off-policy stability and on-policy efficiency, a combination not previously available.

Findings

01. Empirical results validate the effectiveness of the proposed algorithms.

02. The new methods outperform traditional TD and gradient TD algorithms.

03. Algorithms demonstrate robustness across various domains.

Abstract

Policy evaluation algorithms are essential to reinforcement learning because they predict the performance of a policy. However, two long-standing issues in this prediction problem remain to be tackled: off-policy stability and on-policy efficiency. The conventional temporal difference (TD) algorithm is known to perform very well in the on-policy setting, yet it is not off-policy stable. The gradient TD and emphatic TD algorithms, on the other hand, are off-policy stable but not on-policy efficient. This paper introduces novel algorithms that are both off-policy stable and on-policy efficient by using the oblique projection method. Empirical results on various domains validate the effectiveness of the proposed approach.
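To make the off-policy stability issue concrete, here is a minimal Python sketch of the well-known two-state counterexample for linear TD(0). It is not the paper's oblique projection method, and it is not taken from the paper; the function name `td0_two_state`, the step size, the discount factor, and the chain itself are assumptions chosen for illustration. State A has feature 1 and state B has feature 2, all rewards are 0, so the correct weight is w = 0. When only the A -> B transition is updated (a state distribution that does not match the on-policy one), the weight grows without bound; when B is updated as well, it shrinks toward 0.

```python
def td0_two_state(off_policy, gamma=0.9, alpha=0.1, steps=50, w=1.0):
    """Semi-gradient TD(0) with one weight on a two-state chain.

    Illustrative assumption, not the paper's setup:
    state A (feature 1) moves to state B (feature 2) with reward 0,
    and state B moves to a terminal state (value 0) with reward 0.
    Every true value is 0, so the ideal weight is w = 0.
    """
    for _ in range(steps):
        # Transition A -> B: v(A) = 1 * w and v(B) = 2 * w.
        delta_a = 0.0 + gamma * (2.0 * w) - 1.0 * w
        w = w + alpha * delta_a * 1.0  # gradient of v(A) w.r.t. w is 1
        if not off_policy:
            # On-policy: B is visited too, and its update pulls w back to 0.
            delta_b = 0.0 + gamma * 0.0 - 2.0 * w
            w = w + alpha * delta_b * 2.0  # gradient of v(B) w.r.t. w is 2
    return w


if __name__ == "__main__":
    print("updates on A only (off-policy):", td0_two_state(off_policy=True))
    print("updates on A and B (on-policy):", td0_two_state(off_policy=False))
```

With these assumed settings, each A-only update multiplies w by 1 + alpha * (2 * gamma - 1), which exceeds 1 whenever gamma > 0.5, so the off-policy run diverges while the on-policy run contracts. Gradient TD and emphatic TD avoid this divergence, at the cost in on-policy efficiency that the abstract describes.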

Taxonomy

Topics: Adaptive Dynamic Programming Control · Reinforcement Learning in Robotics · Machine Learning and ELM