Approximate discounting-free policy evaluation from transient and recurrent states

Vektor Dewanto; Marcus Gallagher

arXiv:2204.04324·cs.LG·April 12, 2022

TL;DR

This paper proposes a system of approximators for the bias of a policy (specifically, its relative value) from both transient and recurrent states. Its core is a seminorm LSTD (least-squares temporal difference) whose minimizer can be approximated by sampling, as required in model-free reinforcement learning.

Contribution

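It proposes a system of bias approximators that covers transient as well as recurrent states, and derives the minimizer expression of a seminorm LSTD, which enables sampling-based, model-free policy evaluation and a general unifying procedure for LSTD-based policy value approximators.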

Findings

01 Effective bias estimation from transient states demonstrated

02 Seminorm LSTD provides a unifying framework for policy evaluation

03 Experimental results confirm the method's validity

Abstract

In order to distinguish policies that prescribe good from bad actions in transient states, we need to evaluate the so-called bias of a policy from transient states. However, we observe that most (if not all) works in approximate discounting-free policy evaluation thus far are developed for estimating the bias solely from recurrent states. We therefore propose a system of approximators for the bias (specifically, its relative value) from transient and recurrent states. Its key ingredient is a seminorm LSTD (least-squares temporal difference), for which we derive its minimizer expression that enables approximation by sampling required in model-free reinforcement learning. This seminorm LSTD also facilitates the formulation of a general unifying procedure for LSTD-based policy value approximators. Experimental results validate the effectiveness of our proposed method.
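To make the abstract's first sentence concrete, the sketch below computes the gain and bias exactly for a hypothetical three-state chain with one transient state. It is not the paper's seminorm LSTD: it is a model-based calculation that assumes a known unichain transition matrix, and the names `gain_and_bias`, `P`, `r_good`, and `r_bad` are illustrative. It uses the standard identity h = (I - P + P*)^{-1} (r - g 1), where P* is the limiting matrix and g is the gain.

```python
import numpy as np


def gain_and_bias(P, r):
    """Exact gain and bias of a unichain Markov reward process (model-based)."""
    n = P.shape[0]
    # Stationary distribution: solve pi^T P = pi^T subject to sum(pi) = 1.
    A = np.vstack([P.T - np.eye(n), np.ones((1, n))])
    b = np.concatenate([np.zeros(n), [1.0]])
    pi = np.linalg.lstsq(A, b, rcond=None)[0]
    gain = float(pi @ r)
    # Limiting matrix P*: every row equals pi in the unichain case.
    P_star = np.tile(pi, (n, 1))
    # Bias from h = (I - P + P*)^{-1} (r - gain * 1).
    bias = np.linalg.solve(np.eye(n) - P + P_star, r - gain)
    return gain, bias, pi


# Hypothetical chain: state 0 is transient, states 1 and 2 are recurrent.
P = np.array([
    [0.0, 1.0, 0.0],
    [0.0, 0.5, 0.5],
    [0.0, 0.5, 0.5],
])

# Two "policies" that differ only in the reward earned at the transient state.
r_good = np.array([2.0, 1.0, 0.0])
r_bad = np.array([-1.0, 1.0, 0.0])

g_good, h_good, pi = gain_and_bias(P, r_good)
g_bad, h_bad, _ = gain_and_bias(P, r_bad)

print("gain:", g_good, g_bad)
print("bias at transient state:", h_good[0], h_bad[0])
```

In this toy chain the stationary distribution puts zero mass on the transient state, so both reward vectors yield the same gain and only the bias at state 0 separates them. That is the situation the abstract describes: evaluating the bias from recurrent states alone cannot tell the two apart.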

Taxonomy

Topics: Reinforcement Learning in Robotics · Smart Grid Energy Management