The Optimal Unbiased Value Estimator and its Relation to LSTD, TD and MC

Steffen Grünewälder; Klaus Obermayer

arXiv:0908.3458 · stat.ML · August 25, 2009 · Mach. Learn.

TL;DR

This paper analytically derives the optimal unbiased value estimator (MVU), compares it with TD, MC, and LSTD, and explores their biases and relations in different Markov Reward Process structures.

Contribution

It provides a theoretical analysis of the MVU, establishes its relation to LSTD, TD, and MC, and clarifies conditions for unbiasedness and estimator risk ordering.

Findings

01

LSTD is equivalent to MVU in acyclic MRPs.

02

MC is equivalent to the MVU and to LSTD in undiscounted MRPs in which MC has the same amount of information.

03

TD is unbiased in acyclic MRPs and biased in cyclic MRPs.
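
The relation between MC and LSTD in finding 02 can be checked numerically on a toy case. The sketch below is not taken from the paper: the episodes, state names, and the helper functions `mc_estimate` and `lstd_estimate` are hypothetical, LSTD is written in its tabular (one-hot feature) form, and "equal information" is read here, as an assumption, to mean a tree-shaped acyclic MRP in which every state is reached through exactly one path. Under those assumptions the two estimates coincide for the undiscounted case.

```python
from collections import defaultdict

import numpy as np

# Hypothetical episodes on a tree-shaped, acyclic MRP.
# Each episode is a list of (state, reward received on leaving that state);
# the episode terminates after its last pair.
EPISODES = [
    [("a", 0.0), ("b", 1.0), ("d", 2.0)],
    [("a", 0.0), ("b", 1.0), ("e", 0.0)],
    [("a", 0.0), ("c", 5.0)],
    [("a", 1.0), ("b", 0.0), ("d", 4.0)],
]


def mc_estimate(episodes, gamma=1.0):
    """Monte Carlo: average the observed returns from each visited state."""
    totals, counts = defaultdict(float), defaultdict(int)
    for episode in episodes:
        ret = 0.0
        # Walk backwards so the return can be accumulated incrementally.
        for state, reward in reversed(episode):
            ret = reward + gamma * ret
            totals[state] += ret
            counts[state] += 1
    return {s: totals[s] / counts[s] for s in totals}


def lstd_estimate(episodes, gamma=1.0):
    """Tabular LSTD: solve A v = b with one-hot state features.

    A accumulates phi(s) (phi(s) - gamma * phi(s'))^T and b accumulates
    phi(s) * r; the terminal successor has a zero feature vector.
    """
    states = sorted({s for episode in episodes for s, _ in episode})
    index = {s: i for i, s in enumerate(states)}
    a_mat = np.zeros((len(states), len(states)))
    b_vec = np.zeros(len(states))
    for episode in episodes:
        for t, (state, reward) in enumerate(episode):
            i = index[state]
            a_mat[i, i] += 1.0
            if t + 1 < len(episode):
                a_mat[i, index[episode[t + 1][0]]] -= gamma
            b_vec[i] += reward
    values = np.linalg.solve(a_mat, b_vec)
    return {s: float(values[index[s]]) for s in states}


mc = mc_estimate(EPISODES)      # undiscounted, gamma = 1
lstd = lstd_estimate(EPISODES)  # undiscounted, gamma = 1
for state in sorted(mc):
    print(state, round(mc[state], 4), round(lstd[state], 4))
```

If a state could be reached along more than one path, LSTD would pool all transitions out of that state while MC would average only the returns actually observed from each predecessor, so the two estimates would generally differ. The example says nothing about bias or risk, which the paper treats analytically.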

Abstract

In this analytical study we derive the optimal unbiased value estimator (MVU) and compare its statistical risk to three well-known value estimators: Temporal Difference learning (TD), Monte Carlo estimation (MC) and Least-Squares Temporal Difference Learning (LSTD). We demonstrate that LSTD is equivalent to the MVU if the Markov Reward Process (MRP) is acyclic and show that both differ for most cyclic MRPs, as LSTD is then typically biased. More generally, we show that estimators that fulfill the Bellman equation can only be unbiased for special cyclic MRPs. The main reason is the probability measures with respect to which the expectations are taken. These measures vary from state to state and, due to the strong coupling by the Bellman equation, it is typically not possible for a set of value estimators to be unbiased with respect to each of these measures. Furthermore, we derive relations of the…

Taxonomy

Topics: Advanced Control Systems Optimization · Control Systems and Identification · Advanced Multi-Objective Optimization Algorithms