Emphatic TD Bellman Operator is a Contraction

Assaf Hallak; Aviv Tamar; Shie Mannor

arXiv:1508.03411 · stat.ML · August 25, 2015 · 2 cites

TL;DR

This paper proves that the emphatic TD Bellman operator is a contraction, which yields what the authors believe to be the first error bounds for off-policy evaluation under general target and behavior policies.

Contribution

It shows that the operator underlying ETD's projected fixed-point equation is a contraction with modulus $\sqrt{\gamma}$, where $\gamma$ is the discount factor, and uses this to derive new error bounds for off-policy evaluation.
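The source of a $\sqrt{\gamma}$ modulus can be sketched for the simplest case, assuming $\lambda = 0$ and a constant interest function, where the ETD emphasis weights reduce to $m^\top = d_\mu^\top (I - \gamma P_\pi)^{-1}$. The symbols $d_\mu$ (behavior state distribution), $P_\pi$ (target-policy transition matrix), $r$, $T$, and $\|v\|_m^2 = \sum_i m_i v_i^2$ are our notation, and the paper's general argument may proceed differently.

$$
m^\top = d_\mu^\top + \gamma\, m^\top P_\pi
\quad\Longrightarrow\quad
\gamma\, m^\top P_\pi = m^\top - d_\mu^\top \le m^\top \ \text{(componentwise)}
$$

$$
\|\gamma P_\pi v\|_m^2
= \gamma^2 \sum_i m_i \Big(\sum_j [P_\pi]_{ij} v_j\Big)^2
\le \gamma^2 \sum_j \big(m^\top P_\pi\big)_j\, v_j^2
\le \gamma \sum_j m_j v_j^2
= \gamma\, \|v\|_m^2
$$

The first inequality is Jensen's inequality applied to each row of the stochastic matrix $P_\pi$. Since $Tv = r + \gamma P_\pi v$, this gives $\|Tv - Tu\|_m \le \sqrt{\gamma}\,\|v - u\|_m$, and composing with the $m$-weighted projection $\Pi_m$, which is non-expansive in $\|\cdot\|_m$, keeps the modulus at $\sqrt{\gamma}$.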

Findings

01

The emphatic TD Bellman operator is a $\sqrt{\gamma}$-contraction.

02

Provides what are, to the authors' knowledge, the first error bounds for off-policy evaluation under general target and behavior policies.

03

Establishes theoretical guarantees for ETD's approximation accuracy.
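The $\sqrt{\gamma}$-contraction claim can be spot-checked numerically. The sketch below assumes the $\lambda = 0$ setting with constant interest (emphasis $m^\top = d_\mu^\top (I - \gamma P_\pi)^{-1}$), draws random target-policy transition matrices and behavior distributions, and computes the exact operator norm of $\gamma P_\pi$ in the $m$-weighted norm. The helper names (`emphatic_weights`, `induced_m_norm`, `random_instance`) are hypothetical and not taken from the paper. Because $Tv - Tu = \gamma P_\pi (v - u)$ and the $m$-weighted projection is non-expansive, a value of at most $\sqrt{\gamma}$ is consistent with the claim.

```python
import numpy as np


def emphatic_weights(d, P, gamma):
    # Emphasis vector in the assumed setting (lambda = 0, interest i = 1):
    # m^T = d^T (I - gamma P)^{-1}, i.e. solve (I - gamma P)^T m = d.
    n = P.shape[0]
    return np.linalg.solve((np.eye(n) - gamma * P).T, d)


def induced_m_norm(A, m):
    # Operator norm of A w.r.t. ||v||_m = sqrt(sum_i m_i v_i^2),
    # computed as the spectral norm of M^{1/2} A M^{-1/2}.
    s = np.sqrt(m)
    return np.linalg.norm(s[:, None] * A / s[None, :], 2)


def random_instance(n_states, seed):
    # Hypothetical test data: a random row-stochastic target-policy matrix P
    # and a strictly positive behavior state distribution d.
    rng = np.random.default_rng(seed)
    P = rng.random((n_states, n_states))
    P /= P.sum(axis=1, keepdims=True)
    d = rng.random(n_states) + 0.1
    d /= d.sum()
    return P, d


if __name__ == "__main__":
    gamma = 0.9
    for seed in range(5):
        P, d = random_instance(8, seed)
        m = emphatic_weights(d, P, gamma)
        # T v - T u = gamma P (v - u), so the modulus of T is ||gamma P||_m.
        modulus = induced_m_norm(gamma * P, m)
        print(f"seed {seed}: modulus {modulus:.4f}, sqrt(gamma) {np.sqrt(gamma):.4f}")
```

Computing the induced norm as the spectral norm of $M^{1/2}\gamma P_\pi M^{-1/2}$ gives the exact worst case over all $v$, rather than an estimate from sampled vectors.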

Abstract

Recently, Sutton et al. (2015) introduced the emphatic temporal differences (ETD) algorithm for off-policy evaluation in Markov decision processes. In this short note, we show that the projected fixed-point equation that underlies ETD involves a contraction operator, with a $\sqrt{\gamma}$-contraction modulus (where $\gamma$ is the discount factor). This allows us to provide error bounds on the approximation error of ETD. To our knowledge, these are the first error bounds for an off-policy evaluation algorithm under general target and behavior policies.
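One way to see how a $\sqrt{\gamma}$ modulus turns into an error bound: for an $m$-orthogonal projection $\Pi_m$, a standard Pythagorean argument gives $\|V - \Phi\theta^*\|_m \le \|V - \Pi_m V\|_m / \sqrt{1 - \gamma}$, where $\Phi\theta^*$ solves the projected fixed-point equation. This is our illustration and is not claimed to be the exact bound stated in the paper. The sketch below assumes $\lambda = 0$, constant interest, and random hypothetical MDP data with features $\Phi$; it solves the ETD(0) fixed point directly from the model (no sampling) and compares its error with that bound. All function names are hypothetical.

```python
import numpy as np


def emphatic_weights(d, P, gamma):
    # Assumed setting (lambda = 0, interest i = 1): solve (I - gamma P)^T m = d.
    n = P.shape[0]
    return np.linalg.solve((np.eye(n) - gamma * P).T, d)


def m_norm(v, m):
    # Weighted Euclidean norm ||v||_m = sqrt(sum_i m_i v_i^2).
    return np.sqrt(np.sum(m * v ** 2))


def projected_fixed_point(Phi, P, r, m, gamma):
    # Phi theta = Pi_m (r + gamma P Phi theta)  <=>
    # Phi^T M (Phi - gamma P Phi) theta = Phi^T M r.
    M = np.diag(m)
    A = Phi.T @ M @ (Phi - gamma * P @ Phi)
    b = Phi.T @ M @ r
    return np.linalg.solve(A, b)


def best_approximation(Phi, m, V):
    # Pi_m V: m-weighted least-squares projection of V onto span(Phi).
    M = np.diag(m)
    w = np.linalg.solve(Phi.T @ M @ Phi, Phi.T @ M @ V)
    return Phi @ w


def error_and_bound(n_states=10, n_features=3, gamma=0.9, seed=0):
    rng = np.random.default_rng(seed)
    P = rng.random((n_states, n_states))
    P /= P.sum(axis=1, keepdims=True)   # target-policy transitions
    d = rng.random(n_states) + 0.1
    d /= d.sum()                        # behavior state distribution
    r = rng.standard_normal(n_states)   # expected rewards under the target policy
    Phi = rng.standard_normal((n_states, n_features))
    m = emphatic_weights(d, P, gamma)
    V = np.linalg.solve(np.eye(n_states) - gamma * P, r)  # exact target-policy values
    theta = projected_fixed_point(Phi, P, r, m, gamma)
    err = m_norm(V - Phi @ theta, m)
    best = m_norm(V - best_approximation(Phi, m, V), m)
    return err, best, best / np.sqrt(1.0 - gamma)


if __name__ == "__main__":
    err, best, bound = error_and_bound()
    print(f"fixed-point error {err:.4f}, best error {best:.4f}, bound {bound:.4f}")
```

The error of the fixed point always lies between the best achievable error in span($\Phi$) and that best error inflated by $1/\sqrt{1 - \gamma}$; the inflation factor grows as $\gamma \to 1$.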
