When is Realizability Sufficient for Off-Policy Reinforcement Learning?

Andrea Zanette

arXiv:2211.05311 · cs.LG · June 7, 2023

TL;DR

This paper investigates the conditions under which off-policy reinforcement learning can succeed without Bellman completeness, focusing on realizability and introducing new bounds that account for Bellman mis-alignment.

Contribution

The paper relaxes the Bellman completeness assumption and provides finite-sample guarantees that rely on realizability together with a new measure of Bellman mis-alignment.
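
For readers unfamiliar with the two assumptions being contrasted, the block below states them in their usual textbook form. This is a sketch under stated assumptions: the discounted setting, the fixed target policy, and the symbols \mathcal{F}, \pi, \gamma, and \mathcal{T}^{\pi} are notation chosen here for illustration and may differ from the paper's own setup. The paper's measure of Bellman mis-alignment is not reproduced, since this page does not define it.

```latex
% Standard definitions, written for a discounted MDP and a fixed target policy \pi.
% The notation is an assumption of this sketch, not taken from the page.
\begin{align*}
(\mathcal{T}^{\pi} f)(s,a) &= r(s,a) + \gamma\, \mathbb{E}_{s' \sim P(\cdot \mid s,a)}\big[ f(s', \pi(s')) \big] \\
\text{Realizability:}\quad & Q^{\pi} \in \mathcal{F} \\
\text{Bellman completeness:}\quad & \mathcal{T}^{\pi} f \in \mathcal{F} \quad \text{for all } f \in \mathcal{F} \\
\text{Inherent Bellman error:}\quad & \sup_{f \in \mathcal{F}} \inf_{g \in \mathcal{F}} \| g - \mathcal{T}^{\pi} f \|_{\infty}
\end{align*}
```

Realizability constrains a single function, whereas completeness constrains the image of every member of the class. This is why enlarging the class can break completeness while preserving realizability.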

Findings

1. Off-policy RL can be statistically viable without Bellman completeness.

2. New bounds depend on function class complexity, concentrability, and Bellman mis-alignment.

3. Analysis applies to temporal difference algorithms when they converge.
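
To make the last finding concrete, here is a minimal sketch of the fixed point that linear temporal difference learning reaches when it converges, computed in closed form for a small example. Everything in it is an assumption made for illustration: the 3-state Markov reward process, the off-policy state distribution `d`, and the helper name `td_fixed_point` are hypothetical and do not come from the paper. Tabular features are used so that realizability holds trivially.

```python
import numpy as np

def td_fixed_point(phi, P, r, d, gamma):
    """Solve phi^T D (phi - gamma P phi) theta = phi^T D r for theta.

    This is the linear TD fixed-point equation under a state weighting d.
    (Hypothetical helper written for this sketch.)
    """
    D = np.diag(d)
    A = phi.T @ D @ (phi - gamma * P @ phi)
    b = phi.T @ D @ r
    return np.linalg.solve(A, b)

# Hypothetical 3-state Markov reward process induced by a fixed target policy.
P = np.array([[0.5, 0.5, 0.0],
              [0.1, 0.6, 0.3],
              [0.2, 0.0, 0.8]])
r = np.array([1.0, 0.0, 2.0])
gamma = 0.9

# Exact value function: v = (I - gamma P)^{-1} r.
v_true = np.linalg.solve(np.eye(3) - gamma * P, r)

# Off-policy state weighting: an arbitrary distribution with full support,
# chosen here by hand rather than as the stationary distribution of P.
d = np.array([0.7, 0.2, 0.1])

# Tabular (one-hot) features, so the true value function is realizable.
phi = np.eye(3)
theta = td_fixed_point(phi, P, r, d, gamma)
v_hat = phi @ theta
```

With one-hot features the weighting `d` cancels out of the fixed-point equation, so `v_hat` matches `v_true` for any full-support `d`. With a restricted feature matrix the fixed point generally depends on `d` and may fail to exist, which is the regime where assumptions beyond realizability start to matter.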

Abstract

Model-free algorithms for reinforcement learning typically require a condition called Bellman completeness in order to successfully operate off-policy with function approximation, unless additional conditions are met. However, Bellman completeness is a requirement that is much stronger than realizability and that is deemed to be too strong to hold in practice. In this work, we relax this structural assumption and analyze the statistical complexity of off-policy reinforcement learning when only realizability holds for the prescribed function class. We establish finite-sample guarantees for off-policy reinforcement learning that are free of the approximation error term known as inherent Bellman error, and that depend on the interplay of three factors. The first two are well known: they are the metric entropy of the function class and the concentrability coefficient that represents the…

Videos

When is Realizability Sufficient for Off-Policy Reinforcement Learning? · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Gene Regulatory Network Analysis