When is Realizability Sufficient for Off-Policy Reinforcement Learning?
Andrea Zanette

TL;DR
This paper investigates the conditions under which off-policy reinforcement learning can succeed without Bellman completeness, focusing on realizability and introducing new bounds that account for Bellman mis-alignment.
Contribution
It relaxes the Bellman completeness assumption, providing finite-sample guarantees based on realizability and a new measure of Bellman mis-alignment.
Findings
Off-policy RL can be statistically viable without Bellman completeness.
The new bounds depend on the complexity (metric entropy) of the function class, the concentrability coefficient, and Bellman mis-alignment.
The analysis applies to temporal difference algorithms when they converge.
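To make the gap between realizability and Bellman completeness concrete, here is a minimal sketch on a hypothetical two-state policy-evaluation problem. The toy MDP, the single feature vector, and all names below are assumptions chosen for illustration, not taken from the paper, and the example uses state values rather than the paper's off-policy setting. The true value function lies in the one-dimensional linear class (realizability), yet the Bellman backup of other members of the class leaves it (no completeness).

```python
# Toy policy-evaluation problem (hypothetical, not from the paper):
# state 0 moves to state 1, and state 1 is absorbing.
GAMMA = 0.5
REWARD = [0.0, 1.0]
NEXT_STATE = [1, 1]    # deterministic transitions under the fixed policy
FEATURE = [1.0, 2.0]   # linear class: f_theta(s) = theta * FEATURE[s]


def f(theta):
    """Member of the function class indexed by the scalar theta."""
    return [theta * x for x in FEATURE]


def bellman_backup(values):
    """Apply the policy-evaluation Bellman operator to a value vector."""
    return [REWARD[s] + GAMMA * values[NEXT_STATE[s]] for s in range(2)]


def in_class(values, tol=1e-9):
    """True if `values` is a scalar multiple of FEATURE."""
    # Two vectors in R^2 are collinear iff their 2x2 determinant vanishes.
    return abs(values[0] * FEATURE[1] - values[1] * FEATURE[0]) < tol


# Solve for the true values directly: state 1 is absorbing.
v1 = REWARD[1] / (1.0 - GAMMA)
v0 = REWARD[0] + GAMMA * v1
true_values = [v0, v1]

# Realizability: the true value function belongs to the class.
realizable = in_class(true_values)

# Completeness would require every backup of a class member to stay in the class.
complete = all(in_class(bellman_backup(f(theta))) for theta in (0.0, 0.5, 2.0))

print("realizable:", realizable)
print("Bellman complete:", complete)
```

In this toy problem only the fixed point itself is mapped back into the class, which is the situation the paper targets: realizability holds while completeness fails.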
Abstract
Model-free algorithms for reinforcement learning typically require a condition called Bellman completeness in order to successfully operate off-policy with function approximation, unless additional conditions are met. However, Bellman completeness is a requirement that is much stronger than realizability and that is deemed to be too strong to hold in practice. In this work, we relax this structural assumption and analyze the statistical complexity of off-policy reinforcement learning when only realizability holds for the prescribed function class. We establish finite-sample guarantees for off-policy reinforcement learning that are free of the approximation error term known as inherent Bellman error, and that depend on the interplay of three factors. The first two are well known: they are the metric entropy of the function class and the concentrability coefficient that represents the…
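For reference, the three notions the abstract contrasts can be written in their standard textbook forms. This is a sketch under assumptions: it uses a discounted setting with a fixed target policy $\pi$, a function class $\mathcal{F}$, and the sup norm, and the paper's own notation, setting, and choice of norm may differ.

```latex
% Bellman evaluation operator for a target policy \pi (discount \gamma assumed)
(\mathcal{T}^{\pi} f)(s,a) = r(s,a) + \gamma \, \mathbb{E}_{s' \sim P(\cdot \mid s,a)} \big[ f(s', \pi(s')) \big]

% Realizability: the true action-value function lies in the class
Q^{\pi} \in \mathcal{F}

% Bellman completeness: the class is closed under the Bellman operator
\mathcal{T}^{\pi} f \in \mathcal{F} \quad \text{for all } f \in \mathcal{F}

% Inherent Bellman error: how far the class is from being complete
\sup_{f \in \mathcal{F}} \; \inf_{g \in \mathcal{F}} \; \| g - \mathcal{T}^{\pi} f \|_{\infty}
```

Completeness constrains the backup of every function in the class, whereas realizability constrains only the single function $Q^{\pi}$. The inherent Bellman error is zero exactly when each backup can be approximated arbitrarily well within the class.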
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Gene Regulatory Network Analysis
