Sample Complexity of Nonparametric Off-Policy Evaluation on Low-Dimensional Manifolds using Deep Networks

Xiang Ji; Minshuo Chen; Mengdi Wang; Tuo Zhao

arXiv:2206.02887 · cs.LG · October 5, 2022 · 1 citation

TL;DR

This paper shows that deep neural networks can estimate the value of a target policy from off-policy reinforcement learning data by exploiting low-dimensional manifold structure, which yields sample-efficient estimators with theoretical guarantees.

Contribution

It introduces a sharp error bound for off-policy evaluation using deep networks that leverages intrinsic low-dimensional structures and a novel CNN approximation result.
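For intuition about why intrinsic dimension matters, the sketch below uses the classical nonparametric regression rate $n^{-\alpha/(2\alpha+d)}$ for estimating an $\alpha$-smooth function in $d$ dimensions. This is a textbook rate used only as an illustration, not the paper's bound; constants are ignored, and the function names and the values $\alpha = 2$, $d = 4$, $D = 100$ are hypothetical choices.

```python
import math


def nonparametric_rate_exponent(alpha: float, dim: int) -> float:
    """Exponent r in the classical rate n^(-r), r = alpha / (2 * alpha + dim)."""
    return alpha / (2.0 * alpha + dim)


def samples_for_error(eps: float, alpha: float, dim: int) -> float:
    """Sample size n solving n^(-r) = eps, i.e. n = eps^(-1/r); constants ignored."""
    r = nonparametric_rate_exponent(alpha, dim)
    return eps ** (-1.0 / r)


if __name__ == "__main__":
    alpha, eps = 2.0, 0.1  # illustrative smoothness and target error
    for dim in (4, 100):  # hypothetical intrinsic vs. ambient dimension
        n = samples_for_error(eps, alpha, dim)
        print(f"dim={dim:>3}: rate exponent {nonparametric_rate_exponent(alpha, dim):.4f}, "
              f"n ~ 10^{math.log10(n):.1f}")
```

With these numbers, reaching error 0.1 takes on the order of $10^4$ samples when the rate is governed by $d = 4$, versus about $10^{52}$ when it is governed by $D = 100$. This gap is what the phrase "curse of ambient dimensionality" refers to.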

Findings

1. Error bound depends on intrinsic dimension and policy mismatch
2. Sample efficiency achieved by exploiting manifold structure
3. CNN approximation results support theoretical analysis
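The first finding ties the bound to policy mismatch, which the abstract measures with a function class-restricted $\chi^2$-divergence. The abstract does not give the formula. The sketch below assumes a common form, $\chi^2_{\mathcal{F}}(p, q) = \sup_{f \in \mathcal{F}} (\mathbb{E}_p[f])^2 / \mathbb{E}_q[f^2] - 1$, evaluated on a finite state-action space with a finite candidate class. Here $p$ stands in for the target-policy distribution and $q$ for the behavior-data distribution. The names `restricted_chi2` and `full_chi2` are hypothetical.

```python
import numpy as np


def restricted_chi2(p, q, fns):
    """Assumed form: sup_{f in F} (E_p[f])^2 / E_q[f^2] - 1 over a finite class F.

    p, q: probability vectors over a finite set of state-action pairs
          (p ~ target-policy distribution, q ~ behavior-data distribution).
    fns:  array (num_functions, num_pairs); row i holds f_i on every pair.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    fns = np.atleast_2d(np.asarray(fns, dtype=float))
    second_moment = (fns ** 2) @ q  # E_q[f^2] for each candidate f
    if np.any(second_moment <= 0):
        raise ValueError("each f must be nonzero somewhere q puts mass")
    first_moment = fns @ p  # E_p[f] for each candidate f
    return float(np.max(first_moment ** 2 / second_moment) - 1.0)


def full_chi2(p, q):
    """Unrestricted chi^2(p || q) = sum p^2 / q - 1 (q assumed strictly positive)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p ** 2 / q) - 1.0)


if __name__ == "__main__":
    p = np.array([0.5, 0.5])
    q = np.array([0.8, 0.2])
    constant_only = np.array([[1.0, 1.0]])
    with_ratio = np.array([[1.0, 1.0], p / q])  # adds the density ratio p/q to F
    print(restricted_chi2(p, q, constant_only))  # mismatch invisible to a constant-only F
    print(restricted_chi2(p, q, with_ratio), full_chi2(p, q))
```

By Cauchy-Schwarz, $(\mathbb{E}_p f)^2 \le \mathbb{E}_q[f^2]\,\mathbb{E}_q[(p/q)^2]$, so the restricted quantity never exceeds the ordinary $\chi^2$-divergence. It equals that divergence when the density ratio $p/q$ lies in the class. A small function class can therefore "see" less mismatch than the full divergence reports.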

Abstract

We consider the off-policy evaluation problem of reinforcement learning using deep convolutional neural networks. We analyze the deep fitted Q-evaluation method for estimating the expected cumulative reward of a target policy, when the data are generated from an unknown behavior policy. We show that, by choosing network size appropriately, one can leverage any low-dimensional manifold structure in the Markov decision process and obtain a sample-efficient estimator without suffering from the curse of high data ambient dimensionality. Specifically, we establish a sharp error bound for fitted Q-evaluation, which depends on the intrinsic dimension of the state-action space, the smoothness of the Bellman operator, and a function class-restricted $\chi^2$-divergence. It is noteworthy that the restricted $\chi^2$-divergence measures the behavior and target policies' mismatch in the function…
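The abstract describes fitted Q-evaluation (FQE) only at a high level. The sketch below shows the backward recursion under stated assumptions: a finite horizon $H$, finite state and action spaces, and linear least squares over user-supplied features standing in for the deep CNN function class analyzed in the paper. At each step $h$, FQE regresses $r + \sum_{a'} \pi(a' \mid s')\, Q_{h+1}(s', a')$ onto $(s, a)$ using the behavior data, then averages $Q_0$ under the target policy and initial distribution. The names `fitted_q_evaluation` and `one_hot_features` are hypothetical.

```python
import numpy as np


def one_hot_features(num_states, num_actions):
    """One-hot (tabular) features; a stand-in for a richer function class."""
    def features(s, a):
        idx = np.asarray(s) * num_actions + np.asarray(a)
        return np.eye(num_states * num_actions)[idx]
    return features


def fitted_q_evaluation(data, pi, features, init_dist, num_states, num_actions):
    """Finite-horizon FQE sketch with linear least squares at each step.

    data: list of length H; data[h] = (s, a, r, s_next) arrays collected at step h
          by the (unknown) behavior policy.
    pi: array (num_states, num_actions); pi[s, a] = target-policy probability.
    features: callable mapping integer arrays (s, a) to an (n, k) feature matrix.
    init_dist: array (num_states,), initial-state distribution.
    Returns the estimated expected cumulative reward of pi.
    """
    all_s = np.repeat(np.arange(num_states), num_actions)
    all_a = np.tile(np.arange(num_actions), num_states)
    phi_all = features(all_s, all_a)
    q_next = np.zeros((num_states, num_actions))  # Q_H = 0 past the horizon
    for h in reversed(range(len(data))):
        s, a, r, s_next = data[h]
        # V_{h+1}(s') = sum_a' pi(a'|s') Q_{h+1}(s', a')
        v_next = np.sum(pi * q_next, axis=1)
        y = r + v_next[s_next]  # regression targets for step h
        w, *_ = np.linalg.lstsq(features(s, a), y, rcond=None)
        q_next = (phi_all @ w).reshape(num_states, num_actions)  # fitted Q_h
    v0 = np.sum(pi * q_next, axis=1)
    return float(init_dist @ v0)


if __name__ == "__main__":
    # Toy MDP: the action chooses the next state; reward 1 whenever s == 1.
    S, A, H = 2, 2, 3
    s = np.array([0, 0, 1, 1])
    a = np.array([0, 1, 0, 1])
    data = [(s, a, (s == 1).astype(float), a.copy())] * H
    always_go_to_1 = np.array([[0.0, 1.0], [0.0, 1.0]])
    print(fitted_q_evaluation(data, always_go_to_1, one_hot_features(S, A),
                              np.array([1.0, 0.0]), S, A))
```

In the toy example, the target policy moves from state 0 to state 1 and then collects reward 1 on each of the remaining two steps, so the true value is 2. With one-hot features and every state-action pair covered, the regression is exact. Swapping in a neural network regressor for `np.linalg.lstsq` would change only the fitting line.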


Videos

Sample Complexity of Nonparametric Off-Policy Evaluation on Low-Dimensional Manifolds using Deep Networks (SlidesLive)

Taxonomy

Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Age of Information Optimization