Sample Complexity of Offline Reinforcement Learning with Deep ReLU Networks

Thanh Nguyen-Tang; Sunil Gupta; Hung Tran-The; Svetha Venkatesh

arXiv:2103.06671 · stat.ML · December 15, 2022 · 1 citation

TL;DR

This paper provides the first theoretical analysis of the sample complexity for offline reinforcement learning using deep ReLU networks, accounting for complex function regularities and distributional shifts.

Contribution

It establishes a novel sample complexity bound for offline RL with deep ReLU networks under Besov regularity and correlated structures, extending beyond linear models.

Findings

1. The sample complexity bound depends on the horizon, the dimension, the smoothness, and the distributional shift.

2. Introduces the notions of Besov dynamic closure and correlated structure for the analysis.

3. Provides the first theoretical characterization of offline RL with deep neural networks under general Besov conditions.
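The first finding can be read directly off the bound stated in the abstract. Factoring that bound (a purely algebraic restatement of the exponents given there) shows that the horizon, the distributional shift, and the accuracy parameter all enter through the same exponent $1 + d/\alpha$, so a larger dimension-to-smoothness ratio $d/\alpha$ amplifies every other dependence:

```latex
n = \tilde{O}\!\left(H^{4 + 4\frac{d}{\alpha}}\,\kappa_{\mu}^{1 + \frac{d}{\alpha}}\,\epsilon^{-2 - 2\frac{d}{\alpha}}\right)
  = \tilde{O}\!\left(\left(\frac{H^{4}\,\kappa_{\mu}}{\epsilon^{2}}\right)^{1 + \frac{d}{\alpha}}\right),
\qquad H = (1 - \gamma)^{-1}.
```

As $d/\alpha$ shrinks (smoother functions relative to the dimension), the bound approaches $\tilde{O}(H^{4}\kappa_{\mu}\epsilon^{-2})$.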

Abstract

Offline reinforcement learning (RL) leverages previously collected data for policy optimization without any further active exploration. Despite the recent interest in this problem, its theoretical results in neural network function approximation settings remain elusive. In this paper, we study the statistical theory of offline RL with deep ReLU network function approximation. In particular, we establish the sample complexity of $n = \tilde{O}(H^{4 + 4\frac{d}{\alpha}} \kappa_{\mu}^{1 + \frac{d}{\alpha}} \epsilon^{-2 - 2\frac{d}{\alpha}})$ for offline RL with deep ReLU networks, where $\kappa_{\mu}$ is a measure of distributional shift, $H = (1 - \gamma)^{-1}$ is the effective horizon length, $d$ is the dimension of the state-action space, $\alpha$ is a (possibly fractional) smoothness parameter of the underlying Markov decision process (MDP), and $\epsilon$ is a…
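To see how the bound in the abstract scales, the short Python sketch below evaluates its leading-order term for given parameter values. The helper `sample_complexity_scale` is hypothetical (our name, not from the paper); it drops the constants and logarithmic factors hidden by $\tilde{O}$, so its output is only meaningful as a ratio between parameter settings, not as an actual number of samples. It assumes $0 \le \gamma < 1$ and positive $d$, $\alpha$, $\kappa_{\mu}$, and $\epsilon$.

```python
def sample_complexity_scale(gamma: float, d: int, alpha: float,
                            kappa_mu: float, eps: float) -> float:
    """Leading-order term of n = O~(H^(4+4d/alpha) * kappa_mu^(1+d/alpha) * eps^(-2-2d/alpha)).

    Hypothetical helper: constants and log factors hidden by O~ are dropped,
    so only ratios between parameter settings are meaningful.
    """
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must lie in [0, 1)")
    if d <= 0 or alpha <= 0 or kappa_mu <= 0 or eps <= 0:
        raise ValueError("d, alpha, kappa_mu and eps are assumed positive")
    horizon = 1.0 / (1.0 - gamma)   # effective horizon H = (1 - gamma)^(-1)
    ratio = d / alpha               # dimension-to-smoothness ratio d/alpha
    return (horizon ** (4 + 4 * ratio)
            * kappa_mu ** (1 + ratio)
            * eps ** (-2 - 2 * ratio))


# Halving eps multiplies the leading term by 2^(2 + 2d/alpha).
for d, alpha in [(4, 4.0), (4, 2.0), (4, 1.0)]:
    coarse = sample_complexity_scale(0.9, d, alpha, kappa_mu=2.0, eps=0.1)
    fine = sample_complexity_scale(0.9, d, alpha, kappa_mu=2.0, eps=0.05)
    print(f"d/alpha = {d / alpha:.1f}: halving eps scales n by about {fine / coarse:.1f}")
```

By the exponent $-2 - 2d/\alpha$, the printed factors should be close to 16, 64, and 1024 for $d/\alpha = 1, 2, 4$, which illustrates how rougher functions (smaller $\alpha$) make each gain in accuracy more expensive.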

Taxonomy

Topics: Model Reduction and Neural Networks · Advanced Bandit Algorithms Research · Gaussian Processes and Bayesian Inference