Understanding Deep Neural Function Approximation in Reinforcement Learning via $\epsilon$-Greedy Exploration

Fanghui Liu; Luca Viano; Volkan Cevher

arXiv:2209.07376·cs.LG·October 18, 2022·1 cites

TL;DR

This paper offers a theoretical analysis of deep neural network function approximation in reinforcement learning with epsilon-greedy exploration, focusing on neural architecture scaling and regret bounds.

Contribution

It provides the first theoretical insights into deep RL with epsilon-greedy exploration, analyzing neural network architecture requirements beyond linear models.

Findings

01

Scaling the width as $T^{d/(2\alpha+d)}$ is sufficient for deep RL.

02

Scaling the depth as $\log T$ suffices for deep neural networks.

03

A width of $\sqrt{T}$ is enough for two-layer Barron space networks.
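
As a rough illustration of the three scalings listed above, the following Python sketch evaluates them for a given number of episodes $T$, feature dimension $d$, and smoothness $\alpha$. The function names are hypothetical, and every hidden constant and logarithmic factor is set to 1 (an assumption, since the findings state only rates), so the outputs indicate orders of magnitude rather than prescribed network sizes.

```python
import math


def deep_width(T: int, d: int, alpha: float) -> int:
    """Width rate T^(d / (2*alpha + d)); the constant factor is assumed to be 1."""
    return math.ceil(T ** (d / (2 * alpha + d)))


def deep_depth(T: int) -> int:
    """Depth rate log T (natural log); the constant factor is assumed to be 1."""
    return max(1, math.ceil(math.log(T)))


def barron_width(T: int) -> int:
    """Two-layer (Barron space) width rate sqrt(T); constant assumed to be 1."""
    return math.ceil(math.sqrt(T))


if __name__ == "__main__":
    T, d = 10_000, 4
    for alpha in (1.0, 2.0, 4.0):
        print(
            f"alpha={alpha}: deep width ~ {deep_width(T, d, alpha)}, "
            f"depth ~ {deep_depth(T)}, two-layer width ~ {barron_width(T)}"
        )
```

Under these assumptions, a smoother Q-function (larger $\alpha$) gives a smaller width exponent $d/(2\alpha+d)$, while a larger feature dimension $d$ pushes the exponent toward 1.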

Abstract

This paper provides a theoretical study of deep neural function approximation in reinforcement learning (RL) with $\epsilon$-greedy exploration in the online setting. This problem setting is motivated by the successful deep Q-networks (DQN) framework, which falls in this regime. In this work, we provide an initial attempt at theoretically understanding deep RL from the perspective of function classes and neural network architectures (e.g., width and depth) beyond the "linear" regime. Specifically, we focus on a value-based algorithm with $\epsilon$-greedy exploration via deep (and two-layer) neural networks endowed by Besov (and Barron) function spaces, respectively, which aims at approximating an $\alpha$-smooth Q-function in a $d$-dimensional feature space. We prove that, with $T$ episodes, scaling the width $m = O(T^{\frac{d}{2\alpha + d}})$ and the…

Videos

Understanding Deep Neural Function Approximation in Reinforcement Learning via $\epsilon$-Greedy Exploration · SlidesLive

Taxonomy

Topics: Advanced Bandit Algorithms Research · Stochastic Gradient Optimization Techniques · Reinforcement Learning in Robotics