On the Convergence and Sample Complexity Analysis of Deep Q-Networks with $\epsilon$-Greedy Exploration

Shuai Zhang; Hongkang Li; Meng Wang; Miao Liu; Pin-Yu Chen; Songtao Lu; Sijia Liu; Keerthiram Murugesan; Subhajit Chaudhury

arXiv:2310.16173·cs.LG·October 26, 2023·1 cites

TL;DR

This paper offers the first theoretical convergence and sample complexity analysis of Deep Q-Networks with epsilon-greedy exploration, explaining how exploration parameters influence learning efficiency.

Contribution

It provides a novel theoretical framework analyzing DQNs with practical epsilon-greedy policies, including convergence rates and exploration effects.

Findings

1. Decaying epsilon leads to geometric convergence to the optimal Q-value.
2. Higher epsilon values expand the convergence region but slow down learning.
3. Experimental results support the theoretical analysis.
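
The first two findings concern how the exploration rate is set and decayed. As a minimal sketch of $\epsilon$-greedy action selection with a geometrically decaying $\epsilon$, the Python below may help. The schedule `eps_start * decay**step`, the function names, and the sample Q-values are all assumptions made for illustration; this summary does not state the paper's exact schedule.

```python
import random


def epsilon_greedy_action(q_values, epsilon, rng):
    """With probability epsilon pick a uniformly random action, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def decayed_epsilon(eps_start, decay, step, eps_min=0.0):
    """Hypothetical geometric schedule: eps_start * decay**step, floored at eps_min."""
    return max(eps_min, eps_start * decay ** step)


rng = random.Random(0)
q_values = [0.1, 0.7, 0.3]  # illustrative Q-value estimates for three actions
schedule = [decayed_epsilon(1.0, 0.5, t) for t in range(4)]
actions = [epsilon_greedy_action(q_values, eps, rng) for eps in schedule]
```

A larger `epsilon` spends more steps on random actions, which matches the trade-off in the second finding: wider coverage of the state-action space at the cost of slower exploitation.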

Abstract

This paper provides a theoretical understanding of Deep Q-Network (DQN) with the $\epsilon$-greedy exploration in deep reinforcement learning. Despite the tremendous empirical achievement of the DQN, its theoretical characterization remains underexplored. First, the exploration strategy is either impractical or ignored in the existing analysis. Second, in contrast to conventional Q-learning algorithms, the DQN employs the target network and experience replay to acquire an unbiased estimation of the mean-square Bellman error (MSBE) utilized in training the Q-network. However, the existing theoretical analysis of DQNs lacks convergence analysis or bypasses the technical challenges by deploying a significantly overparameterized neural network, which is not computationally efficient. This paper provides the first theoretical convergence and sample complexity analysis of the practical…
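
The abstract refers to estimating the MSBE from replayed transitions with a separate target network. The sketch below shows one common way such an estimate could be computed on a minibatch. It is not the paper's implementation: a linear Q-function stands in for the neural network, the replay buffer holds random synthetic transitions, terminal states are ignored, and every name (`msbe_on_minibatch`, `W_online`, `W_target`) is hypothetical.

```python
import numpy as np


def msbe_on_minibatch(q_online, q_target, batch, gamma):
    """Empirical mean-square Bellman error on a replay minibatch.

    q_online, q_target: callables mapping states (B, d) to Q-values (B, A).
    batch: tuple (states, actions, rewards, next_states).
    """
    states, actions, rewards, next_states = batch
    # Bootstrapped targets come from the frozen target network, not the online one.
    targets = rewards + gamma * q_target(next_states).max(axis=1)
    predictions = q_online(states)[np.arange(len(actions)), actions]
    return float(np.mean((predictions - targets) ** 2))


rng = np.random.default_rng(0)
d, A, N, B = 4, 3, 100, 16  # state dim, actions, buffer size, batch size (assumed)

# Synthetic replay buffer: (states, actions, rewards, next_states).
buffer = (
    rng.normal(size=(N, d)),
    rng.integers(0, A, size=N),
    rng.normal(size=N),
    rng.normal(size=(N, d)),
)

W_online = rng.normal(size=(d, A))  # linear stand-in for the Q-network
W_target = W_online.copy()          # target weights, synced only periodically

# Experience replay: sample transitions uniformly from the buffer.
idx = rng.choice(N, size=B, replace=False)
batch = tuple(x[idx] for x in buffer)

loss = msbe_on_minibatch(lambda s: s @ W_online, lambda s: s @ W_target, batch, gamma=0.9)
```

Keeping `W_target` fixed between periodic syncs means the regression targets do not move with every update of `W_online`, and uniform sampling from the buffer breaks up the correlation between consecutive transitions.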

Videos

On the Convergence and Sample Complexity Analysis of Deep Q-Networks with $\epsilon$-Greedy Exploration (SlidesLive)

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Memory and Neural Computing · Ferroelectric and Negative Capacitance Devices

Methods: Dense Connections · Experience Replay · Q-Learning · Convolution · Deep Q-Network