On the Convergence and Sample Complexity Analysis of Deep Q-Networks with $\epsilon$-Greedy Exploration
Shuai Zhang, Hongkang Li, Meng Wang, Miao Liu, Pin-Yu Chen, Songtao Lu, Sijia Liu, Keerthiram Murugesan, Subhajit Chaudhury

TL;DR
This paper offers the first theoretical convergence and sample complexity analysis of Deep Q-Networks with epsilon-greedy exploration, explaining how exploration parameters influence learning efficiency.
Contribution
It provides a novel theoretical framework analyzing DQNs with practical epsilon-greedy policies, including convergence rates and exploration effects.
Findings
Decaying epsilon leads to geometric convergence to the optimal Q-value.
Higher epsilon values expand the convergence region but slow down learning.
Experimental results support the theoretical analysis.
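The exploration rule behind these findings can be illustrated with a minimal sketch. The function names, the geometric decay schedule, and its parameters (`eps0`, `decay`, `eps_min`) are assumptions made for illustration, not the schedule analyzed in the paper.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Return a uniformly random action with probability epsilon, else the greedy action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    # Greedy choice: index of the largest estimated Q-value.
    return max(range(len(q_values)), key=lambda a: q_values[a])


def decayed_epsilon(eps0, decay, step, eps_min=0.0):
    """Hypothetical schedule: eps0 * decay**step, never dropping below eps_min."""
    return max(eps_min, eps0 * decay ** step)


if __name__ == "__main__":
    q = [0.1, 0.9, 0.3]  # made-up Q-value estimates for three actions
    for step in range(5):
        eps = decayed_epsilon(eps0=1.0, decay=0.5, step=step, eps_min=0.05)
        print(step, eps, epsilon_greedy(q, eps))
```

A larger `epsilon` makes the agent sample more of the state-action space at the cost of fewer greedy steps, which is the trade-off the second finding describes.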
Abstract
This paper provides a theoretical understanding of Deep Q-Network (DQN) with the $\epsilon$-greedy exploration in deep reinforcement learning. Despite the tremendous empirical achievement of the DQN, its theoretical characterization remains underexplored. First, the exploration strategy is either impractical or ignored in the existing analysis. Second, in contrast to conventional Q-learning algorithms, the DQN employs the target network and experience replay to acquire an unbiased estimation of the mean-square Bellman error (MSBE) utilized in training the Q-network. However, the existing theoretical analysis of DQNs lacks convergence analysis or bypasses the technical challenges by deploying a significantly overparameterized neural network, which is not computationally efficient. This paper provides the first theoretical convergence and sample complexity analysis of the practical…
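For reference, the MSBE mentioned in the abstract is commonly written in the following textbook form. The notation is an assumption and may differ from the paper's: $\theta$ denotes the Q-network weights, $\theta^{-}$ the target-network weights, $\gamma$ the discount factor, and $\mathcal{D}$ the experience replay buffer.

```latex
% Standard DQN loss; symbols are assumed, not taken from the paper.
\[
  L(\theta) \;=\;
  \mathbb{E}_{(s,a,r,s') \sim \mathcal{D}}
  \Big[ \big( Q(s,a;\theta) - r - \gamma \max_{a'} Q(s',a';\theta^{-}) \big)^{2} \Big]
\]
```

Holding $\theta^{-}$ fixed while $\theta$ is updated, and sampling transitions from $\mathcal{D}$ rather than from the current trajectory, are the two mechanisms the abstract refers to as the target network and experience replay.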
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Memory and Neural Computing · Ferroelectric and Negative Capacitance Devices
Methods: Dense Connections · Experience Replay · Q-Learning · Convolution · Deep Q-Network
