Universal Approximation Theorem of Deep Q-Networks
Qian Qi

TL;DR
This paper develops a continuous-time stochastic control framework to analyze Deep Q-Networks, proving their ability to approximate optimal Q-functions with high probability and examining convergence properties.
Contribution
It introduces a novel continuous-time analysis of DQNs using stochastic control and FBSDEs, establishing approximation and convergence results in this setting.
Findings
DQNs can approximate the optimal Q-function arbitrarily well on compact sets.
The convergence of Q-learning algorithms for DQNs is analyzed using stochastic approximation.
The analysis highlights the impact of network depth and discretization on DQN performance.
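One common way to read the link between network depth and discretization is to treat each residual block as an explicit Euler step of size h = T / depth. The sketch below illustrates that reading only and is not the paper's construction. The names `residual_forward` and `reference_flow` and the choice f(x) = tanh(x) are assumptions made for the example; a trained network would learn f.

```python
import math


def residual_forward(x, depth, horizon=1.0):
    """Apply `depth` residual blocks x <- x + h * f(x), with h = horizon / depth.

    f(x) = tanh(x) is a hypothetical stand-in for a learned layer.
    """
    h = horizon / depth
    for _ in range(depth):
        x = x + h * math.tanh(x)
    return x


def reference_flow(x0, horizon=1.0):
    """Exact solution of dx/dt = tanh(x), from sinh(x(t)) = sinh(x0) * exp(t)."""
    return math.asinh(math.sinh(x0) * math.exp(horizon))


if __name__ == "__main__":
    target = reference_flow(0.5)
    for depth in (4, 64, 1024):
        gap = abs(residual_forward(0.5, depth) - target)
        print(f"depth={depth:5d}  |residual output - exact flow| = {gap:.6f}")
```

In this toy setting the gap to the exact flow shrinks as depth grows, which is the sense in which more layers correspond to a finer time grid.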
Abstract
We establish a continuous-time framework for analyzing Deep Q-Networks (DQNs) via stochastic control and Forward-Backward Stochastic Differential Equations (FBSDEs). Considering a continuous-time Markov Decision Process (MDP) driven by a square-integrable martingale, we analyze DQN approximation properties. We show that DQNs can approximate the optimal Q-function on compact sets with arbitrary accuracy and high probability, leveraging residual network approximation theorems and large deviation bounds for the state-action process. We then analyze the convergence of a general Q-learning algorithm for training DQNs in this setting, adapting stochastic approximation theorems. Our analysis emphasizes the interplay between DQN layer count, time discretization, and the role of viscosity solutions (primarily for the value function) in addressing potential non-smoothness of the optimal…
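To make the stochastic-approximation view of Q-learning concrete, here is a minimal sketch under stated assumptions. It uses a lookup table in place of a deep network, Gaussian increments as a stand-in for the driving martingale, and a toy one-dimensional control problem with running reward -x^2. The function name `q_learning`, the grid, and every parameter value are hypothetical and chosen only for illustration. The step size 1 / n^0.7 satisfies the usual Robbins-Monro conditions (the step sizes sum to infinity while their squares have a finite sum).

```python
import math
import random


def q_learning(n_steps=20000, dt=0.1, rho=1.0, sigma=0.5, seed=0):
    """Tabular Q-learning on an Euler-discretized controlled diffusion.

    Toy dynamics (an assumption): dX = a dt + sigma dW with a in {-1, +1},
    reward -x^2 per unit time, discount rate rho.
    """
    rng = random.Random(seed)
    actions = (-1.0, 1.0)
    n_cells, x_max = 21, 2.0
    gamma = math.exp(-rho * dt)  # per-step discount from the time step dt

    def cell(x):
        # Map a state in [-x_max, x_max] to a grid index in 0..n_cells-1.
        return round((x + x_max) / (2 * x_max) * (n_cells - 1))

    q = [[0.0, 0.0] for _ in range(n_cells)]
    visits = [[0, 0] for _ in range(n_cells)]
    x = 0.0
    for _ in range(n_steps):
        s = cell(x)
        # Epsilon-greedy exploration with epsilon = 0.2.
        if rng.random() < 0.2:
            a = rng.randrange(2)
        else:
            a = 0 if q[s][0] >= q[s][1] else 1
        reward = -x * x * dt
        noise = sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        x_next = max(-x_max, min(x_max, x + actions[a] * dt + noise))
        s_next = cell(x_next)
        # Robbins-Monro step size based on the visit count of (s, a).
        visits[s][a] += 1
        alpha = 1.0 / visits[s][a] ** 0.7
        target = reward + gamma * max(q[s_next])
        q[s][a] += alpha * (target - q[s][a])
        x = x_next
    return q, gamma


if __name__ == "__main__":
    table, gamma = q_learning()
    print("discount per step:", round(gamma, 4))
    print("Q at the centre cell:", table[10])
```

Because each update is a convex combination of the old estimate and a bounded target, the table entries stay between -x_max^2 * dt / (1 - gamma) and 0. A function-approximation version would replace the table lookup with a network evaluation and the in-place update with a gradient step.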
Taxonomy
Topics: Energy Efficient Wireless Sensor Networks
Methods: Convolution · Dense Connections · Deep Q-Network · Q-Learning
