Universal Approximation Theorem for Deep Q-Learning via FBSDE System

Qian Qi

arXiv:2505.06023 · cs.LG · May 12, 2025

TL;DR

This paper proves a universal approximation theorem for deep Q-networks by leveraging the Bellman equation's structure and regularity properties, connecting network depth to value function refinement in control problems.

Contribution

It establishes a UAT for DQNs that emulate Bellman updates, utilizing regularity propagation and neural operators to link network architecture with dynamic programming.
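
For orientation, the objects named above can be written down in the textbook discounted, discrete-time setting. This is an assumption made for illustration only: the paper's own setting and notation may differ, and the symbols r, P, and \gamma below are not taken from the source. The first line is the Bellman optimality operator, the second rewrites value iteration as a residual update (the form a residual block would emulate), and the third is the standard contraction property that makes the iteration converge.

```latex
\begin{align*}
(\mathcal{T}Q)(s,a) &= r(s,a) + \gamma \, \mathbb{E}_{s' \sim P(\cdot \mid s,a)}\Big[\max_{a'} Q(s',a')\Big], \\
Q_{k+1} &= Q_k + \big(\mathcal{T}Q_k - Q_k\big), \\
\|\mathcal{T}Q - \mathcal{T}Q'\|_\infty &\le \gamma \, \|Q - Q'\|_\infty .
\end{align*}
```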

Findings

01. Deep residual networks can approximate Bellman operators.

02. Network depth corresponds to value iteration steps.

03. Regularity of value functions is preserved through iterations.
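
The correspondence between depth and value iteration steps can be illustrated on a small tabular problem. The sketch below is not the paper's construction: it assumes a random finite MDP with discount factor `gamma`, applies the exact Bellman operator where a real DQN would use a learned approximation, and the names `bellman_operator`, `residual_block`, `unrolled_network`, and `make_random_mdp` are hypothetical. Each "layer" is a residual update Q + (TQ - Q), so a stack of `depth` layers computes exactly `depth` value iteration steps.

```python
import numpy as np


def make_random_mdp(n_states, n_actions, seed=0):
    """Build a random tabular MDP (hypothetical test problem, not from the paper)."""
    rng = np.random.default_rng(seed)
    P = rng.random((n_states, n_actions, n_states))
    P /= P.sum(axis=2, keepdims=True)  # normalize so each P[s, a, :] is a distribution
    R = rng.random((n_states, n_actions))
    return P, R


def bellman_operator(Q, P, R, gamma):
    """(TQ)(s, a) = R(s, a) + gamma * sum_s' P(s' | s, a) * max_a' Q(s', a')."""
    V = Q.max(axis=1)          # greedy state values, shape (S,)
    return R + gamma * (P @ V)  # (S, A, S) @ (S,) -> (S, A)


def residual_block(Q, P, R, gamma):
    """One 'layer': identity skip connection plus the Bellman residual TQ - Q."""
    return Q + (bellman_operator(Q, P, R, gamma) - Q)


def unrolled_network(Q0, P, R, gamma, depth):
    """Stack `depth` residual blocks; equivalent to `depth` value iteration steps."""
    Q = Q0
    for _ in range(depth):
        Q = residual_block(Q, P, R, gamma)
    return Q


if __name__ == "__main__":
    P, R = make_random_mdp(n_states=5, n_actions=3)
    gamma = 0.9
    Q0 = np.zeros((5, 3))
    # Use a very deep stack as a numerical stand-in for the fixed point Q*.
    Q_star = unrolled_network(Q0, P, R, gamma, depth=2000)
    for depth in (1, 5, 10, 20):
        err = np.abs(unrolled_network(Q0, P, R, gamma, depth) - Q_star).max()
        print(f"depth={depth:2d}  sup-norm error={err:.6f}")
```

Because the operator is a gamma-contraction in the sup norm, the error printed by the usage example shrinks by at least a factor of `gamma` per added layer. In this toy setting, that is the sense in which deeper stacks give a more refined value function.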

Abstract

The approximation capabilities of Deep Q-Networks (DQNs) are commonly justified by general Universal Approximation Theorems (UATs) that do not leverage the intrinsic structural properties of the optimal Q-function, the solution to a Bellman equation. This paper establishes a UAT for a class of DQNs whose architecture is designed to emulate the iterative refinement process inherent in Bellman updates. A central element of our analysis is the propagation of regularity: while the transformation induced by a single Bellman operator application exhibits regularity, for which Backward Stochastic Differential Equations (BSDEs) theory provides analytical tools, the uniform regularity of the entire sequence of value iteration iterates--specifically, their uniform Lipschitz continuity on compact domains under standard Lipschitz assumptions on the problem data--is derived from finite-horizon…

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Risk and Portfolio Optimization · Adversarial Robustness in Machine Learning