Deep neural networks can provably solve Bellman equations for Markov decision processes without the curse of dimensionality

Arnulf Jentzen; Konrad Kleinberg; Thomas Kruse

arXiv:2506.22851 · math.OC · July 1, 2025

TL;DR

This paper proves that deep neural networks (DNNs) can approximate solutions of Bellman equations for high-dimensional Markov decision processes without the curse of dimensionality, provided that the payoff function and the random transition dynamics of the MDP can themselves be suitably approximated by DNNs with leaky ReLU activation.

Contribution

The authors establish a polynomial growth bound on the number of neural network parameters needed to approximate Bellman equation solutions in high-dimensional MDPs. The proof uses the multilevel fixed-point (MLFP) approximation scheme.

Findings

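01. DNNs with leaky ReLU activation can approximate the Q-functions in the $L^2$-sense with a number of parameters that grows at most polynomially in the state-space dimension $d$ and the reciprocal $1/\varepsilon$ of the prescribed accuracy $\varepsilon$.

02. In this sense, the approximation avoids the curse of dimensionality in solving Bellman equations.

03. The result applies to MDPs with infinite time horizon and finite control set.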
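
As a hedged illustration, "without the curse of dimensionality" is commonly formalized as a bound of the following shape. The symbol $\mathcal{P}(d,\varepsilon)$ (the number of parameters of the approximating DNN in dimension $d$ at accuracy $\varepsilon$) and the constants $c$, $\kappa$ are notation assumed here for illustration, not taken from the paper; the paper's precise statement and constants may differ.

```latex
\[
  \exists\, c, \kappa \in (0,\infty)\quad
  \forall\, d \in \mathbb{N},\ \varepsilon \in (0,1):\qquad
  \mathcal{P}(d,\varepsilon) \;\le\; c\, d^{\kappa}\, \varepsilon^{-\kappa}.
\]
```

By contrast, a grid-based method typically needs on the order of $\varepsilon^{-d}$ degrees of freedom, which is exponential in $d$.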

Abstract

Discrete time stochastic optimal control problems and Markov decision processes (MDPs) are fundamental models for sequential decision-making under uncertainty and as such provide the mathematical framework underlying reinforcement learning theory. A central tool for solving MDPs is the Bellman equation and its solution, the so-called $Q$-function. In this article, we construct deep neural network (DNN) approximations for $Q$-functions associated to MDPs with infinite time horizon and finite control set $A$. More specifically, we show that if the payoff function and the random transition dynamics of the MDP can be suitably approximated by DNNs with leaky rectified linear unit (ReLU) activation, then the solutions $Q_{d} \colon \mathbb{R}^{d} \to \mathbb{R}^{|A|}$, $d \in \mathbb{N}$, of the associated Bellman equations can also be approximated in the $L^{2}$-sense by DNNs with leaky ReLU…
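
To make the objects named in the abstract concrete, here is a minimal sketch of a Bellman equation and its $Q$-function on a toy problem. It is not the paper's method: it assumes a finite state space (the paper works on $\mathbb{R}^d$), the standard discounted form $Q(s,a) = r(s,a) + \gamma \sum_{s'} P(s' \mid s,a) \max_b Q(s',b)$, and plain fixed-point iteration in place of the DNN construction. The names `q_value_iteration`, `P`, `r`, and `gamma` are hypothetical.

```python
import numpy as np

def q_value_iteration(P, r, gamma, tol=1e-10, max_iter=10_000):
    """Solve a finite, discounted Bellman equation by fixed-point iteration.

    P:     array of shape (S, A, S), P[s, a, s2] = transition probability
    r:     array of shape (S, A), expected one-step payoff
    gamma: discount factor in [0, 1)
    Returns Q of shape (S, A), one value per state and control.
    """
    Q = np.zeros_like(r, dtype=float)
    for _ in range(max_iter):
        # Bellman operator: payoff plus discounted expected best next value.
        Q_new = r + gamma * (P @ Q.max(axis=1))
        if np.max(np.abs(Q_new - Q)) < tol:
            return Q_new
        Q = Q_new
    return Q

# Toy MDP (assumed values): 2 states, finite control set A = {0, 1}.
P = np.array([
    [[0.9, 0.1], [0.2, 0.8]],   # transitions from state 0 under controls 0, 1
    [[0.5, 0.5], [0.0, 1.0]],   # transitions from state 1 under controls 0, 1
])
r = np.array([
    [1.0, 0.0],
    [0.0, 2.0],
])
gamma = 0.9

Q = q_value_iteration(P, r, gamma)
# How far Q is from satisfying the Bellman equation exactly.
residual = np.max(np.abs(Q - (r + gamma * (P @ Q.max(axis=1)))))
greedy_policy = Q.argmax(axis=1)  # best control in each state
```

The table `Q` has one entry per state and control, so its size grows with the number of states. On a grid over $\mathbb{R}^d$ that count is exponential in $d$, which is the curse of dimensionality that the paper's DNN approximation result addresses.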

Taxonomy

Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Risk and Portfolio Optimization