A proof that rectified deep neural networks overcome the curse of dimensionality in the numerical approximation of semilinear heat equations

Martin Hutzenthaler; Arnulf Jentzen; Thomas Kruse; Tuan Anh Nguyen

arXiv:1901.10854·math.NA·November 25, 2020

TL;DR

This paper rigorously proves that deep neural networks can approximate solutions of high-dimensional semilinear heat equations without the curse of dimensionality: the number of network parameters grows at most polynomially in the PDE dimension and in the reciprocal of the prescribed accuracy.

Contribution

It provides the first mathematical proof that deep neural networks overcome the curse of dimensionality for a class of nonlinear PDEs, specifically semilinear heat equations.

Findings

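01. The number of neural network parameters grows at most polynomially in the PDE dimension and in the reciprocal of the prescribed accuracy

02. The proof relies on full history recursive multilevel Picard approximations

03. Deep neural networks overcome the curse of dimensionality for a class of nonlinear PDEs (semilinear heat equations with gradient-independent nonlinearities)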

Equations

\mathcal{R}(\Phi)\in C({\mathbbm{R}}^{k_{0}},{\mathbbm{R}}^{k_{H+1}}),\quad(\mathcal{R}(\Phi))(x_{0})=W_{H+1}x_{H}+B_{H+1},\quad\text{and}\quad\mathcal{P}(\Phi)=\sum_{n=1}^{H+1}k_{n}(k_{n-1}+1),

\left(\tfrac{\partial}{\partial t}u_{d}\right)(t,x)=(\Delta_{x}u_{d})(t,x)+f(u_{d}(t,x)).

\left[\int_{[0,1]^{d}}\left|u_{d}(T,x)-(\mathcal{R}(\Psi_{d,\varepsilon}))(x)\right|^{2}\,dx\right]^{\nicefrac{1}{2}}\leq\varepsilon.

\left|f_{i}(t,x,w)-f_{i}(t,x,v)\right|\leq L\left|w-v\right|,

\displaystyle\max\left\{\big{.}\!\left|f_{i}(t,x,0)\right|,\left|g_{i}(x)\right|\right\}\leq B\left(1+\left\|{x}\right\|\big{.}\!\right)^{p},

\displaystyle\max\left\{\big{.}\!\left|f_{1}(t,x,v)-f_{2}(t,x,v)\right|,\left|g_{1}(x)-g_{2}(x)\right|\right\}\leq\delta\left(\Big{.}\!\left(\big{.}1+\left\|{x}\right\|\right)^{pq}+|v|^{q}\big{.}\right),

(F_{i}(v))(t,x)=f_{i}(t,x,v(t,x)),

\displaystyle{\mathbb{E}}\!\left[\left|g_{i}\left(x+\mathbf{W}_{T-s}\right)\big{.}\!\right|+\int_{s}^{T}\left|\left(F_{i}(u_{i})\right)\left(t,x+\mathbf{W}_{t-s}\right)\right|\,dt\right]<\infty

u_{i}(s,x)={\mathbb{E}}\!\left[g_{i}(x+\mathbf{W}_{T-s})+\int_{s}^{T}(F_{i}(u_{i}))(t,x+\mathbf{W}_{t-s})\,dt\right],

\begin{split}&{U}_{n,M}^{\theta}(t,x)=\frac{1}{M^{n}}\sum_{i=1}^{M^{n}}g_{2}\left(x+W^{(\theta,0,-i)}_{T}-W^{(\theta,0,-i)}_{t}\right)\\ +&\sum_{l=0}^{n-1}\frac{(T-t)}{M^{n-l}}\left[\sum_{i=1}^{M^{n-l}}\left(F_{2}\big{(}{U}_{l,M}^{(\theta,l,i)}\big{)}-\mathbbm{1}_{{\mathbbm{N}}}(l)F_{2}\big{(}{U}_{l-1,M}^{(\theta,-l,i)}\big{)}\right)\left(\mathcal{U}_{t}^{(\theta,l,i)},x+W_{\mathcal{U}_{t}^{(\theta,l,i)}}^{(\theta,l,i)}-W_{t}^{(\theta,l,i)}\right)\right].\end{split}

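The multilevel Picard (MLP) recursion above can be sketched in a few lines of Python. This is a minimal, unoptimized illustration under stated assumptions, not the authors' implementation: the nonlinearity is assumed to depend on the solution value only (f(u) instead of f_2(t,x,u)), W is assumed to be a standard Brownian motion simulated through Gaussian increments, U_0 is taken to be zero, and the function name `mlp` and its argument list are hypothetical.

```python
import math
import random


def mlp(n, M, t, x, T, f, g, rng):
    """One random realisation of U_{n,M}(t, x) (sketch; U_0 = 0 is assumed)."""
    if n == 0:
        return 0.0
    # Terminal-condition term: average of g(x + W_T - W_t) over M^n samples.
    m = M ** n
    sd = math.sqrt(max(T - t, 0.0))
    total = sum(g([xi + sd * rng.gauss(0.0, 1.0) for xi in x]) for _ in range(m)) / m
    # Multilevel correction terms for l = 0, ..., n-1.
    for l in range(n):
        m = M ** (n - l)
        acc = 0.0
        for _ in range(m):
            r = t + (T - t) * rng.random()      # uniform time in [t, T]
            sd = math.sqrt(max(r - t, 0.0))
            y = [xi + sd * rng.gauss(0.0, 1.0) for xi in x]  # x + W_r - W_t
            val = f(mlp(l, M, r, y, T, f, g, rng))
            if l > 0:
                # Independent lower-level approximation at the same point (r, y).
                val -= f(mlp(l - 1, M, r, y, T, f, g, rng))
            acc += val
        total += (T - t) * acc / m
    return total


if __name__ == "__main__":
    rng = random.Random(0)
    # Illustrative data (assumptions): f(u) = sin(u), g(x) = 1 / (1 + |x|^2).
    g = lambda y: 1.0 / (1.0 + sum(yi * yi for yi in y))
    print(mlp(3, 3, 0.0, [0.0] * 10, 1.0, math.sin, g, rng))
```

The differences F(U_l) - F(U_{l-1}) are averaged over fewer samples as l increases, which is the multilevel idea: the higher levels are expensive to evaluate but have small variance.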

\begin{split}\sup_{t\in[0,T]}\left(\bigg{.}{\mathbb{E}}\!\left[\Big{.}\!\left|u_{i}(t,x+\mathbf{W}_{t})\right|^{q}\right]\right)^{\!\!\nicefrac{{1}}{{q}}}\leq e^{LT}(T+1)B\left[\sup_{t\in[0,T]}\left({\mathbb{E}}\!\left[\Big{.}\!\left(1+\left\|{x+\mathbf{W}_{t}}\right\|\Big{.}\right)^{pq}\bigg{.}\!\right]\Bigg{.}\!\right)^{\!\nicefrac{{1}}{{q}}}\right].\end{split}

\mu_{t}(B)={\mathbb{P}}(x+\mathbf{W}_{t}\in B).

\begin{split}&\left(\bigg{.}{\mathbb{E}}\!\left[\Big{.}|u_{i}(t,x+\mathbf{W}_{t})|^{q}\right]\right)^{\!\nicefrac{{1}}{{q}}}=\left(\int_{{\mathbbm{R}}^{d}}|u_{i}(t,z)|^{q}\,\mu_{t}(dz)\right)^{\!\nicefrac{{1}}{{q}}}\\ &=\left(\int_{{\mathbbm{R}}^{d}}\left|{\mathbb{E}}\!\left[g_{i}(z+\mathbf{W}_{T-t})+\int_{t}^{T}(F_{i}(u_{i}))(s,z+\mathbf{W}_{s-t})\,ds\right]\right|^{q}\,\mu_{t}(dz)\right)^{\!\!\nicefrac{{1}}{{q}}}\\ &\leq\left(\int_{{\mathbbm{R}}^{d}}\left|{\mathbb{E}}\!\left[\big{.}g_{i}(z+\mathbf{W}_{T-t})\right]\right|^{q}\,\mu_{t}(dz)\right)^{\!\nicefrac{{1}}{{q}}}\\ &\qquad+\int_{t}^{T}\left(\int_{{\mathbbm{R}}^{d}}\left|{\mathbb{E}}\!\left[(F_{i}(u_{i}))(s,z+\mathbf{W}_{s-t})\big{.}\right]\right|^{q}\Bigg{.}\!\,\mu_{t}(dz)\right)^{\!\!\nicefrac{{1}}{{q}}}ds.\end{split}

\begin{split}&\int_{{\mathbbm{R}}^{d}}\left|{\mathbb{E}}\!\left[\big{.}g_{i}(z+\mathbf{W}_{T-t})\right]\right|^{q}\,\mu_{t}(dz)\leq\int_{{\mathbbm{R}}^{d}}{\mathbb{E}}\!\left[\left|g_{i}(z+\mathbf{W}_{T}-\mathbf{W}_{t})\right|^{q}\Big{.}\!\right]\,\mu_{t}(dz)\\ &={\mathbb{E}}\!\left[\left|g_{i}\left(x+\mathbf{W}_{t}+\mathbf{W}_{T}-\mathbf{W}_{t}\right)\right|^{q}\Big{.}\!\right]={\mathbb{E}}\!\left[\left|g_{i}\left(x+\mathbf{W}_{T}\right)\right|^{q}\Big{.}\!\right]\leq{\mathbb{E}}\!\left[B^{q}\left(1+\left\|{x+\mathbf{W}_{T}}\right\|\Big{.}\!\right)^{pq}\bigg{.}\!\right].\end{split}

\displaystyle\begin{split}&\int_{t}^{T}\left(\int_{{\mathbbm{R}}^{d}}\left|{\mathbb{E}}\!\left[\big{.}\!(F_{i}(u_{i}))(s,z+\mathbf{W}_{s-t})\right]\right|^{q}\,\mu_{t}(dz)\right)^{\!\!\nicefrac{{1}}{{q}}}ds\\ &\leq\int_{t}^{T}\left(\int_{{\mathbbm{R}}^{d}}{\mathbb{E}}\!\left[\left|\big{.}(F_{i}(u_{i}))(s,z+\mathbf{W}_{s}-\mathbf{W}_{t})\right|^{q}\Big{.}\!\right]\,\mu_{t}(dz)\right)^{\!\!\nicefrac{{1}}{{q}}}\,ds\\ &=\int_{t}^{T}\left(\bigg{.}\!{\mathbb{E}}\!\left[\left|\big{.}\left(F_{i}(u_{i})\right)(s,x+\mathbf{W}_{t}+\mathbf{W}_{s}-\mathbf{W}_{t})\right|^{q}\Big{.}\!\right]\right)^{\!\!\nicefrac{{1}}{{q}}}\!ds\\ &\leq\int_{t}^{T}\left({\mathbb{E}}\!\left[\Big{.}\!\left|(F_{i}(0))(s,x+\mathbf{W}_{s})\right|^{q}\right]\Bigg{.}\!\right)^{\!\!\nicefrac{{1}}{{q}}}\!ds+\int_{t}^{T}\left({\mathbb{E}}\!\left[\Big{.}\!\left|(F_{i}(u_{i})-F_{i}(0))(s,x+\mathbf{W}_{s})\right|^{q}\right]\Bigg{.}\!\right)^{\!\nicefrac{{1}}{{q}}}\,ds\\ &\leq T\sup_{s\in[0,T]}\left({\mathbb{E}}\!\left[\Big{.}\!B^{q}\left(1+\left\|{x+\mathbf{W}_{s}}\right\|\Big{.}\right)^{pq}\bigg{.}\!\right]\Bigg{.}\!\right)^{\!\nicefrac{{1}}{{q}}}+\int_{t}^{T}\left({\mathbb{E}}\!\left[\Big{.}L^{q}\left|u_{i}(s,x+\mathbf{W}_{s})\right|^{q}\right]\bigg{.}\!\right)^{\!\nicefrac{{1}}{{q}}}\,ds.\end{split}

\begin{split}&\left(\bigg{.}{\mathbb{E}}\!\left[\Big{.}\!\left|\big{.}u_{i}(t,x+\mathbf{W}_{t})\right|^{q}\right]\right)^{\!\!\nicefrac{{1}}{{q}}}\\ &\leq(T+1)B\sup_{s\in[0,T]}\left({\mathbb{E}}\!\left[\Big{.}\!\left(1+\left\|{x+\mathbf{W}_{s}}\right\|\Big{.}\right)^{pq}\bigg{.}\!\right]\Bigg{.}\!\right)^{\!\nicefrac{{1}}{{q}}}+L\int_{t}^{T}\left({\mathbb{E}}\!\left[\Big{.}\left|u_{i}(s,x+\mathbf{W}_{s})\right|^{q}\right]\bigg{.}\!\right)^{\!\nicefrac{{1}}{{q}}}\,ds.\end{split}

\sup_{s\in[0,T]}\sup_{y\in{\mathbbm{R}}^{d}}\frac{|u_{i}(s,y)|}{\left(1+\left\|{y}\right\|\right)^{p}}\leq\sup_{s\in[0,T]}\sup_{y\in{\mathbbm{R}}^{d}}\frac{|u_{i}(s,y)|}{1+\left\|{y}\right\|^{p}}<\infty.

\displaystyle\begin{split}&\int_{0}^{T}\left({\mathbb{E}}\!\left[\Big{.}\!\left|u_{i}(s,x+\mathbf{W}_{s})\right|^{q}\right]\bigg{.}\!\right)^{\!\!\nicefrac{{1}}{{q}}}ds\leq\left[\sup_{s\in[0,T]}\sup_{y\in{\mathbbm{R}}^{d}}\frac{|u_{i}(s,y)|}{\left(1+\left\|{y}\right\|\right)^{p}}\right]\int_{0}^{T}\left({\mathbb{E}}\!\left[\left(1+\left\|{x+\mathbf{W}_{s}}\right\|\Big{.}\!\right)^{pq}\bigg{.}\!\right]\Bigg{.}\!\right)^{\!\!\nicefrac{{1}}{{q}}}ds\\ &\leq\left[\sup_{s\in[0,T]}\sup_{y\in{\mathbbm{R}}^{d}}\frac{|u_{i}(s,y)|}{\left(1+\left\|{y}\right\|\right)^{p}}\right]T\left(1+\left\|{x}\right\|+\left({\mathbb{E}}\!\left[\left\|{\mathbf{W}_{T}}\right\|^{pq}\Big{.}\!\right]\bigg{.}\!\right)^{\!\!\frac{1}{pq}}\Bigg{.}\!\right)^{p}<\infty.\end{split}

\begin{split}\left(\bigg{.}{\mathbb{E}}\!\left[\Big{.}\!\left|u_{i}(t,x+\mathbf{W}_{t})\right|^{q}\right]\right)^{\!\!\nicefrac{{1}}{{q}}}\leq e^{LT}(T+1)B\sup_{s\in[0,T]}\left({\mathbb{E}}\!\left[\Big{.}\!\left(1+\left\|{x+\mathbf{W}_{s}}\right\|\Big{.}\right)^{pq}\bigg{.}\!\right]\Bigg{.}\!\right)^{\!\!\nicefrac{{1}}{{q}}}.\end{split}

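The passage from the integral inequality to the exponential bound above is an application of Gronwall's inequality in its backward-in-time form. A sketch of that step follows; the shorthand \varphi and a is introduced here for illustration only and is not notation from the paper.

```latex
% Shorthand (assumed here for illustration, not from the paper):
%   \varphi(t) = \bigl(\mathbb{E}[\,|u_i(t, x+\mathbf{W}_t)|^q\,]\bigr)^{1/q},
%   a = (T+1)\,B\,\sup_{s\in[0,T]} \bigl(\mathbb{E}[(1+\|x+\mathbf{W}_s\|)^{pq}]\bigr)^{1/q}.
\begin{aligned}
&\varphi(t)\le a+L\int_{t}^{T}\varphi(s)\,ds\quad\text{for all }t\in[0,T],
\qquad\int_{0}^{T}\varphi(s)\,ds<\infty\\
&\Longrightarrow\quad\varphi(t)\le a\,e^{L(T-t)}\le e^{LT}a\quad\text{for all }t\in[0,T].
\end{aligned}
```

The finiteness of the time integral, established in the preceding estimate, is what makes Gronwall's inequality applicable.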

\displaystyle\begin{split}&{\mathbb{E}}\!\left[\Big{.}\!\left|u_{1}(t,x+\mathbf{W}_{t})-u_{2}(t,x+\mathbf{W}_{t})\right|\right]\\ &\leq\delta\left(e^{LT}(T+1)\right)^{q+1}\left(B^{q}+1\right)\left(1+\left\|{x}\right\|+\left({\mathbb{E}}\!\left[\left\|{\mathbf{W}_{T}}\right\|^{pq}\Big{.}\!\right]\bigg{.}\!\right)^{\!\!\frac{1}{pq}}\right)^{pq}.\end{split}

\displaystyle\begin{split}&\left|u_{1}(s,z)-u_{2}(s,z)\right|\\ &=\left|{\mathbb{E}}\!\left[(g_{1}-g_{2})\left(z+\mathbf{W}_{T-s}\right)+\int_{s}^{T}\big{(}F_{1}(u_{1})-F_{1}(u_{2})+F_{1}(u_{2})-F_{2}(u_{2})\big{)}\left(t,z+\mathbf{W}_{t-s}\right)\,dt\right]\right|\\ &\leq{\mathbb{E}}\!\left[\Big{.}\!\left|\big{.}(g_{1}-g_{2})\left(z+\mathbf{W}_{T}-\mathbf{W}_{s}\right)\right|\right]+\int_{s}^{T}{\mathbb{E}}\!\left[\left|\big{(}F_{1}(u_{1})-F_{1}(u_{2})\big{)}\left(t,z+\mathbf{W}_{t}-\mathbf{W}_{s}\right)\right|\Big{.}\!\right]\,dt\\ &\qquad\qquad\qquad+\int_{s}^{T}{\mathbb{E}}\!\left[\left|\big{(}F_{1}(u_{2})-F_{2}(u_{2})\big{)}\left(t,z+\mathbf{W}_{t}-\mathbf{W}_{s}\right)\right|\bigg{.}\!\right]\,dt.\end{split}

\displaystyle\begin{split}&{\mathbb{E}}\!\left[\Big{.}\!\left|\big{.}\!\left(u_{1}-u_{2}\right)(s,x+\mathbf{W}_{s})\right|\right]={\mathbb{E}}\!\left[\Big{.}\!\left.\left|\big{.}u_{1}(s,z)-u_{2}(s,z)\right|\right|_{z=x+\mathbf{W}_{s}}\right]\\ &\leq{\mathbb{E}}\!\left[{\mathbb{E}}\!\left[\left.\Big{.}\!\left|\big{.}(g_{1}-g_{2})\left(z+\mathbf{W}_{T}-\mathbf{W}_{s}\right)\right|\right]\right|_{z=x+\mathbf{W}_{s}}\right]\\ &\qquad\qquad+\int_{s}^{T}{\mathbb{E}}\!\left[{\mathbb{E}}\!\left[\left.\left|\big{(}F_{1}(u_{1})-F_{1}(u_{2})\big{)}\left(t,z+\mathbf{W}_{t}-\mathbf{W}_{s}\right)\right|\Big{.}\!\right]\right|_{z=x+\mathbf{W}_{s}}\right]\,dt\\ &\qquad\qquad+\int_{s}^{T}{\mathbb{E}}\!\left[{\mathbb{E}}\!\left[\left.\left|\big{(}F_{1}(u_{2})-F_{2}(u_{2})\big{)}\left(t,z+\mathbf{W}_{t}-\mathbf{W}_{s}\right)\right|\bigg{.}\!\right]\right|_{z=x+\mathbf{W}_{s}}\right]\,dt\\ &={\mathbb{E}}\!\left[\Big{.}\!\left|\big{.}(g_{1}-g_{2})\left(x+\mathbf{W}_{T}\right)\right|\right]+\int_{s}^{T}{\mathbb{E}}\!\left[\left|\big{(}F_{1}(u_{1})-F_{1}(u_{2})\big{)}\left(t,x+\mathbf{W}_{t}\right)\right|\Big{.}\!\right]\,dt\\ &\qquad\qquad+\int_{s}^{T}{\mathbb{E}}\!\left[\left|\big{(}F_{1}(u_{2})-F_{2}(u_{2})\big{)}\left(t,x+\mathbf{W}_{t}\right)\right|\bigg{.}\!\right]\,dt\\ &\leq{\mathbb{E}}\!\left[\Big{.}\!\left|(g_{1}-g_{2})\left(x+\mathbf{W}_{T}\right)\big{.}\!\right|\right]+\int_{s}^{T}{\mathbb{E}}\!\left[L\left|\big{(}u_{1}-u_{2}\big{)}\left(t,x+\mathbf{W}_{t}\right)\right|\Big{.}\!\right]\,dt\\ &\qquad\qquad+T\sup_{t\in[0,T]}{\mathbb{E}}\!\left[\left|\big{(}F_{1}(u_{2})-F_{2}(u_{2})\big{)}\left(t,x+\mathbf{W}_{t}\right)\right|\Big{.}\!\right].\end{split}

\displaystyle\small\begin{split}&\sup_{t\in[0,T]}{\mathbb{E}}\!\left[\Big{.}\!\left|\big{.}\!\left(u_{1}-u_{2}\right)(t,x+\mathbf{W}_{t})\right|\right]\\ &\leq e^{LT}(T+1)\sup_{t\in[0,T]}\max\left\{{\mathbb{E}}\!\left[\Big{.}\!\left|\big{.}(g_{1}-g_{2})\left(x+\mathbf{W}_{T}\right)\right|\right],{\mathbb{E}}\!\left[\left|\big{(}F_{1}(u_{2})-F_{2}(u_{2})\big{)}\left(t,x+\mathbf{W}_{t}\right)\right|\Big{.}\!\right]\right\}.\end{split}

\displaystyle\begin{split}&\sup_{t\in[0,T]}\max\left\{{\mathbb{E}}\!\left[\Big{.}\!\left|\big{.}(g_{1}-g_{2})\left(x+\mathbf{W}_{T}\right)\right|\right],{\mathbb{E}}\!\left[\left|\big{(}F_{1}(u_{2})-F_{2}(u_{2})\big{)}\left(t,x+\mathbf{W}_{t}\right)\right|\Big{.}\!\right]\right\}\\ &\leq\delta\sup_{t\in[0,T]}{\mathbb{E}}\!\left[\left(\Big{.}1+\left\|{x+\mathbf{W}_{t}}\right\|\right)^{pq}+\left|u_{2}(t,x+\mathbf{W}_{t})\right|^{q}\Big{.}\!\right]\\ &\leq\delta\sup_{t\in[0,T]}{\mathbb{E}}\!\left[\bigg{.}\!\left(\Big{.}1+\left\|{x+\mathbf{W}_{t}}\right\|\right)^{pq}\right]+\delta\sup_{t\in[0,T]}{\mathbb{E}}\!\left[\big{.}\!\left|u_{2}(t,x+\mathbf{W}_{t})\right|^{q}\Big{.}\right]\\ &\leq\delta\sup_{t\in[0,T]}{\mathbb{E}}\!\left[\bigg{.}\!\left(\Big{.}1+\left\|{x+\mathbf{W}_{t}}\right\|\right)^{pq}\right]+\delta(e^{LT}(T+1)B)^{q}\sup_{t\in[0,T]}{\mathbb{E}}\!\left[\Big{.}\!\left(1+\left\|{x+\mathbf{W}_{t}}\right\|\Big{.}\!\right)^{pq}\bigg{.}\!\right]\\ &\leq\delta\left(e^{LT}(T+1)\right)^{q}(B^{q}+1)\sup_{t\in[0,T]}{\mathbb{E}}\!\left[\Big{.}\!\left(1+\left\|{x+\mathbf{W}_{t}}\right\|\Big{.}\!\right)^{pq}\bigg{.}\!\right].\end{split}

\displaystyle\begin{split}&\sup_{t\in[0,T]}{\mathbb{E}}\!\left[\Big{.}\!\left|\big{.}\!\left(u_{1}-u_{2}\right)(t,x+\mathbf{W}_{t})\right|\right]\\ &\leq\delta\left(e^{LT}(T+1)\right)^{q+1}\left(B^{q}+1\right)\sup_{t\in[0,T]}{\mathbb{E}}\!\left[\Big{.}\!\left(1+\left\|{x+\mathbf{W}_{t}}\right\|\Big{.}\!\right)^{pq}\bigg{.}\!\right]\\ &\leq\delta\left(e^{LT}(T+1)\right)^{q+1}\left(B^{q}+1\right)\left(1+\left\|{x}\right\|+\left({\mathbb{E}}\!\left[\left\|{\mathbf{W}_{T}}\right\|^{pq}\Big{.}\!\right]\bigg{.}\!\right)^{\!\!\frac{1}{pq}}\right)^{pq}.\end{split}

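The last inequality above rests on a moment estimate for the shifted Brownian motion. A sketch of that estimate follows, assuming pq \geq 1 (so that the triangle inequality in L^{pq} applies) and assuming that \mathbf{W}_{t} has the same law as \sqrt{t/T}\,\mathbf{W}_{T}, as is the case for a Brownian motion started at zero.

```latex
% For every t in [0, T]: pointwise triangle inequality, then Minkowski in L^{pq},
% then Brownian scaling W_t =_d sqrt(t/T) W_T.
\begin{aligned}
\left({\mathbb{E}}\!\left[\left(1+\|x+\mathbf{W}_{t}\|\right)^{pq}\right]\right)^{\frac{1}{pq}}
&\leq 1+\|x\|+\left({\mathbb{E}}\!\left[\|\mathbf{W}_{t}\|^{pq}\right]\right)^{\frac{1}{pq}}\\
&=1+\|x\|+\sqrt{t/T}\left({\mathbb{E}}\!\left[\|\mathbf{W}_{T}\|^{pq}\right]\right)^{\frac{1}{pq}}
\leq 1+\|x\|+\left({\mathbb{E}}\!\left[\|\mathbf{W}_{T}\|^{pq}\right]\right)^{\frac{1}{pq}}.
\end{aligned}
```

Raising both sides to the power pq and taking the supremum over t gives the stated bound.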

\displaystyle\begin{split}&\left(\bigg{.}\!{\mathbb{E}}\!\left[\left|U^{0}_{N,M}(0,x)-u_{1}(0,x)\right|^{2}\right]\right)^{\!\!\nicefrac{{1}}{{2}}}\\ &\leq\left(e^{LT}(T+1)\right)^{q+1}\left(B^{q}+1\right)\left(\delta+\frac{e^{M/2}(1+2LT)^{N}}{M^{N/2}}\right)\left(1+\left\|{x}\right\|+\left({\mathbb{E}}\!\left[\left\|{\mathbf{W}_{T}}\right\|^{pq}\Big{.}\!\right]\bigg{.}\!\right)^{\!\!\frac{1}{pq}}\right)^{pq}.\end{split}

\displaystyle\begin{split}&\left(\bigg{.}\!{\mathbb{E}}\!\left[\left|U^{0}_{N,M}(0,x)-u_{2}(0,x)\right|^{2}\right]\right)^{\!\!\nicefrac{{1}}{{2}}}\\ &\leq e^{LT}\left[\left({\mathbb{E}}\!\left[\Big{.}\!\left|\big{.}g_{2}(x+\mathbf{W}_{T})\right|^{2}\right]\bigg{.}\!\right)^{\!\!\nicefrac{{1}}{{2}}}+T\left(\frac{1}{T}\int_{0}^{T}{\mathbb{E}}\!\left[\Big{.}\!\left|(F_{2}(0))(t,x+\mathbf{W}_{t})\right|^{2}\right]dt\Bigg{.}\!\right)^{\!\!\nicefrac{{1}}{{2}}}\right]\frac{e^{M/2}(1+2LT)^{N}}{M^{N/2}}\\ &\leq e^{LT}(T+1)\sup_{t\in[0,T]}\left({\mathbb{E}}\!\left[B^{2}\left(1+\left\|{x+\mathbf{W}_{t}}\right\|\Big{.}\!\right)^{2p}\right]\Bigg{.}\!\right)^{\!\!\nicefrac{{1}}{{2}}}\frac{e^{M/2}(1+2LT)^{N}}{M^{N/2}}\\ &\leq e^{LT}(T+1)B\left(1+\left\|{x}\right\|+\left({\mathbb{E}}\!\left[\left\|{\mathbf{W}_{T}}\right\|^{2p}\Big{.}\!\right]\right)^{\!\!\frac{1}{2p}}\Bigg{.}\!\right)^{p}\frac{e^{M/2}(1+2LT)^{N}}{M^{N/2}}.\end{split}

\displaystyle\begin{split}\left|u_{2}(0,x)-u_{1}(0,x)\right|&\leq\delta\left(e^{LT}(T+1)\right)^{q+1}\left(B^{q}+1\right)\left(1+\left\|{x}\right\|+\left({\mathbb{E}}\!\left[\left\|{\mathbf{W}_{T}}\right\|^{pq}\Big{.}\!\right]\bigg{.}\!\right)^{\!\!\frac{1}{pq}}\right)^{pq}.\end{split}

\displaystyle\begin{split}&\left(\bigg{.}\!{\mathbb{E}}\!\left[\left|U^{0}_{N,M}(0,x)-u_{1}(0,x)\right|^{2}\right]\right)^{\!\!\nicefrac{{1}}{{2}}}\\ &\leq\left(\bigg{.}\!{\mathbb{E}}\!\left[\left|U^{0}_{N,M}(0,x)-u_{2}(0,x)\right|^{2}\right]\right)^{\!\!\nicefrac{{1}}{{2}}}+\left|\big{.}u_{2}(0,x)-u_{1}(0,x)\right|\\ &\leq\left(e^{LT}(T+1)\right)^{q+1}\left(B^{q}+1\right)\left(\delta+\frac{e^{M/2}(1+2LT)^{N}}{M^{N/2}}\right)\!\!\left(1+\left\|{x}\right\|+\left({\mathbb{E}}\!\left[\left\|{\mathbf{W}_{T}}\right\|^{pq}\Big{.}\!\right]\bigg{.}\!\right)^{\!\!\frac{1}{pq}}\right)^{pq}.\end{split}

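The Monte Carlo part of the bound above is governed by the factor e^{M/2}(1+2LT)^{N}/M^{N/2}. The following Python sketch evaluates this factor along the diagonal N = M, where it equals (e^{1/2}(1+2LT)/\sqrt{N})^{N} and therefore tends to zero once \sqrt{N} exceeds e^{1/2}(1+2LT). The helper name `mlp_error_factor` and the values L = T = 1 are illustrative assumptions.

```python
import math


def mlp_error_factor(N, M, L, T):
    """Evaluate e^{M/2} (1 + 2LT)^N / M^{N/2} from the MLP error bound."""
    return math.exp(M / 2) * (1 + 2 * L * T) ** N / M ** (N / 2)


if __name__ == "__main__":
    # Illustrative constants (assumptions): Lipschitz constant L = 1, horizon T = 1.
    for n in (4, 25, 36, 64, 100):
        print(n, mlp_error_factor(n, n, 1.0, 1.0))
```

For small N the factor is larger than one, so the bound is only informative once N = M is large enough relative to LT.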

\mathbf{A}_{d}(x)=\left(\max\{x_{1},0\},\ldots,\max\{x_{d},0\}\right),

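The realization map \mathcal{R}, the parameter count \mathcal{P}, and the rectifier \mathbf{A}_{d} from the equations above can be made concrete with a short Python sketch. A network \Phi is represented as a list of (W, B) pairs with W stored as a list of rows; the function names `realization` and `num_params` and the example network are hypothetical choices for illustration.

```python
def relu(v):
    """Componentwise rectifier A_d(x) = (max{x_1, 0}, ..., max{x_d, 0})."""
    return [max(vi, 0.0) for vi in v]


def affine(W, B, x):
    """Compute W x + B for a matrix W given as a list of rows."""
    return [sum(w * xj for w, xj in zip(row, x)) + b for row, b in zip(W, B)]


def realization(phi, x0):
    """(R(Phi))(x_0): ReLU after every layer except the last, which is affine."""
    x = list(x0)
    for W, B in phi[:-1]:
        x = relu(affine(W, B, x))       # x_n = A(W_n x_{n-1} + B_n)
    W, B = phi[-1]
    return affine(W, B, x)              # W_{H+1} x_H + B_{H+1}


def num_params(phi):
    """P(Phi) = sum_n k_n (k_{n-1} + 1), counting all weights and biases."""
    return sum(len(W) * (len(W[0]) + 1) for W, _ in phi)


if __name__ == "__main__":
    # Example network with layer sizes (1, 2, 1) realizing x -> |x|.
    phi = [([[1.0], [-1.0]], [0.0, 0.0]), ([[1.0, 1.0]], [0.0])]
    print(realization(phi, [-3.0]), num_params(phi))
```

The example uses the identity |x| = max{x, 0} + max{-x, 0}, so a single hidden layer of width two suffices.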

Full text

A proof that rectified deep neural networks overcome the curse of dimensionality in the numerical approximation of semilinear heat equations

Martin Hutzenthaler¹, Arnulf Jentzen², Thomas Kruse³, and Tuan Anh Nguyen⁴

¹ Faculty of Mathematics, University of Duisburg-Essen, 45117 Essen, Germany

² SAM, Department of Mathematics, ETH Zurich, 8092 Zurich, Switzerland

³ Institute of Mathematics, University of Gießen, 35392 Gießen, Germany

⁴ Faculty of Mathematics, University of Duisburg-Essen, 45117 Essen, Germany

Abstract

Deep neural networks and other deep learning methods have very successfully been applied to the numerical approximation of high-dimensional nonlinear parabolic partial differential equations (PDEs), which are widely used in finance, engineering, and natural sciences. In particular, simulations indicate that algorithms based on deep learning overcome the curse of dimensionality in the numerical approximation of solutions of semilinear PDEs. For certain linear PDEs this has also been proved mathematically. The key contribution of this article is to rigorously prove this for the first time for a class of nonlinear PDEs. More precisely, we prove in the case of semilinear heat equations with gradient-independent nonlinearities that the numbers of parameters of the employed deep neural networks grow at most polynomially in both the PDE dimension and the reciprocal of the prescribed approximation accuracy. Our proof relies on recently introduced full history recursive multilevel Picard approximations of semilinear PDEs.

AMS 2010 subject classification: 65C99; 68T05

Key words and phrases: curse of dimensionality, high-dimensional PDEs, deep neural networks, information based complexity, tractability of multivariate problems, multilevel Picard approximations

1 Introduction
2 A stability result for multilevel Picard (MLP) approximations
2.1 Setting
2.2 An a priori estimate for solutions of partial differential equations (PDEs)
2.3 A stability result for solutions of PDEs
2.4 A stability result for MLP approximations
3 Deep neural network representations for MLP approximations
3.1 A mathematical framework for deep neural networks
3.2 Properties of operations associated to deep neural networks
3.3 Deep neural network representations for MLP approximations

1 Introduction

Deep neural networks (DNNs) have revolutionized a number of computational problems; see, e.g., the references in Grohs et al. [GHJvW18]. In 2017, deep learning-based approximation algorithms for certain parabolic partial differential equations (PDEs) were proposed in Han et al. [EHJ17, HJE18], and building on these works there is now a series of deep learning-based numerical approximation algorithms for a large class of PDEs in the scientific literature; see, e.g., [BBG+18, BEJ17, BCJ18, EY18, EGJS18, FTT17, GHJvW18, Hen17, KLY17, Mis18, NM18, Rai18, SS17]. There is empirical evidence that deep learning-based methods work exceptionally well for approximating solutions of high-dimensional PDEs and that they do not suffer from the curse of dimensionality; see, e.g., the simulations in [EHJ17, HJE18, BEJ17, BBG+18]. There exist, however, only a few theoretical results which prove that DNN approximations of solutions of PDEs do not suffer from the curse of dimensionality: the recent articles [GHJvW18, BGJ18, JSW18, EGJS18] prove rigorously that DNN approximations overcome the curse of dimensionality in the numerical approximation of solutions of certain linear PDEs.

The main result of this article, Theorem LABEL:n18 below, proves for semilinear heat equations with gradient-independent nonlinearities that the number of parameters of the approximating DNN grows at most polynomially in both the PDE dimension $d\in{\mathbbm{N}}$ and the reciprocal of the prescribed accuracy $\varepsilon>0$ . Thereby, we establish for the first time that there exist DNN approximations of solutions of such PDEs which indeed overcome the curse of dimensionality. To illustrate the main result of this article we formulate in the following result, Theorem 1.1 below, a special case of Theorem LABEL:n18.

Theorem 1.1.

Let $\mathbf{A}_{d}\colon{\mathbbm{R}}^{d}\to{\mathbbm{R}}^{d}$ , $d\in{\mathbbm{N}}=\{1,2,\ldots\}$ , and $\left\|\cdot\right\|\colon(\cup_{d\in{\mathbbm{N}}}{\mathbbm{R}}^{d})\to[0,\infty)$ satisfy for all $d\in{\mathbbm{N}}$ , $x=(x_{1},\ldots,x_{d})\in{\mathbbm{R}}^{d}$ that $\mathbf{A}_{d}(x)=\left(\max\{x_{1},0\},\ldots,\max\{x_{d},0\}\right)$ and $\|x\|=[\sum_{i=1}^{d}(x_{i})^{2}]^{1/2}$ , let $\mathbf{N}=\cup_{H\in{\mathbbm{N}}}\cup_{(k_{0},k_{1},\ldots,k_{H+1})\in{\mathbbm{N}}^{H+2}}[\prod_{n=1}^{H+1}\left({\mathbbm{R}}^{k_{n}\times k_{n-1}}\times{\mathbbm{R}}^{k_{n}}\right)],$ let $\mathcal{R}\colon\mathbf{N}\to(\cup_{k,l\in{\mathbbm{N}}}C({\mathbbm{R}}^{k},{\mathbbm{R}}^{l}))$ and $\mathcal{P}\colon\mathbf{N}\to{\mathbbm{N}}$ satisfy for all $H\in{\mathbbm{N}}$ , $k_{0},k_{1},\ldots,k_{H},k_{H+1}\in{\mathbbm{N}}$ , $\Phi=((W_{1},B_{1}),\ldots,(W_{H+1},B_{H+1}))\in\prod_{n=1}^{H+1}\left({\mathbbm{R}}^{k_{n}\times k_{n-1}}\times{\mathbbm{R}}^{k_{n}}\right),$ $x_{0}\in{\mathbbm{R}}^{k_{0}},\ldots,x_{H}\in{\mathbbm{R}}^{k_{H}}$ with $\forall\,n\in{\mathbbm{N}}\cap[1,H]\colon x_{n}=\mathbf{A}_{k_{n}}(W_{n}x_{n-1}+B_{n})$ that

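$$\mathcal{R}(\Phi)\in C({\mathbbm{R}}^{k_{0}},{\mathbbm{R}}^{k_{H+1}}),\qquad(\mathcal{R}(\Phi))(x_{0})=W_{H+1}x_{H}+B_{H+1},\qquad\text{and}\qquad\mathcal{P}(\Phi)=\sum_{n=1}^{H+1}k_{n}(k_{n-1}+1),$$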

let $T,\kappa\in(0,\infty)$ , $f\in C({\mathbbm{R}},{\mathbbm{R}})$ , $(\mathfrak{g}_{d,\varepsilon})_{d\in{\mathbbm{N}},\varepsilon\in(0,1]}\subseteq\mathbf{N}$ , $(c_{d})_{d\in{\mathbbm{N}}}\subseteq(0,\infty)$ , for every $d\in{\mathbbm{N}}$ let $g_{d}\in C({\mathbbm{R}}^{d},{\mathbbm{R}})$ , for every $d\in{\mathbbm{N}}$ let $u_{d}\in C^{1,2}([0,T]\times{\mathbbm{R}}^{d},{\mathbbm{R}})$ , and assume for all $d\in{\mathbbm{N}}$ , $v,w\in{\mathbbm{R}}$ , $x\in{\mathbbm{R}}^{d}$ , $\varepsilon\in(0,1]$ , $t\in(0,T)$ that $|f(v)-f(w)|\leq\kappa|v-w|$ , $\mathcal{R}(\mathfrak{g}_{d,\varepsilon})\in C({\mathbbm{R}}^{d},{\mathbbm{R}})$ , $|(\mathcal{R}(\mathfrak{g}_{d,\varepsilon}))(x)|\leq\kappa d^{\kappa}(1+\left\|{x}\right\|^{\kappa})$ , $\left|g_{d}(x)-(\mathcal{R}(\mathfrak{g}_{d,\varepsilon}))(x)\right|\leq\varepsilon\kappa d^{\kappa}(1+\left\|{x}\right\|^{\kappa})$ , $\mathcal{P}(\mathfrak{g}_{d,\varepsilon})\leq\kappa d^{\kappa}\varepsilon^{-\kappa}$ , $|u_{d}(t,x)|\leq c_{d}(1+\left\|{x}\right\|^{c_{d}})$ , $u_{d}(0,x)=g_{d}(x)$ , and

[TABLE]

Then there exist $(\Psi_{d,\varepsilon})_{d\in{\mathbbm{N}},\varepsilon\in(0,1]}\subseteq\mathbf{N}$ , $\eta\in(0,\infty)$ such that for all $d\in{\mathbbm{N}}$ , $\varepsilon\in(0,1]$ it holds that $\mathcal{R}(\Psi_{d,\varepsilon})\in C({\mathbbm{R}}^{d},{\mathbbm{R}})$ , $\mathcal{P}(\Psi_{d,\varepsilon})\leq\eta d^{\eta}\varepsilon^{-\eta}$ , and

[TABLE]

Theorem 1.1 is an immediate consequence of LABEL:cor:main_thm in LABEL:subsec:dnn_approx_gen_pol below (with $T=2T$ , $u_{d}(t,x)=u_{d}(T-\frac{t}{2},x)$ , $f(v)=f(v)/2$ for $t\in[0,2T]$ , $x\in{\mathbbm{R}}^{d}$ , $v\in{\mathbbm{R}}$ in the notation of LABEL:cor:main_thm). In the manner of the proof of Theorem 3.14 in [GHJvW18] and the proof of Theorem 6.3 in [JSW18], the proof of LABEL:n18 below uses probabilistic arguments on a suitable artificial probability space. Moreover, the proof of LABEL:n18 relies on recently introduced full history recursive multilevel Picard (MLP) approximations which have been proved to overcome the curse of dimensionality in the numerical approximation of solutions of semilinear heat equations at single space-time points; see [EHJK16, EHJK17, HK17, HJK+18]. A key step in our proof is that realizations of certain MLP approximations can be represented by DNNs; see Lemma 3.10 below.

The remainder of this article is organized as follows. In Section 2 we provide auxiliary results on multilevel Picard approximations ensuring that these approximations are stable against perturbations in the nonlinearity $f$ and the terminal condition $g$ of the PDE (1). In Section 3 we show that multilevel Picard approximations can be represented by DNNs and we provide bounds for the number of parameters of the representing DNN. We use the results of Section 2 and Section 3 to prove the main result LABEL:n18 in LABEL:sec:main_result.

2 A stability result for full history recursive multilevel Picard (MLP) approximations

2.1 Setting

Setting 2.1.

Let $d\in{\mathbbm{N}}$ , $T,L,\delta,B\in(0,\infty)$ , $p,q\in[1,\infty)$ , $f_{1},f_{2}\in C\left([0,T]\times{\mathbbm{R}}^{d}\times{\mathbbm{R}},{\mathbbm{R}}\right)$ , $g_{1},g_{2}\in C({\mathbbm{R}}^{d},{\mathbbm{R}})$ , let $\|\cdot\|\colon{\mathbbm{R}}^{d}\to[0,\infty)$ satisfy for all $x=(x_{1},\ldots,x_{d})\in{\mathbbm{R}}^{d}$ that $\|x\|=[\sum_{i=1}^{d}(x_{i})^{2}]^{1/2}$ , assume for all $t\in[0,T]$ , $x\in{\mathbbm{R}}^{d}$ , $w,v\in{\mathbbm{R}}$ , $i\in\{1,2\}$ that

[TABLE]

and

[TABLE]

let $F_{i}\colon C\left([0,T]\times{\mathbbm{R}}^{d},{\mathbbm{R}}\right)\to C\left([0,T]\times{\mathbbm{R}}^{d},{\mathbbm{R}}\right)$ , $i\in\{1,2\}$ , satisfy for all $v\in C\left([0,T]\times{\mathbbm{R}}^{d},{\mathbbm{R}}\right)$ , $t\in[0,T]$ , $x\in{\mathbbm{R}}^{d}$ , $i\in\{1,2\}$ that

[TABLE]

let $(\Omega,\mathcal{F},{\mathbb{P}})$ be a probability space, let $\mathbf{W}\colon[0,T]\times\Omega\to{\mathbbm{R}}^{d}$ be a standard Brownian motion with continuous sample paths, let $u_{1},u_{2}\in C([0,T]\times{\mathbbm{R}}^{d},{\mathbbm{R}})$ , assume for all $i\in\{1,2\}$ , $s\in[0,T]$ , $x\in{\mathbbm{R}}^{d}$ that

[TABLE]

and

[TABLE]

let $\Theta=\bigcup_{n\in{\mathbbm{N}}}{\mathbbm{Z}}^{n}$ , let $\mathfrak{u}^{\theta}\colon\Omega\to[0,1]$ , $\theta\in\Theta$ , be independent random variables which are uniformly distributed on $[0,1]$ , let $\mathcal{U}^{\theta}\colon[0,T]\times\Omega\to[0,T]$ , $\theta\in\Theta$ , satisfy for all $t\in[0,T]$ , $\theta\in\Theta$ that $\mathcal{U}^{\theta}_{t}=t+(T-t)\mathfrak{u}^{\theta}$ , let $W^{\theta}\colon[0,T]\times\Omega\to{\mathbbm{R}}^{d}$ , $\theta\in\Theta$ , be independent standard Brownian motions, assume that $(\mathfrak{u}^{\theta})_{\theta\in\Theta}$ , $(W^{\theta})_{\theta\in\Theta}$ , and $\mathbf{W}$ are independent, and let ${U}_{n,M}^{\theta}\colon[0,T]\times{\mathbbm{R}}^{d}\times\Omega\to{\mathbbm{R}}$ , $n,M\in{\mathbbm{Z}}$ , $\theta\in\Theta$ , be functions which satisfy for all $n,M\in{\mathbbm{N}}$ , $\theta\in\Theta$ , $t\in[0,T]$ , $x\in{\mathbbm{R}}^{d}$ that ${U}_{-1,M}^{\theta}(t,x)={U}_{0,M}^{\theta}(t,x)=0$ and

[TABLE]
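The recursion defining ${U}_{n,M}^{\theta}$ can be illustrated by a small simulation. The following is a minimal sketch only: it assumes the general shape of full history recursive MLP schemes from the cited literature (a Monte Carlo average of the terminal condition plus level-wise telescoping corrections weighted by $(T-t)/M^{n-l}$), it uses a gradient-independent nonlinearity `f(v)` as in Theorem 1.1 rather than the general $f_{i}(t,x,v)$ of Setting 2.1, and it replaces the index family $\theta\in\Theta$ by fresh draws from one random number generator. The function name `mlp` and its signature are hypothetical.

```python
import math
import random
from typing import Callable, List


def mlp(n: int, M: int, t: float, x: List[float], T: float,
        f: Callable[[float], float], g: Callable[[List[float]], float],
        rng: random.Random) -> float:
    """One realization of an MLP-type approximation U_{n,M}(t, x) (assumed form)."""
    if n <= 0:
        return 0.0  # U_{-1,M} = U_{0,M} = 0

    def shifted(dt: float) -> List[float]:
        # x plus a Brownian increment over a time span of length dt
        s = math.sqrt(dt)
        return [xi + s * rng.gauss(0.0, 1.0) for xi in x]

    # Monte Carlo average of the terminal condition over M^n samples
    total = sum(g(shifted(T - t)) for _ in range(M ** n)) / M ** n

    # Telescoping corrections over the levels l = 0, ..., n-1
    for l in range(n):
        m = M ** (n - l)
        acc = 0.0
        for _ in range(m):
            r = t + (T - t) * rng.random()  # time point t + (T - t) * u
            y = shifted(r - t)
            acc += f(mlp(l, M, r, y, T, f, g, rng))
            if l >= 1:
                acc -= f(mlp(l - 1, M, r, y, T, f, g, rng))
        total += (T - t) * acc / m
    return total


rng = random.Random(0)
# With g = 0 and f = 1 the exact solution is u(t, x) = T - t.
value = mlp(2, 3, 0.25, [0.0, 0.0], 1.0, lambda v: 1.0, lambda y: 0.0, rng)
```

In this sketch the cost grows with the number of recursive calls, so only small values of `n` and `M` are practical without further optimization.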

2.2 An a priori estimate for solutions of partial differential equations (PDEs)

Lemma 2.2 ( $q$ -th moment of the exact solution).

Assume Setting 2.1 and let $x\in{\mathbbm{R}}^{d}$ , $i\in\{1,2\}$ . Then it holds that

[TABLE]

Proof of Lemma 2.2.

Throughout this proof let $\mu_{t}\colon\mathcal{B}({\mathbbm{R}}^{d})\to[0,1]$ , $t\in[0,T]$ be the probability measures which satisfy for all $t\in[0,T]$ , $B\in\mathcal{B}({\mathbbm{R}}^{d})$ that

[TABLE]

The integral transformation theorem, (8), and the triangle inequality show for all $t\in[0,T]$ that

[TABLE]

Next, Jensen’s inequality, Fubini’s theorem, (11), the fact that $\mathbf{W}$ has independent and stationary increments, and (4) demonstrate that for all $t\in[0,T]$ it holds that

[TABLE]

Furthermore, Jensen’s inequality, Fubini’s theorem, (11), the fact that $\mathbf{W}$ has independent and stationary increments, the triangle inequality, (3), and (4) demonstrate for all $t\in[0,T]$ that

[TABLE]

Combining this with (12) and (13) implies that for all $t\in[0,T]$ it holds that

[TABLE]

Next, [HJK+18, Corollary 3.11] shows that

[TABLE]

This, the triangle inequality, and the fact that ${\mathbb{E}}\!\left[\left\|{\mathbf{W}_{T}}\right\|^{pq}\right]<\infty$ show that

[TABLE]

This, Gronwall’s integral inequality, and (15) establish for all $t\in[0,T]$ that

[TABLE]

The proof of Lemma 2.2 is thus completed. ∎

2.3 A stability result for solutions of PDEs

Lemma 2.3.

Assume Setting 2.1. Then it holds for all $t\in[0,T]$ , $x\in{\mathbbm{R}}^{d}$ that

[TABLE]

Proof of Lemma 2.3.

First, (8), the triangle inequality, and the fact that $\mathbf{W}$ has stationary increments show for all $s\in[0,T]$ , $z\in{\mathbbm{R}}^{d}$ that

[TABLE]

This, Fubini’s theorem, the fact that $\mathbf{W}$ has independent increments, and the Lipschitz condition in (3) ensure that for all $s\in[0,T]$ , $x\in{\mathbbm{R}}^{d}$ it holds that

[TABLE]

This, Gronwall’s lemma, and Lemma 2.2 yield for all $x\in{\mathbbm{R}}^{d}$ that

[TABLE]

Furthermore, (5), the triangle inequality, and Lemma 2.2 imply for all $x\in{\mathbbm{R}}^{d}$ that

[TABLE]

This, (22), and the triangle inequality yield that

[TABLE]

This completes the proof of Lemma 2.3. ∎

2.4 A stability result for MLP approximations

Corollary 2.4.

Assume Setting 2.1, let $x\in{\mathbbm{R}}^{d}$ , $N,M\in{\mathbbm{N}}$ , and assume that $q\geq 2$ . Then it holds that

[TABLE]

Proof of Corollary 2.4.

First, Lemma 2.2 implies that $\int_{0}^{T}\left({\mathbb{E}}\!\left[\big{.}\!\left|u_{i}(t,x+\mathbf{W}_{t})\right|^{2}\right]\right)^{\!\!\nicefrac{{1}}{{2}}}dt<\infty$ . This, [HJK+18, Theorem 3.5] (with $\xi=x$ , $F=F_{2}$ , $g=g_{2}$ , and $u=u_{2}$ in the notation of [HJK+18, Theorem 3.5]), (4), and the triangle inequality ensure that

[TABLE]

Furthermore, Lemma 2.3 shows that

[TABLE]

This, the triangle inequality, (26), the fact that $B\leq B^{q}+1$ , the assumption that $q\geq 2$ , and Jensen’s inequality show that

[TABLE]

The proof of Corollary 2.4 is thus completed. ∎

3 Deep neural network representations for MLP approximations

The main result of this section, Lemma 3.10 below, shows that multilevel Picard approximations can be well represented by DNNs. The central tools for the proof of Lemma 3.10 are Lemmas 3.8 and 3.9 which show that DNNs are stable under compositions and summations. We formulate Lemmas 3.8 and 3.9 in terms of the operators defined in (33) and (34) below, whose properties are studied in Lemmas 3.3, 3.4, and 3.5.

3.1 A mathematical framework for deep neural networks

Setting 3.1 (Artificial neural networks).

Let $\left\|\cdot\right\|,{\left|\kern-1.07639pt\left|\kern-1.07639pt\left|\cdot\right|\kern-1.07639pt\right|\kern-1.07639pt\right|}\colon(\cup_{d\in{\mathbbm{N}}}{\mathbbm{R}}^{d})\to[0,\infty)$ and $\dim\colon(\cup_{d\in{\mathbbm{N}}}{\mathbbm{R}}^{d})\to{\mathbbm{N}}$ satisfy for all $d\in{\mathbbm{N}}$ , $x=(x_{1},\ldots,x_{d})\in{\mathbbm{R}}^{d}$ that $\|x\|=\sqrt{\sum_{i=1}^{d}(x_{i})^{2}}$ , ${\left|\kern-1.07639pt\left|\kern-1.07639pt\left|x\right|\kern-1.07639pt\right|\kern-1.07639pt\right|}=\max_{i\in[1,d]\cap{\mathbbm{N}}}|x_{i}|$ , and $\dim\!\left(x\right)=d$ , let $\mathbf{A}_{d}\colon{\mathbbm{R}}^{d}\to{\mathbbm{R}}^{d}$ , $d\in{\mathbbm{N}}$ , satisfy for all $d\in{\mathbbm{N}}$ , $x=(x_{1},\ldots,x_{d})\in{\mathbbm{R}}^{d}$ that

$$\mathbf{A}_{d}(x)=\left(\max\{x_{1},0\},\ldots,\max\{x_{d},0\}\right),\tag{29}$$

let $\mathbf{D}=\cup_{H\in{\mathbbm{N}}}{\mathbbm{N}}^{H+2}$ , let

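$$\mathbf{N}=\cup_{H\in{\mathbbm{N}}}\cup_{(k_{0},k_{1},\ldots,k_{H+1})\in{\mathbbm{N}}^{H+2}}\left[\prod_{n=1}^{H+1}\left({\mathbbm{R}}^{k_{n}\times k_{n-1}}\times{\mathbbm{R}}^{k_{n}}\right)\right],\tag{30}$$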

let $\mathcal{D}\colon\mathbf{N}\to\mathbf{D}$ and $\mathcal{R}\colon\mathbf{N}\to(\cup_{k,l\in{\mathbbm{N}}}C({\mathbbm{R}}^{k},{\mathbbm{R}}^{l}))$ satisfy for all $H\in{\mathbbm{N}}$ , $k_{0},k_{1},\ldots,k_{H},k_{H+1}\in{\mathbbm{N}}$ , $\Phi=((W_{1},B_{1}),\ldots,(W_{H+1},B_{H+1}))\in\prod_{n=1}^{H+1}\left({\mathbbm{R}}^{k_{n}\times k_{n-1}}\times{\mathbbm{R}}^{k_{n}}\right),$ $x_{0}\in{\mathbbm{R}}^{k_{0}},\ldots,x_{H}\in{\mathbbm{R}}^{k_{H}}$ with $\forall\,n\in{\mathbbm{N}}\cap[1,H]\colon x_{n}=\mathbf{A}_{k_{n}}(W_{n}x_{n-1}+B_{n})$ that

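$$\mathcal{D}(\Phi)=(k_{0},k_{1},\ldots,k_{H},k_{H+1}),\tag{31}$$

$$\mathcal{R}(\Phi)\in C({\mathbbm{R}}^{k_{0}},{\mathbbm{R}}^{k_{H+1}}),\qquad\text{and}\qquad(\mathcal{R}(\Phi))(x_{0})=W_{H+1}x_{H}+B_{H+1},\tag{32}$$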

*let $\odot\colon\mathbf{D}\times\mathbf{D}\to\mathbf{D}$ satisfy for all $H_{1},H_{2}\in{\mathbbm{N}}$ , $\alpha=(\alpha_{0},\alpha_{1},\ldots,\alpha_{H_{1}},\alpha_{H_{1}+1})\in{\mathbbm{N}}^{H_{1}+2}$ , $\beta=(\beta_{0},\beta_{1},\ldots,\beta_{H_{2}},\beta_{H_{2}+1})\in{\mathbbm{N}}^{H_{2}+2}$ that *

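$$\alpha\odot\beta=(\beta_{0},\beta_{1},\ldots,\beta_{H_{2}},\beta_{H_{2}+1}+\alpha_{0},\alpha_{1},\alpha_{2},\ldots,\alpha_{H_{1}+1})\in{\mathbbm{N}}^{H_{1}+H_{2}+3},\tag{33}$$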

*let $\operatorname*{\boxplus}\colon\mathbf{D}\times\mathbf{D}\to\mathbf{D}$ satisfy for all $H\in{\mathbbm{N}}$ , $\alpha=(\alpha_{0},\alpha_{1},\ldots,\alpha_{H},\alpha_{H+1})\in{\mathbbm{N}}^{H+2}$ , $\beta=(\beta_{0},\beta_{1},\beta_{2},\ldots,\beta_{H},\beta_{H+1})\in{\mathbbm{N}}^{H+2}$ that *

[TABLE]

and let $\mathfrak{n}_{n}\in\mathbf{D}$ , $n\in[3,\infty)\cap{\mathbbm{N}}$ , satisfy for all $n\in[3,\infty)\cap{\mathbbm{N}}$ that

$$\mathfrak{n}_{n}=(1,\underbrace{2,\ldots,2}_{(n-2)\text{-times}},1)\in{\mathbbm{N}}^{n}.\tag{35}$$

Remark 3.2.

The set $\mathbf{N}$ can be viewed as the set of all artificial neural networks. For each network $\Phi\in\mathbf{N}$ the function $\mathcal{R}(\Phi)$ is the function represented by $\Phi$ and the vector $\mathcal{D}(\Phi)$ describes the layer dimensions of $\Phi$ .
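The objects of Setting 3.1 can be made concrete in a few lines of code. The sketch below is illustrative only: it stores a network $\Phi=((W_{1},B_{1}),\ldots,(W_{H+1},B_{H+1}))$ as a list of pairs of nested lists, applies the rectifier after every layer except the last, and counts the parameters as the total number of matrix and bias entries. The names `realization`, `layer_dims`, `num_params`, and `abs_net` are hypothetical helpers and do not appear in the article.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]
Network = List[Tuple[Matrix, Vector]]  # ((W_1, B_1), ..., (W_{H+1}, B_{H+1}))


def relu(x: Vector) -> Vector:
    """Componentwise rectifier, the map A_d of Setting 3.1."""
    return [max(v, 0.0) for v in x]


def affine(W: Matrix, B: Vector, x: Vector) -> Vector:
    """Compute W x + B."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(W, B)]


def realization(phi: Network, x0: Vector) -> Vector:
    """Evaluate the function represented by phi at x0."""
    x = x0
    for W, B in phi[:-1]:  # hidden layers: affine map followed by ReLU
        x = relu(affine(W, B, x))
    W_last, B_last = phi[-1]  # output layer: affine map only
    return affine(W_last, B_last, x)


def layer_dims(phi: Network) -> Tuple[int, ...]:
    """Vector (k_0, k_1, ..., k_{H+1}) of layer dimensions."""
    return (len(phi[0][0][0]),) + tuple(len(B) for _, B in phi)


def num_params(phi: Network) -> int:
    """Number of real parameters: sum over layers of k_n * (k_{n-1} + 1)."""
    return sum(len(W) * len(W[0]) + len(B) for W, B in phi)


# Example with one hidden layer: |x| = max(x, 0) + max(-x, 0)
abs_net: Network = [
    ([[1.0], [-1.0]], [0.0, 0.0]),
    ([[1.0, 1.0]], [0.0]),
]
```

For `abs_net` the layer dimensions are `(1, 2, 1)` and the network has 7 parameters.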

3.2 Properties of operations associated to deep neural networks

Lemma 3.3 ( $\odot$ is associative).

Assume Setting 3.1 and let $\alpha,\beta,\gamma\in\mathbf{D}$ . Then it holds that $(\alpha\odot\beta)\odot\gamma=\alpha\odot(\beta\odot\gamma)$ .

Proof of Lemma 3.3.

Throughout this proof let $H_{1},H_{2},H_{3}\in{\mathbbm{N}}$ , let $(\alpha_{i})_{i\in[0,H_{1}+1]\cap{\mathbbm{N}}_{0}}\in{\mathbbm{N}}^{H_{1}+2}$ , $(\beta_{i})_{i\in[0,H_{2}+1]\cap{\mathbbm{N}}_{0}}\in{\mathbbm{N}}^{H_{2}+2}$ , $(\gamma_{i})_{i\in[0,H_{3}+1]\cap{\mathbbm{N}}_{0}}\in{\mathbbm{N}}^{H_{3}+2}$ satisfy that

[TABLE]

The definition of $\odot$ in (33) then shows that

[TABLE]

The proof of Lemma 3.3 is thus completed. ∎
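Associativity of $\odot$ can be checked numerically on small examples. The sketch below assumes that $\alpha\odot\beta$ lists the entries of $\beta$ first, merges the last entry of $\beta$ with the first entry of $\alpha$ by addition, and then lists the remaining entries of $\alpha$; this reading is consistent with the identity $2d_{2}=\beta_{H_{2}+1}+\alpha_{0}$ used in the proof of Lemma 3.8. The function name `odot` is a hypothetical helper.

```python
from typing import Tuple

Dims = Tuple[int, ...]


def odot(alpha: Dims, beta: Dims) -> Dims:
    """Assumed form: (beta_0, ..., beta_{H2}, beta_{H2+1} + alpha_0, alpha_1, ..., alpha_{H1+1})."""
    return beta[:-1] + (beta[-1] + alpha[0],) + alpha[1:]


alpha, beta, gamma = (1, 5, 1), (3, 4, 4, 1), (2, 7, 3)
left = odot(odot(alpha, beta), gamma)
right = odot(alpha, odot(beta, gamma))
```

Both bracketings give the same vector because the two merge positions do not interact: each merge touches only the junction between two adjacent factors.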

Lemma 3.4 ( $\operatorname*{\boxplus}$ and associativity).

Assume Setting 3.1, let $H,k,l\in{\mathbbm{N}}$ , and let $\alpha,\beta,\gamma\in\left(\{k\}\times{\mathbbm{N}}^{H}\times\{l\}\right)$ . Then

(i) it holds that $\alpha\operatorname*{\boxplus}\beta\in\left(\{k\}\times{\mathbbm{N}}^{H}\times\{l\}\right)$ ,

(ii) it holds that $\beta\operatorname*{\boxplus}\gamma\in\left(\{k\}\times{\mathbbm{N}}^{H}\times\{l\}\right)$ , and

(iii) it holds that $(\alpha\operatorname*{\boxplus}\beta)\operatorname*{\boxplus}\gamma=\alpha\operatorname*{\boxplus}(\beta\operatorname*{\boxplus}\gamma)$ .

Proof of Lemma 3.4.

Throughout this proof let $\alpha_{i},\beta_{i},\gamma_{i}\in{\mathbbm{N}}$ , $i\in[1,H]\cap{\mathbbm{N}}$ , satisfy that $\alpha=(k,\alpha_{1},\alpha_{2},\ldots,\alpha_{H},l)$ , $\beta=(k,\beta_{1},\beta_{2},\ldots,\beta_{H},l)$ , and $\gamma=(k,\gamma_{1},\gamma_{2},\ldots,\gamma_{H},l).$ The definition of $\operatorname*{\boxplus}$ (see (34)) then shows that

[TABLE]

and

[TABLE]

The proof of Lemma 3.4 is thus completed. ∎

Lemma 3.5 (Triangle inequality).

Assume Setting 3.1, let $H,k,l\in{\mathbbm{N}}$ , and let $\alpha,\beta\in\{k\}\times{\mathbbm{N}}^{H}\times\{l\}$ . Then it holds that ${\left|\kern-1.07639pt\left|\kern-1.07639pt\left|\alpha\operatorname*{\boxplus}\beta\right|\kern-1.07639pt\right|\kern-1.07639pt\right|}\leq{\left|\kern-1.07639pt\left|\kern-1.07639pt\left|\alpha\right|\kern-1.07639pt\right|\kern-1.07639pt\right|}+{\left|\kern-1.07639pt\left|\kern-1.07639pt\left|\beta\right|\kern-1.07639pt\right|\kern-1.07639pt\right|}$ .

Proof of Lemma 3.5.

Throughout this proof let $\alpha_{i},\beta_{i}\in{\mathbbm{N}}$ , $i\in[1,H]\cap{\mathbbm{N}}$ satisfy that $\alpha=(k,\alpha_{1},\alpha_{2},\ldots,\alpha_{H},l)$ and $\beta=(k,\beta_{1},\beta_{2},\ldots,\beta_{H},l).$ The definition of $\operatorname*{\boxplus}$ (see (34)) then shows that $\alpha\operatorname*{\boxplus}\beta=(k,\alpha_{1}+\beta_{1},\alpha_{2}+\beta_{2},\ldots,\alpha_{H}+\beta_{H},l).$ This together with the triangle inequality implies that

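$$\begin{split}{\left|\kern-1.07639pt\left|\kern-1.07639pt\left|\alpha\operatorname*{\boxplus}\beta\right|\kern-1.07639pt\right|\kern-1.07639pt\right|}&=\max\{k,\alpha_{1}+\beta_{1},\alpha_{2}+\beta_{2},\ldots,\alpha_{H}+\beta_{H},l\}\\&\leq\max\{k,\alpha_{1},\alpha_{2},\ldots,\alpha_{H},l\}+\max\{k,\beta_{1},\beta_{2},\ldots,\beta_{H},l\}={\left|\kern-1.07639pt\left|\kern-1.07639pt\left|\alpha\right|\kern-1.07639pt\right|\kern-1.07639pt\right|}+{\left|\kern-1.07639pt\left|\kern-1.07639pt\left|\beta\right|\kern-1.07639pt\right|\kern-1.07639pt\right|}.\end{split}\tag{40}$$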

This completes the proof of Lemma 3.5. ∎
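The statements of Lemmas 3.4 and 3.5 can be checked on concrete dimension vectors. The sketch below covers only the case used in these lemmas, in which both vectors share the input dimension $k$ and the output dimension $l$, so that $\alpha\operatorname*{\boxplus}\beta=(k,\alpha_{1}+\beta_{1},\ldots,\alpha_{H}+\beta_{H},l)$ as stated in the proof of Lemma 3.5. The names `boxplus` and `max_norm` are hypothetical helpers.

```python
from typing import Tuple

Dims = Tuple[int, ...]


def boxplus(alpha: Dims, beta: Dims) -> Dims:
    """Add hidden-layer widths; input and output dimensions must agree."""
    if len(alpha) != len(beta) or alpha[0] != beta[0] or alpha[-1] != beta[-1]:
        raise ValueError("vectors must share length, input and output dimension")
    hidden = tuple(a + b for a, b in zip(alpha[1:-1], beta[1:-1]))
    return (alpha[0],) + hidden + (alpha[-1],)


def max_norm(alpha: Dims) -> int:
    """Maximum norm of a dimension vector."""
    return max(alpha)


alpha, beta, gamma = (3, 10, 4, 1), (3, 2, 9, 1), (3, 1, 1, 1)
total = boxplus(alpha, beta)
```

Here `total` equals `(3, 12, 13, 1)`, whose maximum norm 13 is bounded by 10 + 9 = 19.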

The following result, Lemma 3.6, is a variant of [JSW18, Lemma 5.4].

Lemma 3.6 (Existence of DNNs with $H\in{\mathbbm{N}}$ hidden layers for the identity in ${\mathbbm{R}}$ ).

Assume Setting 3.1 and let $H\in{\mathbbm{N}}$ . Then it holds that $\mathrm{Id}_{{\mathbbm{R}}}\in\mathcal{R}(\{\Phi\in\mathbf{N}\colon\mathcal{D}(\Phi)=\mathfrak{n}_{H+2}\})$ .

Proof of Lemma 3.6.

Throughout this proof let $W_{1}\in{\mathbbm{R}}^{2\times 1}$ , $W_{i}\in{\mathbbm{R}}^{2\times 2}$ , $\,i\in[2,H]\cap{\mathbbm{N}}$ , $W_{H+1}\in{\mathbbm{R}}^{1\times 2}$ , $B_{i}\in{\mathbbm{R}}^{2}$ , $i\in[1,H]\cap{\mathbbm{N}}$ , $B_{H+1}\in{\mathbbm{R}}^{1}$ satisfy that

[TABLE]

let $\phi\in\mathbf{N}$ satisfy that $\phi=((W_{1},B_{1}),(W_{2},B_{2}),\ldots,(W_{H},B_{H}),(W_{H+1},B_{H+1}))$ , for every $a\in{\mathbbm{R}}$ let $a^{+}\in[0,\infty)$ be the non-negative part of $a$ , i.e., $a^{+}=\max\{a,0\}$ , and let $x_{0}\in{\mathbbm{R}}$ , $x_{1},x_{2},\ldots,x_{H}\in{\mathbbm{R}}^{2}$ satisfy for all $n\in{\mathbbm{N}}\cap[1,H]$ that

[TABLE]

Note that (41) and the definition of $\mathcal{D}$ (see (31)) imply that $\mathcal{D}(\phi)=\mathfrak{n}_{H+2}$ . Furthermore, (41), (42), and an induction argument show that

[TABLE]

The definition of $\mathcal{R}$ (see (32)) hence ensures that

[TABLE]

The fact that $x_{0}$ was arbitrary therefore proves that $\mathcal{R}(\phi)=\mathrm{Id}_{{\mathbbm{R}}}$ . This and the fact that $\mathcal{D}(\phi)=\mathfrak{n}_{H+2}$ demonstrate that $\mathrm{Id}_{{\mathbbm{R}}}\in\mathcal{R}(\{\Phi\in\mathbf{N}\colon\mathcal{D}(\Phi)=\mathfrak{n}_{H+2}\})$ . The proof of Lemma 3.6 is thus completed. ∎
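The construction behind Lemma 3.6 rests on the identity $x=\max\{x,0\}-\max\{-x,0\}$. The following sketch builds one possible network with $H$ hidden layers of width 2 that represents the identity on ${\mathbbm{R}}$; the specific weights are an assumption chosen to match the matrix shapes in the proof ($2\times 1$, then $2\times 2$, then $1\times 2$) and need not coincide with the weights used in the article. The function names are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]
Network = List[Tuple[Matrix, Vector]]


def realization(phi: Network, x0: Vector) -> Vector:
    """ReLU after every layer except the last."""
    x = x0
    for idx, (W, B) in enumerate(phi):
        x = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(W, B)]
        if idx < len(phi) - 1:
            x = [max(v, 0.0) for v in x]
    return x


def identity_network(H: int) -> Network:
    """Network with layer dimensions (1, 2, ..., 2, 1) and H hidden layers."""
    if H < 1:
        raise ValueError("H must be a positive integer")
    # First layer maps x to (x, -x); ReLU then yields (x^+, (-x)^+).
    layers: Network = [([[1.0], [-1.0]], [0.0, 0.0])]
    # Middle layers copy the two non-negative components unchanged.
    for _ in range(H - 1):
        layers.append(([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]))
    # Output layer returns x^+ - (-x)^+ = x.
    layers.append(([[1.0, -1.0]], [0.0]))
    return layers


phi = identity_network(3)
dims = (len(phi[0][0][0]),) + tuple(len(B) for _, B in phi)
```

The middle layers leave their input unchanged because the rectifier acts as the identity on non-negative numbers.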

Lemma 3.7 (DNNs for affine transformations).

Assume Setting 3.1 and let $d,m\in{\mathbbm{N}}$ , $\lambda\in{\mathbbm{R}}$ , $b\in{\mathbbm{R}}^{d}$ , $a\in{\mathbbm{R}}^{m}$ , $\Psi\in\mathbf{N}$ satisfy that $\mathcal{R}(\Psi)\in C({\mathbbm{R}}^{d},{\mathbbm{R}}^{m})$ . Then it holds that

[TABLE]

Proof of Lemma 3.7.

Throughout this proof let $H,k_{0},k_{1},\ldots,k_{H+1}\in{\mathbbm{N}}$ satisfy that

[TABLE]

let $((W_{1},B_{1}),(W_{2},B_{2}),\ldots,(W_{H},B_{H}),(W_{H+1},B_{H+1}))\in\prod_{n=1}^{H+1}\left({\mathbbm{R}}^{k_{n}\times k_{n-1}}\times{\mathbbm{R}}^{k_{n}}\right)$ satisfy that

[TABLE]

let $\phi\in\mathbf{N}$ satisfy that

[TABLE]

and let $x_{0},y_{0}\in{\mathbbm{R}}^{k_{0}},x_{1},y_{1}\in{\mathbbm{R}}^{k_{1}},\ldots,x_{H},y_{H}\in{\mathbbm{R}}^{k_{H}}$ satisfy for all $n\in{\mathbbm{N}}\cap[1,H]$ that

[TABLE]

Then it holds that

[TABLE]

This and an induction argument prove for all $i\in[2,H]\cap{\mathbbm{N}}$ that

[TABLE]

The definition of $\mathcal{R}$ (see (32)) hence shows that

[TABLE]

This and the fact that $y_{0}$ was arbitrary prove that $\mathcal{R}(\phi)=\lambda((\mathcal{R}(\Psi))(\cdot+b)+a)$ . This and the fact that $\mathcal{D}(\phi)=\mathcal{D}(\Psi)$ imply that $\lambda\left((\mathcal{R}(\Psi))(\cdot+b)+a\right)\in\mathcal{R}(\{\Phi\in\mathbf{N}\colon\mathcal{D}(\Phi)=\mathcal{D}(\Psi)\})$ . The proof of Lemma 3.7 is thus completed. ∎
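Lemma 3.7 states that shifting the input and rescaling the output of a network does not change its layer dimensions. One way to see this, sketched below under the assumption that only the first bias and the last layer are modified, is to replace $B_{1}$ by $W_{1}b+B_{1}$ and the last layer $(W_{H+1},B_{H+1})$ by $(\lambda W_{H+1},\lambda(B_{H+1}+a))$. The function name `shift_scale` is hypothetical and the weights need not match those in the article's proof.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]
Network = List[Tuple[Matrix, Vector]]


def realization(phi: Network, x0: Vector) -> Vector:
    """ReLU after every layer except the last."""
    x = x0
    for idx, (W, B) in enumerate(phi):
        x = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(W, B)]
        if idx < len(phi) - 1:
            x = [max(v, 0.0) for v in x]
    return x


def shift_scale(psi: Network, lam: float, b: Vector, a: Vector) -> Network:
    """Network representing y -> lam * (R(psi)(y + b) + a) with unchanged dimensions."""
    W1, B1 = psi[0]
    WL, BL = psi[-1]
    # W1 (y + b) + B1 = W1 y + (W1 b + B1)
    first = (W1, [bi + sum(w * v for w, v in zip(row, b)) for row, bi in zip(W1, B1)])
    # lam * (WL x + BL + a) = (lam WL) x + lam (BL + a)
    last = ([[lam * w for w in row] for row in WL],
            [lam * (bl + ai) for bl, ai in zip(BL, a)])
    return [first] + list(psi[1:-1]) + [last]


# psi represents the absolute value on the real line
psi: Network = [([[1.0], [-1.0]], [0.0, 0.0]), ([[1.0, 1.0]], [0.0])]
phi = shift_scale(psi, 2.0, [1.0], [-3.0])  # y -> 2 * (|y + 1| - 3)
```

Since a network in $\mathbf{N}$ has at least one hidden layer, the first and last layers are distinct and the two modifications do not interfere.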

Lemma 3.8 (Composition).

Assume Setting 3.1 and let $d_{1},d_{2},d_{3}\in{\mathbbm{N}}$ , $f\in C({\mathbbm{R}}^{d_{2}},{\mathbbm{R}}^{d_{3}})$ , $g\in C({\mathbbm{R}}^{d_{1}},{\mathbbm{R}}^{d_{2}})$ , $\alpha,\beta\in\mathbf{D}$ satisfy that $f\in\mathcal{R}(\{\Phi\in\mathbf{N}\colon\mathcal{D}(\Phi)=\alpha\})$ and $g\in\mathcal{R}(\{\Phi\in\mathbf{N}\colon\mathcal{D}(\Phi)=\beta\})$ . Then it holds that $(f\circ g)\in\mathcal{R}(\{\Phi\in\mathbf{N}\colon\mathcal{D}(\Phi)=\alpha\odot\beta\})$ .

Proof of Lemma 3.8.

Throughout this proof let $H_{1},H_{2},\alpha_{0},\ldots,\alpha_{H_{1}+1},\beta_{0},\ldots,\beta_{H_{2}+1}\in{\mathbbm{N}}$ , $\Phi_{f},\Phi_{g}\in\mathbf{N}$ satisfy that

[TABLE]

Lemma 5.4 in [JSW18] shows that there exists $\mathbb{I}\in\mathbf{N}$ such that $\mathcal{D}(\mathbb{I})=d_{2}\mathfrak{n}_{3}=(d_{2},2d_{2},d_{2})$ and $\mathcal{R}(\mathbb{I})=\mathrm{Id}_{{\mathbbm{R}}^{d_{2}}}$ . Note that $2d_{2}=\beta_{H_{2}+1}+\alpha_{0}$ . This and [JSW18, Proposition 5.2] (with $\phi_{1}=\Phi_{f}$ , $\phi_{2}=\Phi_{g}$ , and $\mathbb{I}=\mathbb{I}$ in the notation of [JSW18, Proposition 5.2]) show that there exists $\Phi_{f\circ g}\in\mathbf{N}$ such that $\mathcal{R}(\Phi_{f\circ g})=f\circ g$ and $\mathcal{D}(\Phi_{f\circ g})=\mathcal{D}(\Phi_{f})\odot\mathcal{D}(\Phi_{g})=\alpha\odot\beta$ . Hence, it holds that $f\circ g\in\mathcal{R}(\{\Phi\in\mathbf{N}\colon\mathcal{D}(\Phi)=\alpha\odot\beta\})$ . The proof of Lemma 3.8 is thus completed. ∎
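The role of the identity network in the proof of Lemma 3.8 can be seen in a direct construction. In the sketch below, the output layer of $\Phi_{g}$ is turned into a hidden layer of width $2d_{2}$ that computes $(z^{+},(-z)^{+})$ for $z=g(x)$, and the first layer of $\Phi_{f}$ is applied to $z^{+}-(-z)^{+}=z$. This is an illustration under that assumption, not a transcription of [JSW18, Proposition 5.2]; the function name `compose` is hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]
Network = List[Tuple[Matrix, Vector]]


def realization(phi: Network, x0: Vector) -> Vector:
    """ReLU after every layer except the last."""
    x = x0
    for idx, (W, B) in enumerate(phi):
        x = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(W, B)]
        if idx < len(phi) - 1:
            x = [max(v, 0.0) for v in x]
    return x


def compose(phi_f: Network, phi_g: Network) -> Network:
    """Network representing f o g with a glue layer of width 2 * d_2."""
    W_g, B_g = phi_g[-1]
    W_f, B_f = phi_f[0]
    if len(B_g) != len(W_f[0]):
        raise ValueError("output dimension of g must equal input dimension of f")
    # Hidden layer computing (z, -z) for z = g(x); ReLU yields (z^+, (-z)^+).
    glue_in = ([list(row) for row in W_g] + [[-w for w in row] for row in W_g],
               list(B_g) + [-b for b in B_g])
    # First layer of f applied to z = z^+ - (-z)^+.
    glue_out = ([list(row) + [-w for w in row] for row in W_f], list(B_f))
    return list(phi_g[:-1]) + [glue_in, glue_out] + list(phi_f[1:])


abs_net: Network = [([[1.0], [-1.0]], [0.0, 0.0]), ([[1.0, 1.0]], [0.0])]     # x -> |x|
shift_net: Network = [([[1.0], [-1.0]], [0.0, 0.0]), ([[1.0, -1.0]], [-1.0])]  # x -> x - 1
h = compose(abs_net, shift_net)  # x -> |x - 1|
dims = (len(h[0][0][0]),) + tuple(len(B) for _, B in h)
```

For these two networks with layer dimensions `(1, 2, 1)` the composed network has layer dimensions `(1, 2, 2, 2, 1)`, where the middle entry is the glue layer of width $2d_{2}=2$.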

The following result, Lemma 3.9, essentially generalizes [JSW18, Lemma 5.1] to the case where the DNNs have different hidden layer dimensions.

Lemma 3.9 (Sum of DNNs of the same length).

Assume Setting 3.1 and let $M,H,p,q\in{\mathbbm{N}}$ , $h_{1},h_{2},\ldots,h_{M}\in{\mathbbm{R}}$ , $k_{i}\in\mathbf{D}$ , $f_{i}\in C({\mathbbm{R}}^{p},{\mathbbm{R}}^{q})$ , $i\in[1,M]\cap{\mathbbm{N}}$ , satisfy for all $i\in[1,M]\cap{\mathbbm{N}}$ that

[TABLE]

Then it holds that

[TABLE]

Proof of Lemma 3.9.

Throughout this proof let $\phi_{i}\in\mathbf{N}$ , $i\in[1,M]\cap{\mathbbm{N}}$ , and $k_{i,n}\in{\mathbbm{N}}$ , $i\in[1,M]\cap{\mathbbm{N}}$ , $n\in[0,H+1]\cap{\mathbbm{N}}_{0}$ , satisfy for all $i\in[1,M]\cap{\mathbbm{N}}$ that

[TABLE]

for every $i\in[1,M]\cap{\mathbbm{N}}$ let $((W_{i,1},B_{i,1}),\ldots,(W_{i,H+1},B_{i,H+1}))\in\prod_{n=1}^{H+1}\left({\mathbbm{R}}^{k_{i,n}\times k_{i,n-1}}\times{\mathbbm{R}}^{k_{i,n}}\right)$ satisfy that

[TABLE]

let $k_{n}^{\operatorname*{\boxplus}}\in{\mathbbm{N}}$ , $n\in[1,H]\cap{\mathbbm{N}}$ , $k^{\operatorname*{\boxplus}}\in{\mathbbm{N}}^{H+2}$ satisfy for all $n\in[1,H]\cap{\mathbbm{N}}$ that

[TABLE]

let $W_{1}\in{\mathbbm{R}}^{k_{1}^{\operatorname*{\boxplus}}\times p}$ , $B_{1}\in{\mathbbm{R}}^{k_{1}^{\operatorname*{\boxplus}}}$ satisfy that

[TABLE]

let $W_{n}\in{\mathbbm{R}}^{k_{n}^{\operatorname*{\boxplus}}\times k_{n-1}^{\operatorname*{\boxplus}}}$ , $B_{n}\in{\mathbbm{R}}^{k^{\operatorname*{\boxplus}}_{n}}$ , $n\in[2,H]\cap{\mathbbm{N}}$ , satisfy for all $n\in[2,H]\cap{\mathbbm{N}}$ that

[TABLE]

let $W_{H+1}\in{\mathbbm{R}}^{q\times k_{H}^{\operatorname*{\boxplus}}}$ , $B_{H+1}\in{\mathbbm{R}}^{q}$ satisfy that

[TABLE]

let $x_{0}\in{\mathbbm{R}}^{p},\,x_{1}\in{\mathbbm{R}}^{k_{1}^{\operatorname*{\boxplus}}},x_{2}\in{\mathbbm{R}}^{k_{2}^{\operatorname*{\boxplus}}}\ldots,x_{H}\in{\mathbbm{R}}^{k_{H}^{\operatorname*{\boxplus}}}$ , let $x_{1,0},x_{2,0},\ldots,x_{M,0}\in{\mathbbm{R}}^{p}$ , $x_{i,n}\in{\mathbbm{R}}^{k_{i,n}}$ , $i\in[1,M]\cap{\mathbbm{N}}$ , $n\in[1,H]\cap{\mathbbm{N}}$ , satisfy for all $i\in[1,M]\cap{\mathbbm{N}}$ , $n\in[1,H]\cap{\mathbbm{N}}$ that

[TABLE]

and let $\psi\in\mathbf{N}$ satisfy that

[TABLE]

First, the definitions of $\mathcal{D}$ and $\mathcal{R}$ (see (31) and (32)), (56), and the fact that $\forall\,i\in[1,M]\cap{\mathbbm{N}}\colon f_{i}\in C({\mathbbm{R}}^{p},{\mathbbm{R}}^{q})$ show for all $i\in[1,M]\cap{\mathbbm{N}}$ that $k_{i}=(p,k_{i,1},k_{i,2},\ldots,k_{i,H},q).$ The definition of $\mathcal{D}$ (see (31)), the definition of $\operatorname*{\boxplus}$ (see (34)), and (58) then show that

[TABLE]

Next, we prove by induction on $n\in[1,H]\cap{\mathbbm{N}}$ that $x_{n}=(x_{1,n},x_{2,n},\ldots,x_{M,n})$ . First, (59) shows that

[TABLE]

This implies that

[TABLE]

This proves the base case. Next, for the induction step let $n\in[2,H]\cap{\mathbbm{N}}$ and assume that $x_{n-1}=(x_{1,n-1},x_{2,n-1},\ldots,x_{M,n-1})$ . Then (60) and the induction hypothesis ensure that

[TABLE]

This yields that

[TABLE]

This proves the induction step. Induction now proves for all $n\in[1,H]\cap{\mathbbm{N}}$ that $x_{n}=(x_{1,n},x_{2,n},\ldots,x_{M,n})$ . This, the definition of $\mathcal{R}$ (see (32)), and (61) imply that

[TABLE]

This, the fact that $x_{0}\in{\mathbbm{R}}^{p}$ was arbitrary, and (56) yield that

[TABLE]

This and (64) show that

[TABLE]

The proof of Lemma 3.9 is thus completed. ∎
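The parallelization argument in the proof of Lemma 3.9 can be sketched in code: the first-layer matrices are stacked vertically, the hidden-layer matrices are placed on a block diagonal, and the output layer forms the weighted sum. The sketch below is a minimal illustration under these assumptions; the function name `sum_networks` is hypothetical and the exact matrices in the article's proof are not reproduced here.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Vector = List[float]
Network = List[Tuple[Matrix, Vector]]


def realization(phi: Network, x0: Vector) -> Vector:
    """ReLU after every layer except the last."""
    x = x0
    for idx, (W, B) in enumerate(phi):
        x = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(W, B)]
        if idx < len(phi) - 1:
            x = [max(v, 0.0) for v in x]
    return x


def sum_networks(hs: Sequence[float], nets: Sequence[Network]) -> Network:
    """Network representing sum_i hs[i] * R(nets[i]); all nets need equal depth."""
    depth = len(nets[0])
    if any(len(phi) != depth for phi in nets):
        raise ValueError("all networks must have the same number of layers")
    # First layer: stack the matrices vertically (all networks read the same input).
    layers: Network = [([list(r) for phi in nets for r in phi[0][0]],
                        [b for phi in nets for b in phi[0][1]])]
    # Hidden layers 2..H: block-diagonal matrices keep the networks separate.
    for n in range(1, depth - 1):
        widths = [len(phi[n][0][0]) for phi in nets]
        W, B = [], []
        for i, phi in enumerate(nets):
            left, right = sum(widths[:i]), sum(widths[i + 1:])
            for row, b in zip(*phi[n]):
                W.append([0.0] * left + list(row) + [0.0] * right)
                B.append(b)
        layers.append((W, B))
    # Output layer: weighted sum of the individual output layers.
    q = len(nets[0][-1][1])
    W_last = [[h * w for h, phi in zip(hs, nets) for w in phi[-1][0][r]] for r in range(q)]
    B_last = [sum(h * phi[-1][1][r] for h, phi in zip(hs, nets)) for r in range(q)]
    layers.append((W_last, B_last))
    return layers


eye = [[1.0, 0.0], [0.0, 1.0]]
abs2: Network = [([[1.0], [-1.0]], [0.0, 0.0]), (eye, [0.0, 0.0]), ([[1.0, 1.0]], [0.0])]
id2: Network = [([[1.0], [-1.0]], [0.0, 0.0]), (eye, [0.0, 0.0]), ([[1.0, -1.0]], [0.0])]
psi = sum_networks([2.0, -1.0], [abs2, id2])  # x -> 2|x| - x
dims = (len(psi[0][0][0]),) + tuple(len(B) for _, B in psi)
```

The hidden widths of the resulting network are the sums of the individual hidden widths, here `(1, 4, 4, 1)`, in line with the operation $\operatorname*{\boxplus}$.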

3.3 Deep neural network representations for MLP approximations

Lemma 3.10.

Assume Setting 3.1, let $d,M\in{\mathbbm{N}}$ , $T,c\in(0,\infty)$ , $f\in C({\mathbbm{R}},{\mathbbm{R}})$ , $g\in C({\mathbbm{R}}^{d},{\mathbbm{R}})$ , $\Phi_{f},\Phi_{g}\in\mathbf{N}$ satisfy that $\mathcal{R}(\Phi_{f})=f$ , $\mathcal{R}(\Phi_{g})=g$ , and

[TABLE]

let $(\Omega,\mathcal{F},{\mathbb{P}})$ be a probability space, let $\Theta=\bigcup_{n\in{\mathbbm{N}}}{\mathbbm{Z}}^{n}$ , let $\mathfrak{u}^{\theta}\colon\Omega\to[0,1]$ , $\theta\in\Theta$ , be independent random variables which are uniformly distributed on $[0,1]$ , let $\mathcal{U}^{\theta}\colon[0,T]\times\Omega\to[0,T]$ , $\theta\in\Theta$ , satisfy for all $t\in[0,T]$ , $\theta\in\Theta$ that $\mathcal{U}^{\theta}_{t}=t+(T-t)\mathfrak{u}^{\theta}$ , let $W^{\theta}\colon[0,T]\times\Omega\to{\mathbbm{R}}^{d}$ , $\theta\in\Theta$ , be independent standard Brownian motions with continuous sample paths, assume that $(\mathfrak{u}^{\theta})_{\theta\in\Theta}$ and $(W^{\theta})_{\theta\in\Theta}$ are independent, let ${U}_{n,M}^{\theta}\colon[0,T]\times{\mathbbm{R}}^{d}\times\Omega\to{\mathbbm{R}}$ , $n,M\in{\mathbbm{Z}}$ , $\theta\in\Theta$ , satisfy for all $n\in{\mathbbm{N}}$ , $\theta\in\Theta$ , $t\in[0,T]$ , $x\in{\mathbbm{R}}^{d}$ that ${U}_{-1,M}^{\theta}(t,x)={U}_{0,M}^{\theta}(t,x)=0$ and

[TABLE]

and let $\omega\in\Omega$ . Then for all $n\in{\mathbbm{N}}_{0}$ there exists a family $(\Phi_{n,t}^{\theta})_{\theta\in\Theta,t\in[0,T]}\subseteq\mathbf{N}$ such that

(i) it holds for all $t_{1},t_{2}\in[0,T]$ , $\theta_{1},\theta_{2}\in\Theta$ that

[TABLE]

(ii) it holds for all $t\in[0,T]$ , $\theta\in\Theta$ that

[TABLE]

(iii) it holds for all $t\in[0,T]$ , $\theta\in\Theta$ that

[TABLE]

and

(iv) it holds for all $\theta\in\Theta$ , $t\in[0,T]$ , $x\in{\mathbbm{R}}^{d}$ that

[TABLE]

Proof of Lemma 3.10.

We prove Lemma 3.10 by induction on $n\in{\mathbbm{N}}_{0}$ . For the base case $n=0$ note that the fact that $\forall\,t\in[0,T],\theta\in\Theta\colon U^{\theta}_{0,M}(t,\cdot)=0$ , the fact that the zero function can be represented by a network with depth $\dim\!\left(\mathcal{D}\left(\Phi_{g}\right)\right)$ , and (72) imply that there exists $(\Phi_{0,t}^{\theta})_{\theta\in\Theta,t\in[0,T]}\subseteq\mathbf{N}$ such that it holds for all $t_{1},t_{2}\in[0,T]$ , $\theta_{1},\theta_{2}\in\Theta$ that $\mathcal{D}\left(\Phi_{0,t_{1}}^{\theta_{1}}\right)=\mathcal{D}\left(\Phi_{0,t_{2}}^{\theta_{2}}\right)$ and such that it holds for all $\theta\in\Theta$ , $t\in[0,T]$ that $\dim\!\left(\mathcal{D}(\Phi_{0,t}^{\theta})\right)=\dim\!\left(\mathcal{D}\left(\Phi_{g}\right)\right)$ , ${\left|\kern-1.07639pt\left|\kern-1.07639pt\left|\mathcal{D}(\Phi_{0,t}^{\theta})\right|\kern-1.07639pt\right|\kern-1.07639pt\right|}\leq{\left|\kern-1.07639pt\left|\kern-1.07639pt\left|\mathcal{D}(\Phi_{g})\right|\kern-1.07639pt\right|\kern-1.07639pt\right|}\leq c$ , and ${U}_{0,M}^{\theta}(t,\cdot,\omega)=\mathcal{R}(\Phi_{0,t}^{\theta})$ . This proves the base case $n=0$ .

For the induction step from $n\in{\mathbbm{N}}_{0}$ to $n+1\in{\mathbbm{N}}$ let $n\in{\mathbbm{N}}_{0}$ and assume that Items (i)–(iv) hold true for all $k\in[0,n]\cap{\mathbbm{N}}_{0}$ . The assumption that $g=\mathcal{R}(\Phi_{g})$ and Lemma 3.7 (with $d=d$ , $m=1$ , $\lambda=1$ , $a=0$ , $b=W^{\theta}_{T}(\omega)-W^{\theta}_{t}(\omega)$ , and $\Psi=\Phi_{g}$ for $\theta\in\Theta$ , $t\in[0,T]$ in the notation of Lemma 3.7) show for all $\theta\in\Theta$ , $t\in[0,T]$ that

[TABLE]

Furthermore, Lemma 3.6 (with $H=(n+1)\left(\dim\!\left(\mathcal{D}(\Phi_{f})\right)-1\right)-1$ in the notation of Lemma 3.6) ensures that

[TABLE]

This, (78), and Lemma 3.8 (with $d_{1}=d$ , $d_{2}=1$ , $d_{3}=1$ , $f=\mathrm{Id}_{{\mathbbm{R}}}$ , $g=g\big{(}\cdot+W^{\theta}_{T}(\omega)-W^{\theta}_{t}(\omega)\big{)}$ , $\alpha=\mathfrak{n}_{(n+1)\left(\dim\!\left(\mathcal{D}(\Phi_{f})\right)-1\right)+1}$ , and $\beta=\mathcal{D}(\Phi_{g})$ for $\theta\in\Theta$ , $t\in[0,T]$ in the notation of Lemma 3.8) show that for all $\theta\in\Theta$ , $t\in[0,T]$ it holds that

[TABLE]

Next, the induction hypothesis implies for all $\theta\in\Theta$ , $t\in[0,T]$ , $l\in[0,n]\cap{\mathbbm{N}}_{0}$ that

[TABLE]

This and Lemma 3.7 (with

[TABLE]

in the notation of Lemma 3.7) imply that for all $\theta,\eta\in\Theta$ , $t\in[0,T]$ , $l\in[0,n]\cap{\mathbbm{N}}_{0}$ it holds that

[TABLE]

Moreover, Lemma 3.6 (with $H=(n-l)\left(\dim\!\left(\mathcal{D}(\Phi_{f})\right)-1\right)-1$ for $l\in[0,n-1]\cap{\mathbbm{N}}_{0}$ in the notation of Lemma 3.6) ensures for all $l\in[0,n-1]\cap{\mathbbm{N}}_{0}$ that

[TABLE]

This, (83), and Lemma 3.8 (with

[TABLE]

in the notation of Lemma 3.8) prove for all $\eta,\theta\in\Theta$ , $t\in[0,T]$ , $l\in[0,n-1]\cap{\mathbbm{N}}_{0}$ that

[TABLE]

This and Lemma 3.8 (with

[TABLE]

in the notation of Lemma 3.8) assure for all $\eta,\theta\in\Theta$ , $t\in[0,T]$ , $l\in[0,n-1]\cap{\mathbbm{N}}_{0}$ that

[TABLE]

Next, (83) (with $l=n$ ) and Lemma 3.8 (with

[TABLE]

in the notation of Lemma 3.8) prove for all $\eta,\theta\in\Theta$ , $t\in[0,T]$ that

[TABLE]

Furthermore, the definition of $\odot$ in (33) and the fact that

[TABLE]

in the induction hypothesis imply that

[TABLE]

that

[TABLE]

and for all $l\in[0,n-1]\cap{\mathbbm{N}}_{0}$ that

[TABLE]

This shows, roughly speaking, that the functions in (80), (90), and (88) can be represented by networks with the same depth (i.e. number of layers): $(n+1)(\dim\!\left(\mathcal{D}(\Phi_{f})\right)-1)+\dim\!\left(\mathcal{D}\left(\Phi_{g}\right)\right)$ . Hence, Lemma 3.9
