A proof that rectified deep neural networks overcome the curse of dimensionality in the numerical approximation of semilinear heat equations
arXiv:1901.10854·math.NA·November 25, 2020
Martin Hutzenthaler, Arnulf Jentzen, Thomas Kruse, Tuan Anh Nguyen
This paper rigorously proves that deep neural networks can approximate solutions of high-dimensional semilinear heat equations without the curse of dimensionality: the number of network parameters grows at most polynomially in the PDE dimension and in the reciprocal of the prescribed accuracy.
Contribution
It provides the first mathematical proof that deep neural networks overcome the curse of dimensionality for a class of nonlinear PDEs, specifically semilinear heat equations.
Findings
01 The number of neural network parameters grows at most polynomially in the PDE dimension and in the reciprocal of the accuracy
02 The proof relies on full history recursive multilevel Picard approximations
03 The networks overcome the curse of dimensionality for a class of nonlinear PDEs
Abstract
Deep neural networks and other deep learning methods have very successfully been applied to the numerical approximation of high-dimensional nonlinear parabolic partial differential equations (PDEs), which are widely used in finance, engineering, and natural sciences. In particular, simulations indicate that algorithms based on deep learning overcome the curse of dimensionality in the numerical approximation of solutions of semilinear PDEs. For certain linear PDEs this has also been proved mathematically. The key contribution of this article is to rigorously prove this for the first time for a class of nonlinear PDEs. More precisely, we prove in the case of semilinear heat equations with gradient-independent nonlinearities that the numbers of parameters of the employed deep neural networks grow at most polynomially in both the PDE dimension and the reciprocal of the prescribed…
Full text
A proof that rectified deep neural networks overcome the curse of dimensionality in the numerical approximation of semilinear heat equations
Martin Hutzenthaler¹, Arnulf Jentzen², Thomas Kruse³, and Tuan Anh Nguyen⁴
¹ Faculty of Mathematics, University of Duisburg-Essen,
Deep neural networks and other deep learning methods have very successfully been applied to the numerical approximation of high-dimensional nonlinear parabolic partial differential equations (PDEs), which are widely used in finance, engineering, and natural sciences. In particular, simulations indicate that algorithms based on deep learning overcome the curse of dimensionality in the numerical approximation of solutions of semilinear PDEs. For certain linear PDEs this has also been proved mathematically. The key contribution of this article is to rigorously prove this for the first time for a class of nonlinear PDEs. More precisely, we prove in the case of semilinear heat equations with gradient-independent nonlinearities that the numbers of parameters of the employed deep neural networks grow at most polynomially in both the PDE dimension and the reciprocal of the prescribed approximation accuracy. Our proof relies on recently introduced full history recursive multilevel Picard approximations of semilinear PDEs.
AMS 2010 subject classification: 65C99; 68T05
Key words and phrases: curse of dimensionality, high-dimensional PDEs, deep neural networks, information based complexity, tractability of multivariate problems, multilevel Picard approximations
Deep neural networks (DNNs) have revolutionized a number of computational problems; see, e.g., the references in Grohs et al. [GHJvW18]. In 2017, deep learning-based approximation algorithms for certain parabolic partial differential equations (PDEs) were proposed in Han et al. [EHJ17, HJE18], and based on these works there is now a series of deep learning-based numerical approximation algorithms for a large class of different kinds of PDEs in the scientific literature; see, e.g., [BBG+18, BEJ17, BCJ18, EY18, EGJS18, FTT17, GHJvW18, Hen17, KLY17, Mis18, NM18, Rai18, SS17]. There is empirical evidence that deep learning-based methods work exceptionally well for approximating solutions of high-dimensional PDEs and that these methods do not suffer from the curse of dimensionality; see, e.g., the simulations in [EHJ17, HJE18, BEJ17, BBG+18]. There exist, however, only a few theoretical results which prove that DNN approximations of solutions of PDEs do not suffer from the curse of dimensionality: the recent articles [GHJvW18, BGJ18, JSW18, EGJS18] prove rigorously that DNN approximations overcome the curse of dimensionality in the numerical approximation of solutions of certain linear PDEs.
The main result of this article proves for semilinear heat equations with gradient-independent nonlinearities that the number of parameters of the approximating DNN grows at most polynomially in both the PDE dimension $d\in\mathbb{N}$ and the reciprocal of the prescribed accuracy $\varepsilon>0$. Thereby, we establish for the first time that there exist DNN approximations of solutions of such PDEs which indeed overcome the curse of dimensionality. To illustrate the main result of this article we formulate in the following result, Theorem 1.1 below, a special case of the main result.
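Theorem 1.1.
Let $A_d\colon\mathbb{R}^d\to\mathbb{R}^d$, $d\in\mathbb{N}=\{1,2,\ldots\}$, and $\|\cdot\|\colon(\cup_{d\in\mathbb{N}}\mathbb{R}^d)\to[0,\infty)$ satisfy for all $d\in\mathbb{N}$, $x=(x_1,\ldots,x_d)\in\mathbb{R}^d$ that
$A_d(x)=(\max\{x_1,0\},\ldots,\max\{x_d,0\})$
and
$\|x\|=\big[\sum_{i=1}^{d}(x_i)^2\big]^{1/2}$,
let
$\mathbf{N}=\cup_{H\in\mathbb{N}}\cup_{(k_0,k_1,\ldots,k_{H+1})\in\mathbb{N}^{H+2}}\big[\prod_{n=1}^{H+1}\big(\mathbb{R}^{k_n\times k_{n-1}}\times\mathbb{R}^{k_n}\big)\big]$,
let $\mathcal{R}\colon\mathbf{N}\to(\cup_{k,l\in\mathbb{N}}C(\mathbb{R}^k,\mathbb{R}^l))$ and $\mathcal{P}\colon\mathbf{N}\to\mathbb{N}$ satisfy for all $H\in\mathbb{N}$, $k_0,k_1,\ldots,k_H,k_{H+1}\in\mathbb{N}$, $\Phi=((W_1,B_1),\ldots,(W_{H+1},B_{H+1}))\in\prod_{n=1}^{H+1}(\mathbb{R}^{k_n\times k_{n-1}}\times\mathbb{R}^{k_n})$, $x_0\in\mathbb{R}^{k_0},\ldots,x_H\in\mathbb{R}^{k_H}$ with $\forall\,n\in\mathbb{N}\cap[1,H]\colon x_n=A_{k_n}(W_nx_{n-1}+B_n)$ that
[TABLE]
let $T,\kappa\in(0,\infty)$, $f\in C(\mathbb{R},\mathbb{R})$, $(g_{d,\varepsilon})_{d\in\mathbb{N},\varepsilon\in(0,1]}\subseteq\mathbf{N}$, $(c_d)_{d\in\mathbb{N}}\subseteq(0,\infty)$, for every $d\in\mathbb{N}$ let $g_d\in C(\mathbb{R}^d,\mathbb{R})$, for every $d\in\mathbb{N}$ let $u_d\in C^{1,2}([0,T]\times\mathbb{R}^d,\mathbb{R})$, and assume for all $d\in\mathbb{N}$, $v,w\in\mathbb{R}$, $x\in\mathbb{R}^d$, $\varepsilon\in(0,1]$, $t\in(0,T)$ that
$|f(v)-f(w)|\le\kappa|v-w|$,
$\mathcal{R}(g_{d,\varepsilon})\in C(\mathbb{R}^d,\mathbb{R})$,
$|(\mathcal{R}(g_{d,\varepsilon}))(x)|\le\kappa d^{\kappa}(1+\|x\|^{\kappa})$,
$|g_d(x)-(\mathcal{R}(g_{d,\varepsilon}))(x)|\le\varepsilon\kappa d^{\kappa}(1+\|x\|^{\kappa})$,
$\mathcal{P}(g_{d,\varepsilon})\le\kappa d^{\kappa}\varepsilon^{-\kappa}$,
$|u_d(t,x)|\le c_d(1+\|x\|^{c_d})$,
$u_d(0,x)=g_d(x)$,
and
[TABLE]
Then there exist $(\Psi_{d,\varepsilon})_{d\in\mathbb{N},\varepsilon\in(0,1]}\subseteq\mathbf{N}$, $\eta\in(0,\infty)$ such that for all $d\in\mathbb{N}$, $\varepsilon\in(0,1]$ it holds that
$\mathcal{R}(\Psi_{d,\varepsilon})\in C(\mathbb{R}^d,\mathbb{R})$,
$\mathcal{P}(\Psi_{d,\varepsilon})\le\eta d^{\eta}\varepsilon^{-\eta}$, and
[TABLE]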
Theorem 1.1 is an immediate consequence of the main corollary established below (with $T=2T$, $u_d(t,x)=u_d(T-2t,x)$, $f(v)=f(v)/2$ for $t\in[0,2T]$, $x\in\mathbb{R}^d$, $v\in\mathbb{R}$ in the notation of that corollary).
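To make the objects in Theorem 1.1 concrete, here is a minimal Python sketch of a network given as a list of weight and bias pairs, of the function it represents, and of its parameter count. The displayed definitions of the realization map R and the parameter count P are missing from this text, so two things are assumptions: that the output layer is affine with no activation, and that P counts every entry of every weight matrix and bias vector. The names `realize`, `num_params`, and `abs_net` are illustrative only.

```python
import numpy as np

def relu(x):
    # Componentwise rectifier A_d(x) = (max{x_1, 0}, ..., max{x_d, 0}).
    return np.maximum(x, 0.0)

def realize(phi, x0):
    """Evaluate the function represented by phi = [(W_1, B_1), ..., (W_{H+1}, B_{H+1})]."""
    x = np.asarray(x0, dtype=float)
    for W, B in phi[:-1]:
        x = relu(W @ x + B)      # hidden layers: x_n = A(W_n x_{n-1} + B_n)
    W, B = phi[-1]
    return W @ x + B             # assumption: the last layer is affine

def num_params(phi):
    # Assumption: count all weight and bias entries, i.e. sum_n k_n * (k_{n-1} + 1).
    return sum(W.size + B.size for W, B in phi)

# Example with one hidden layer (H = 1): |x| = max{x, 0} + max{-x, 0}.
abs_net = [
    (np.array([[1.0], [-1.0]]), np.zeros(2)),
    (np.array([[1.0, 1.0]]), np.zeros(1)),
]
```

In this representation the statement of Theorem 1.1 says that `num_params` of the approximating networks can be bounded by a polynomial in d and 1/ε.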
In the manner of the proof of Theorem 3.14 in [GHJvW18] and the proof of Theorem 6.3 in [JSW18], the proof of the main theorem below uses probabilistic arguments on a suitable artificial probability space. Moreover, the proof of the main theorem relies on recently introduced full history recursive multilevel Picard (MLP) approximations which have been proved to overcome the curse of dimensionality in the numerical approximation of solutions of semilinear heat equations at single space-time points; see [EHJK16, EHJK17, HK17, HJK+18]. A key step in our proof is that realizations of certain MLP approximations can be represented by DNNs; see Lemma 3.10 below.
The remainder of this article is organized as follows. In Section 2 we provide auxiliary results on multilevel Picard approximations ensuring that these approximations are stable against perturbations in the nonlinearity $f$ and the terminal condition $g$ of the PDE (1). In Section 3 we show that multilevel Picard approximations can be represented by DNNs and we provide bounds for the number of parameters of the representing DNN. We use the results of Section 2 and Section 3 to prove the main result of this article.
2 A stability result for full history recursive multilevel Picard (MLP) approximations
2.1 Setting
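Setting 2.1.
Let $d\in\mathbb{N}$, $T,L,\delta,B\in(0,\infty)$, $p,q\in[1,\infty)$, $f_1,f_2\in C([0,T]\times\mathbb{R}^d\times\mathbb{R},\mathbb{R})$, $g_1,g_2\in C(\mathbb{R}^d,\mathbb{R})$, let $\|\cdot\|\colon\mathbb{R}^d\to[0,\infty)$ satisfy for all $x=(x_1,\ldots,x_d)\in\mathbb{R}^d$ that $\|x\|=\big[\sum_{i=1}^{d}(x_i)^2\big]^{1/2}$, assume for all $t\in[0,T]$, $x\in\mathbb{R}^d$, $w,v\in\mathbb{R}$, $i\in\{1,2\}$ that
[TABLE]
[TABLE]
and
[TABLE]
let $F_i\colon C([0,T]\times\mathbb{R}^d,\mathbb{R})\to C([0,T]\times\mathbb{R}^d,\mathbb{R})$, $i\in\{1,2\}$, satisfy for all $v\in C([0,T]\times\mathbb{R}^d,\mathbb{R})$, $t\in[0,T]$, $x\in\mathbb{R}^d$, $i\in\{1,2\}$ that
[TABLE]
let $(\Omega,\mathcal{F},\mathbb{P})$ be a probability space, let $\mathbf{W}\colon[0,T]\times\Omega\to\mathbb{R}^d$ be a standard Brownian motion with continuous sample paths, let $u_1,u_2\in C([0,T]\times\mathbb{R}^d,\mathbb{R})$, assume for all $i\in\{1,2\}$, $s\in[0,T]$, $x\in\mathbb{R}^d$ that
[TABLE]
and
[TABLE]
let $\Theta=\bigcup_{n\in\mathbb{N}}\mathbb{Z}^n$, let $\mathfrak{u}^{\theta}\colon\Omega\to[0,1]$, $\theta\in\Theta$, be independent random variables which are uniformly distributed on $[0,1]$, let $\mathcal{U}^{\theta}\colon[0,T]\times\Omega\to[0,T]$, $\theta\in\Theta$, satisfy for all $t\in[0,T]$, $\theta\in\Theta$ that $\mathcal{U}^{\theta}_t=t+(T-t)\mathfrak{u}^{\theta}$, let $W^{\theta}\colon[0,T]\times\Omega\to\mathbb{R}^d$, $\theta\in\Theta$, be independent standard Brownian motions, assume that $(\mathfrak{u}^{\theta})_{\theta\in\Theta}$, $(W^{\theta})_{\theta\in\Theta}$, and $\mathbf{W}$ are independent, and let $U^{\theta}_{n,M}\colon[0,T]\times\mathbb{R}^d\times\Omega\to\mathbb{R}$, $n,M\in\mathbb{Z}$, $\theta\in\Theta$, be functions which satisfy for all $n,M\in\mathbb{N}$, $\theta\in\Theta$, $t\in[0,T]$, $x\in\mathbb{R}^d$ that $U^{\theta}_{-1,M}(t,x)=U^{\theta}_{0,M}(t,x)=0$ and
[TABLE]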
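The recursion that defines the MLP approximations is not legible in this text, so the following Python sketch is not a transcription of it. It implements the commonly used form of the full history recursive multilevel Picard scheme, under several assumptions: the PDE is written in terminal-value form with terminal condition g at time T, the nonlinearity depends only on the solution value (as in Lemma 3.10), level n averages M^n samples of g and M^(n-l) samples of the level-l correction, and fresh independent randomness is drawn in each recursive call instead of indexing by θ. The function name `mlp` and its signature are hypothetical.

```python
import numpy as np

def mlp(n, M, t, x, T, f, g, rng):
    """One realization of a level-n MLP approximation at the point (t, x) (sketch)."""
    if n <= 0:
        return 0.0                                   # U_{-1,M} = U_{0,M} = 0
    d = x.shape[0]
    # Monte Carlo average of the terminal condition along Brownian increments.
    incr = rng.normal(scale=np.sqrt(T - t), size=(M ** n, d))
    total = np.mean([g(x + w) for w in incr])
    # Multilevel correction terms for the nonlinearity.
    for l in range(n):
        num = M ** (n - l)
        acc = 0.0
        for _ in range(num):
            s = t + (T - t) * rng.uniform()          # random time in [t, T]
            y = x + rng.normal(scale=np.sqrt(s - t), size=d)
            acc += f(mlp(l, M, s, y, T, f, g, rng))
            if l > 0:                                # telescoping difference
                acc -= f(mlp(l - 1, M, s, y, T, f, g, rng))
        total += (T - t) * acc / num
    return float(total)
```

With f identically zero the scheme reduces to a plain Monte Carlo average of g along Brownian increments, which gives a simple sanity check.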
2.2 An a priori estimate for solutions of partial differential equations (PDEs)
Lemma 2.2 (q-th moment of the exact solution).
Assume Setting 2.1 and let $x\in\mathbb{R}^d$, $i\in\{1,2\}$. Then it holds that
Throughout this proof let $\mu_t\colon\mathcal{B}(\mathbb{R}^d)\to[0,1]$, $t\in[0,T]$, be the probability measures which satisfy for all $t\in[0,T]$, $B\in\mathcal{B}(\mathbb{R}^d)$ that
[TABLE]
The integral transformation theorem, (8), and the triangle inequality
show
for all t∈[0,T] that
[TABLE]
Next, Jensen’s inequality, Fubini’s theorem, (11), the fact that W has independent and stationary increments,
and (4) demonstrate that
for all t∈[0,T] it holds that
[TABLE]
Furthermore, Jensen’s inequality, Fubini’s theorem, (11), the fact that W has independent and stationary increments, the triangle inequality, (3), and (4) demonstrate
for all t∈[0,T] that
[TABLE]
Combining this with (12) and (13)
implies that
for all t∈[0,T] it holds that
[TABLE]
Next,
[HJK+18, Corollary 3.11]
shows that
[TABLE]
This, the triangle inequality, and the fact that $\mathbb{E}\big[\|\mathbf{W}_T\|^{pq}\big]<\infty$ show that
[TABLE]
This,
Gronwall’s integral inequality, and (15) establish
for all t∈[0,T]
that
First, (8), the triangle inequality, and the fact that $\mathbf{W}$ has stationary increments show for all $s\in[0,T]$, $z\in\mathbb{R}^d$ that
[TABLE]
This, Fubini’s theorem, the fact that $\mathbf{W}$ has independent increments, and the Lipschitz condition in (3) ensure that for all $s\in[0,T]$, $x\in\mathbb{R}^d$ it holds that
[TABLE]
This, Gronwall’s lemma, and Lemma 2.2 yield for all $x\in\mathbb{R}^d$ that
[TABLE]
Furthermore, (5), the triangle inequality, and Lemma 2.2
imply for all $x\in\mathbb{R}^d$ that
[TABLE]
This, (22), and the triangle inequality yield that
First, Lemma 2.2 implies that
$\int_0^T\big(\mathbb{E}\big[|u_i(t,x+\mathbf{W}_t)|^2\big]\big)^{1/2}\,dt<\infty.$
This, [HJK+18, Theorem 3.5] (with $\xi=x$, $F=F_2$, $g=g_2$, and $u=u_2$ in the notation of [HJK+18, Theorem 3.5]), (4), and the triangle inequality ensure that
3 Deep neural network representations for MLP approximations
The main result of this section, Lemma 3.10 below, shows that multilevel Picard approximations can be well represented by DNNs. The central tools for the proof of Lemma 3.10 are Lemmas 3.8 and 3.9, which show that DNNs are stable under compositions and summations. We formulate Lemmas 3.8 and 3.9 in terms of the operators defined in (34) below, whose properties are studied in Lemmas 3.3, 3.4, and 3.5.
3.1 A mathematical framework for deep neural networks
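Setting 3.1 (Artificial neural networks).
Let $\|\cdot\|,|||\cdot|||\colon(\cup_{d\in\mathbb{N}}\mathbb{R}^d)\to[0,\infty)$ and $\dim\colon(\cup_{d\in\mathbb{N}}\mathbb{R}^d)\to\mathbb{N}$ satisfy for all $d\in\mathbb{N}$, $x=(x_1,\ldots,x_d)\in\mathbb{R}^d$ that $\|x\|=\big[\sum_{i=1}^{d}(x_i)^2\big]^{1/2}$, $|||x|||=\max_{i\in[1,d]\cap\mathbb{N}}|x_i|$, and $\dim(x)=d$, let $A_d\colon\mathbb{R}^d\to\mathbb{R}^d$, $d\in\mathbb{N}$, satisfy for all $d\in\mathbb{N}$, $x=(x_1,\ldots,x_d)\in\mathbb{R}^d$ that
$A_d(x)=(\max\{x_1,0\},\ldots,\max\{x_d,0\})$,
let $\mathbf{D}=\cup_{H\in\mathbb{N}}\mathbb{N}^{H+2}$, let
[TABLE]
let $\mathcal{D}\colon\mathbf{N}\to\mathbf{D}$ and $\mathcal{R}\colon\mathbf{N}\to(\cup_{k,l\in\mathbb{N}}C(\mathbb{R}^k,\mathbb{R}^l))$ satisfy for all $H\in\mathbb{N}$, $k_0,k_1,\ldots,k_H,k_{H+1}\in\mathbb{N}$, $\Phi=((W_1,B_1),\ldots,(W_{H+1},B_{H+1}))\in\prod_{n=1}^{H+1}(\mathbb{R}^{k_n\times k_{n-1}}\times\mathbb{R}^{k_n})$, $x_0\in\mathbb{R}^{k_0},\ldots,x_H\in\mathbb{R}^{k_H}$ with $\forall\,n\in\mathbb{N}\cap[1,H]\colon x_n=A_{k_n}(W_nx_{n-1}+B_n)$ that
[TABLE]
[TABLE]
let $\odot\colon\mathbf{D}\times\mathbf{D}\to\mathbf{D}$ satisfy for all $H_1,H_2\in\mathbb{N}$, $\alpha=(\alpha_0,\alpha_1,\ldots,\alpha_{H_1},\alpha_{H_1+1})\in\mathbb{N}^{H_1+2}$, $\beta=(\beta_0,\beta_1,\ldots,\beta_{H_2},\beta_{H_2+1})\in\mathbb{N}^{H_2+2}$ that
[TABLE]
let $\boxplus\colon\mathbf{D}\times\mathbf{D}\to\mathbf{D}$ satisfy for all $H\in\mathbb{N}$, $\alpha=(\alpha_0,\alpha_1,\ldots,\alpha_H,\alpha_{H+1})\in\mathbb{N}^{H+2}$, $\beta=(\beta_0,\beta_1,\beta_2,\ldots,\beta_H,\beta_{H+1})\in\mathbb{N}^{H+2}$ that
[TABLE]
and let $\mathfrak{n}_n\in\mathbf{D}$, $n\in[3,\infty)\cap\mathbb{N}$, satisfy for all $n\in[3,\infty)\cap\mathbb{N}$ that
[TABLE]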
Remark 3.2.
The set $\mathbf{N}$ can be viewed as the set of all artificial neural networks. For each network $\Phi\in\mathbf{N}$ the function $\mathcal{R}(\Phi)$ is the function represented by $\Phi$ and the vector $\mathcal{D}(\Phi)$ describes the layer dimensions of $\Phi$.
3.2 Properties of operations associated to deep neural networks
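Lemma 3.3 ($\odot$ is associative).
Assume Setting 3.1 and let $\alpha,\beta,\gamma\in\mathbf{D}$. Then it holds that $(\alpha\odot\beta)\odot\gamma=\alpha\odot(\beta\odot\gamma)$.
Throughout this proof let $H_1,H_2,H_3\in\mathbb{N}$, let $(\alpha_i)_{i\in[0,H_1+1]\cap\mathbb{N}_0}\in\mathbb{N}^{H_1+2}$, $(\beta_i)_{i\in[0,H_2+1]\cap\mathbb{N}_0}\in\mathbb{N}^{H_2+2}$, $(\gamma_i)_{i\in[0,H_3+1]\cap\mathbb{N}_0}\in\mathbb{N}^{H_3+2}$ satisfy that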
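Throughout this proof let $\alpha_i,\beta_i,\gamma_i\in\mathbb{N}$, $i\in[1,H]\cap\mathbb{N}$, satisfy that
$\alpha=(k,\alpha_1,\alpha_2,\ldots,\alpha_H,l)$,
$\beta=(k,\beta_1,\beta_2,\ldots,\beta_H,l)$, and
$\gamma=(k,\gamma_1,\gamma_2,\ldots,\gamma_H,l)$.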
The definition of ⊞ (see (34)) then shows that
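Throughout this proof let $\alpha_i,\beta_i\in\mathbb{N}$, $i\in[1,H]\cap\mathbb{N}$, satisfy that
$\alpha=(k,\alpha_1,\alpha_2,\ldots,\alpha_H,l)$
and
$\beta=(k,\beta_1,\beta_2,\ldots,\beta_H,l)$.
The definition of $\boxplus$ (see (34)) then shows that
$\alpha\boxplus\beta=(k,\alpha_1+\beta_1,\alpha_2+\beta_2,\ldots,\alpha_H+\beta_H,l)$.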
This together with the triangle inequality implies that
Throughout this proof let $W_1\in\mathbb{R}^{2\times1}$, $W_i\in\mathbb{R}^{2\times2}$, $i\in[2,H]\cap\mathbb{N}$, $W_{H+1}\in\mathbb{R}^{1\times2}$, $B_i\in\mathbb{R}^{2}$, $i\in[1,H]\cap\mathbb{N}$, $B_{H+1}\in\mathbb{R}^{1}$ satisfy that
[TABLE]
let $\phi\in\mathbf{N}$ satisfy that $\phi=((W_1,B_1),(W_2,B_2),\ldots,(W_H,B_H),(W_{H+1},B_{H+1}))$, for every $a\in\mathbb{R}$ let $a^{+}\in[0,\infty)$ be the non-negative part of $a$, i.e., $a^{+}=\max\{a,0\}$, and let $x_0\in\mathbb{R}$, $x_1,x_2,\ldots,x_H\in\mathbb{R}^2$ satisfy for all $n\in\mathbb{N}\cap[1,H]$ that
[TABLE]
Note that (41) and the definition of $\mathcal{D}$ (see (31)) imply that $\mathcal{D}(\phi)=\mathfrak{n}_{H+2}$.
Furthermore, (41), (42), and an induction argument show that
The fact that $x_0$ was arbitrary therefore proves that $\mathcal{R}(\phi)=\mathrm{Id}_{\mathbb{R}}$. This and the fact that $\mathcal{D}(\phi)=\mathfrak{n}_{H+2}$ demonstrate that $\mathrm{Id}_{\mathbb{R}}\in\mathcal{R}(\{\Phi\in\mathbf{N}\colon\mathcal{D}(\Phi)=\mathfrak{n}_{H+2}\})$.
The proof of Lemma 3.6 is thus completed.
∎
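The proof above builds a network of layer widths (1, 2, …, 2, 1) that represents the identity on ℝ. The explicit weights are not legible in this text, so the Python sketch below uses the standard choice based on x = max{x, 0} − max{−x, 0}. This choice is an assumption, although it matches the stated matrix shapes 2×1, 2×2, and 1×2. The helper names are illustrative.

```python
import numpy as np

def realize(phi, x0):
    # ReLU on hidden layers, affine output layer (assumed convention).
    x = np.asarray(x0, dtype=float)
    for W, B in phi[:-1]:
        x = np.maximum(W @ x + B, 0.0)
    W, B = phi[-1]
    return W @ x + B

def dims(phi):
    # Layer dimensions (k_0, k_1, ..., k_{H+1}) of a network.
    return (phi[0][0].shape[1],) + tuple(W.shape[0] for W, _ in phi)

def identity_network(H):
    """ReLU network with H >= 1 hidden layers of width 2 representing x -> x on R."""
    first = (np.array([[1.0], [-1.0]]), np.zeros(2))            # x -> (x^+, (-x)^+)
    hidden = [(np.eye(2), np.zeros(2)) for _ in range(H - 1)]   # keeps nonnegative entries
    last = (np.array([[1.0, -1.0]]), np.zeros(1))               # (u, v) -> u - v
    return [first] + hidden + [last]
```

The hidden layers act as the identity because their inputs are already nonnegative, so the depth can be increased freely without changing the represented function.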
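Lemma 3.7 (DNNs for affine transformations).
Assume Setting 3.1 and let $d,m\in\mathbb{N}$, $\lambda\in\mathbb{R}$, $b\in\mathbb{R}^d$, $a\in\mathbb{R}^m$, $\Psi\in\mathbf{N}$ satisfy that $\mathcal{R}(\Psi)\in C(\mathbb{R}^d,\mathbb{R}^m)$. Then it holds that $\lambda((\mathcal{R}(\Psi))(\cdot+b)+a)\in\mathcal{R}(\{\Phi\in\mathbf{N}\colon\mathcal{D}(\Phi)=\mathcal{D}(\Psi)\})$.
This and the fact that $y_0$ was arbitrary prove that $\mathcal{R}(\phi)=\lambda((\mathcal{R}(\Psi))(\cdot+b)+a)$. This and the fact that $\mathcal{D}(\phi)=\mathcal{D}(\Psi)$ imply that $\lambda((\mathcal{R}(\Psi))(\cdot+b)+a)\in\mathcal{R}(\{\Phi\in\mathbf{N}\colon\mathcal{D}(\Phi)=\mathcal{D}(\Psi)\})$. The proof of Lemma 3.7 is thus completed.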
∎
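Lemma 3.7 says that shifting the input and applying an affine map to the output does not change the layer dimensions. A minimal Python sketch of one way to see this absorbs the shift b into the first bias and the scaling λ and offset a into the last layer. The construction used in the actual proof is not legible in this text, so this is an assumed but natural choice, and the helper names are illustrative.

```python
import numpy as np

def realize(phi, x0):
    # ReLU on hidden layers, affine output layer (assumed convention).
    x = np.asarray(x0, dtype=float)
    for W, B in phi[:-1]:
        x = np.maximum(W @ x + B, 0.0)
    W, B = phi[-1]
    return W @ x + B

def dims(phi):
    return (phi[0][0].shape[1],) + tuple(W.shape[0] for W, _ in phi)

def affine_transform(psi, lam, b, a):
    """Network phi with dims(phi) == dims(psi) and
    realize(phi, y) == lam * (realize(psi, y + b) + a); needs at least one hidden layer."""
    W1, B1 = psi[0]
    WL, BL = psi[-1]
    phi = list(psi)
    phi[0] = (W1, W1 @ b + B1)               # W1 (y + b) + B1 = W1 y + (W1 b + B1)
    phi[-1] = (lam * WL, lam * (BL + a))     # lam * (WL x + BL + a)
    return phi
```

In the proof of Lemma 3.10 this lemma is applied with λ = 1, a = 0, and b equal to a Brownian increment, so only the first bias changes.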
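Lemma 3.8 (Composition).
Assume Setting 3.1 and let $d_1,d_2,d_3\in\mathbb{N}$, $f\in C(\mathbb{R}^{d_2},\mathbb{R}^{d_3})$, $g\in C(\mathbb{R}^{d_1},\mathbb{R}^{d_2})$, $\alpha,\beta\in\mathbf{D}$ satisfy that $f\in\mathcal{R}(\{\Phi\in\mathbf{N}\colon\mathcal{D}(\Phi)=\alpha\})$ and $g\in\mathcal{R}(\{\Phi\in\mathbf{N}\colon\mathcal{D}(\Phi)=\beta\})$. Then it holds that $(f\circ g)\in\mathcal{R}(\{\Phi\in\mathbf{N}\colon\mathcal{D}(\Phi)=\alpha\odot\beta\})$.
Throughout this proof let $H_1,H_2,\alpha_0,\ldots,\alpha_{H_1+1},\beta_0,\ldots,\beta_{H_2+1}\in\mathbb{N}$, $\Phi_f,\Phi_g\in\mathbf{N}$ satisfy that
[TABLE]
Lemma 5.4 in [JSW18] shows that there exists $\mathbb{I}\in\mathbf{N}$ such that $\mathcal{D}(\mathbb{I})=d_2\mathfrak{n}_3=(d_2,2d_2,d_2)$ and $\mathcal{R}(\mathbb{I})=\mathrm{Id}_{\mathbb{R}^{d_2}}$. Note that $2d_2=\beta_{H_2+1}+\alpha_0$. This and [JSW18, Proposition 5.2] (with $\phi_1=\Phi_f$, $\phi_2=\Phi_g$, and $\mathbb{I}=\mathbb{I}$ in the notation of [JSW18, Proposition 5.2]) show that there exists $\Phi_{f\circ g}\in\mathbf{N}$ such that $\mathcal{R}(\Phi_{f\circ g})=f\circ g$ and $\mathcal{D}(\Phi_{f\circ g})=\mathcal{D}(\Phi_f)\odot\mathcal{D}(\Phi_g)=\alpha\odot\beta$. Hence, it holds that $f\circ g\in\mathcal{R}(\{\Phi\in\mathbf{N}\colon\mathcal{D}(\Phi)=\alpha\odot\beta\})$.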
The proof of Lemma 3.8 is thus completed.
∎
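The proof of Lemma 3.8 composes two networks by inserting a width-2d₂ identity network between them. The Python sketch below shows one way to carry this out: the last affine layer of the network for g is merged with the first layer of the identity network, and the last layer of the identity network is merged with the first layer of the network for f. The displayed definition of ⊙ is not legible in this text, so the resulting layer dimensions (β₀, …, β_{H₂}, 2d₂, α₁, …, α_{H₁+1}) are an assumption suggested by the remark that 2d₂ = β_{H₂+1} + α₀. The helper names are illustrative.

```python
import numpy as np

def realize(phi, x0):
    # ReLU on hidden layers, affine output layer (assumed convention).
    x = np.asarray(x0, dtype=float)
    for W, B in phi[:-1]:
        x = np.maximum(W @ x + B, 0.0)
    W, B = phi[-1]
    return W @ x + B

def dims(phi):
    return (phi[0][0].shape[1],) + tuple(W.shape[0] for W, _ in phi)

def compose(phi_f, phi_g):
    """Network representing realize(phi_f, realize(phi_g, x)); both inputs need a hidden layer."""
    Wg, Bg = phi_g[-1]
    Wf, Bf = phi_f[0]
    # Output y of g is split into (y^+, (-y)^+) by the following ReLU.
    split = (np.vstack([Wg, -Wg]), np.concatenate([Bg, -Bg]))
    # First layer of f applied to y = y^+ - (-y)^+.
    merge = (np.hstack([Wf, -Wf]), Bf)
    return phi_g[:-1] + [split, merge] + phi_f[1:]
```

The extra hidden layer of width 2d₂ is what makes the depth of a composition add up in a controlled way, which the induction in Lemma 3.10 relies on.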
The following result, Lemma 3.9, essentially generalizes [JSW18, Lemma 5.1] to the case where the DNNs have different hidden layer dimensions.
Lemma 3.9 (Sum of DNNs of the same length).
Assume Setting 3.1 and let $M,H,p,q\in\mathbb{N}$, $h_1,h_2,\ldots,h_M\in\mathbb{R}$, $k_i\in\mathbf{D}$, $f_i\in C(\mathbb{R}^p,\mathbb{R}^q)$, $i\in[1,M]\cap\mathbb{N}$, satisfy for all $i\in[1,M]\cap\mathbb{N}$ that
Throughout this proof let $\phi_i\in\mathbf{N}$, $i\in[1,M]\cap\mathbb{N}$, and $k_{i,n}\in\mathbb{N}$, $i\in[1,M]\cap\mathbb{N}$, $n\in[0,H+1]\cap\mathbb{N}_0$, satisfy for all $i\in[1,M]\cap\mathbb{N}$ that
[TABLE]
for every $i\in[1,M]\cap\mathbb{N}$ let $((W_{i,1},B_{i,1}),\ldots,(W_{i,H+1},B_{i,H+1}))\in\prod_{n=1}^{H+1}(\mathbb{R}^{k_{i,n}\times k_{i,n-1}}\times\mathbb{R}^{k_{i,n}})$ satisfy that
[TABLE]
let $k^{\boxplus}_n\in\mathbb{N}$, $n\in[1,H]\cap\mathbb{N}$, $k^{\boxplus}\in\mathbb{N}^{H+2}$ satisfy for all $n\in[1,H]\cap\mathbb{N}$ that
[TABLE]
let $W_1\in\mathbb{R}^{k^{\boxplus}_1\times p}$, $B_1\in\mathbb{R}^{k^{\boxplus}_1}$ satisfy that
[TABLE]
let $W_n\in\mathbb{R}^{k^{\boxplus}_n\times k^{\boxplus}_{n-1}}$, $B_n\in\mathbb{R}^{k^{\boxplus}_n}$, $n\in[2,H]\cap\mathbb{N}$, satisfy for all $n\in[2,H]\cap\mathbb{N}$ that
[TABLE]
let $W_{H+1}\in\mathbb{R}^{q\times k^{\boxplus}_H}$, $B_{H+1}\in\mathbb{R}^{q}$ satisfy that
[TABLE]
let $x_0\in\mathbb{R}^p$, $x_1\in\mathbb{R}^{k^{\boxplus}_1}$, $x_2\in\mathbb{R}^{k^{\boxplus}_2},\ldots,x_H\in\mathbb{R}^{k^{\boxplus}_H}$, let $x_{1,0},x_{2,0},\ldots,x_{M,0}\in\mathbb{R}^p$, $x_{i,n}\in\mathbb{R}^{k_{i,n}}$, $i\in[1,M]\cap\mathbb{N}$, $n\in[1,H]\cap\mathbb{N}$, satisfy for all $i\in[1,M]\cap\mathbb{N}$, $n\in[1,H]\cap\mathbb{N}$ that
[TABLE]
and let $\psi\in\mathbf{N}$ satisfy that
[TABLE]
First, the definitions of $\mathcal{D}$ and $\mathcal{R}$ (see (31) and (32)), (56), and the fact that $\forall\,i\in[1,M]\cap\mathbb{N}\colon f_i\in C(\mathbb{R}^p,\mathbb{R}^q)$ show for all $i\in[1,M]\cap\mathbb{N}$ that $k_i=(p,k_{i,1},k_{i,2},\ldots,k_{i,H},q)$. The definition of $\mathcal{D}$ (see (31)), the definition of $\boxplus$ (see (34)), and (58) then show that
[TABLE]
Next, we prove by induction on $n\in[1,H]\cap\mathbb{N}$ that $x_n=(x_{1,n},x_{2,n},\ldots,x_{M,n})$. First, (59) shows that
[TABLE]
This implies that
[TABLE]
This proves the base case. Next, for the induction step let $n\in[2,H]\cap\mathbb{N}$ and assume that $x_{n-1}=(x_{1,n-1},x_{2,n-1},\ldots,x_{M,n-1})$. Then (60) and the induction hypothesis ensure that
[TABLE]
This yields that
[TABLE]
This proves the induction step. Induction now proves for all $n\in[1,H]\cap\mathbb{N}$ that $x_n=(x_{1,n},x_{2,n},\ldots,x_{M,n})$. This, the definition of $\mathcal{R}$ (see (32)), and (61) imply that
[TABLE]
This, the fact that $x_0\in\mathbb{R}^p$ was arbitrary, and (56) yield that
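The proof of Lemma 3.9 runs M networks of equal depth in parallel, so that the n-th hidden state is the stacked vector (x_{1,n}, …, x_{M,n}). The displayed weight matrices are not legible in this text, so the Python sketch below uses the standard parallelization as an assumption: the first layers are stacked, the middle layers are block diagonal, and the last layers are combined with the coefficients h_i. The helper names are illustrative.

```python
import numpy as np

def realize(phi, x0):
    # ReLU on hidden layers, affine output layer (assumed convention).
    x = np.asarray(x0, dtype=float)
    for W, B in phi[:-1]:
        x = np.maximum(W @ x + B, 0.0)
    W, B = phi[-1]
    return W @ x + B

def dims(phi):
    return (phi[0][0].shape[1],) + tuple(W.shape[0] for W, _ in phi)

def block_diag(mats):
    # Place the given matrices along the diagonal of a zero matrix.
    out = np.zeros((sum(m.shape[0] for m in mats), sum(m.shape[1] for m in mats)))
    r = c = 0
    for m in mats:
        out[r:r + m.shape[0], c:c + m.shape[1]] = m
        r, c = r + m.shape[0], c + m.shape[1]
    return out

def weighted_sum(nets, h):
    """Network representing x -> sum_i h[i] * realize(nets[i], x) for nets of equal depth."""
    depth = len(nets[0])
    layers = [(np.vstack([n[0][0] for n in nets]),
               np.concatenate([n[0][1] for n in nets]))]
    for l in range(1, depth - 1):
        layers.append((block_diag([n[l][0] for n in nets]),
                       np.concatenate([n[l][1] for n in nets])))
    layers.append((np.hstack([hi * n[-1][0] for hi, n in zip(h, nets)]),
                   sum(hi * n[-1][1] for hi, n in zip(h, nets))))
    return layers
```

The hidden widths of the result are the sums of the individual hidden widths, while the input and output dimensions p and q are unchanged, in line with the formula for α⊞β quoted earlier.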
3.3 Deep neural network representations for MLP approximations
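Lemma 3.10.
Assume Setting 3.1, let $d,M\in\mathbb{N}$, $T,c\in(0,\infty)$, $f\in C(\mathbb{R},\mathbb{R})$, $g\in C(\mathbb{R}^d,\mathbb{R})$, $\Phi_f,\Phi_g\in\mathbf{N}$ satisfy that $\mathcal{R}(\Phi_f)=f$, $\mathcal{R}(\Phi_g)=g$, and
[TABLE]
let $(\Omega,\mathcal{F},\mathbb{P})$ be a probability space, let $\Theta=\bigcup_{n\in\mathbb{N}}\mathbb{Z}^n$, let $\mathfrak{u}^{\theta}\colon\Omega\to[0,1]$, $\theta\in\Theta$, be independent random variables which are uniformly distributed on $[0,1]$, let $\mathcal{U}^{\theta}\colon[0,T]\times\Omega\to[0,T]$, $\theta\in\Theta$, satisfy for all $t\in[0,T]$, $\theta\in\Theta$ that $\mathcal{U}^{\theta}_t=t+(T-t)\mathfrak{u}^{\theta}$, let $W^{\theta}\colon[0,T]\times\Omega\to\mathbb{R}^d$, $\theta\in\Theta$, be independent standard Brownian motions with continuous sample paths, assume that $(\mathfrak{u}^{\theta})_{\theta\in\Theta}$ and $(W^{\theta})_{\theta\in\Theta}$ are independent, let $U^{\theta}_{n,M}\colon[0,T]\times\mathbb{R}^d\times\Omega\to\mathbb{R}$, $n,M\in\mathbb{Z}$, $\theta\in\Theta$, satisfy for all $n\in\mathbb{N}$, $\theta\in\Theta$, $t\in[0,T]$, $x\in\mathbb{R}^d$ that $U^{\theta}_{-1,M}(t,x)=U^{\theta}_{0,M}(t,x)=0$ and
[TABLE]
and let $\omega\in\Omega$. Then for all $n\in\mathbb{N}_0$ there exists a family $(\Phi^{\theta}_{n,t})_{\theta\in\Theta,t\in[0,T]}\subseteq\mathbf{N}$ such that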
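We prove Lemma 3.10 by induction on $n\in\mathbb{N}_0$. For the base case $n=0$ note that the fact that $\forall\,t\in[0,T],\theta\in\Theta\colon U^{\theta}_{0,M}(t,\cdot)=0$, the fact that the zero function $\mathbb{R}^d\ni x\mapsto 0\in\mathbb{R}$ can be represented by a network with depth $\dim(\mathcal{D}(\Phi_g))$, and (72) imply that there exists $(\Phi^{\theta}_{0,t})_{\theta\in\Theta,t\in[0,T]}\subseteq\mathbf{N}$ such that it holds for all $t_1,t_2\in[0,T]$, $\theta_1,\theta_2\in\Theta$ that $\mathcal{D}(\Phi^{\theta_1}_{0,t_1})=\mathcal{D}(\Phi^{\theta_2}_{0,t_2})$ and such that it holds for all $\theta\in\Theta$, $t\in[0,T]$ that $\dim(\mathcal{D}(\Phi^{\theta}_{0,t}))=\dim(\mathcal{D}(\Phi_g))$, $|||\mathcal{D}(\Phi^{\theta}_{0,t})|||\le|||\mathcal{D}(\Phi_g)|||\le c$, and $U^{\theta}_{0,M}(t,\cdot,\omega)=\mathcal{R}(\Phi^{\theta}_{0,t})$. This proves the base case $n=0$.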
For the induction step from $n\in\mathbb{N}_0$ to $n+1\in\mathbb{N}$ let $n\in\mathbb{N}_0$ and assume that Items (i)–(iv) hold true for all $k\in[0,n]\cap\mathbb{N}_0$. The assumption that $g=\mathcal{R}(\Phi_g)$ and Lemma 3.7 (with $d=d$, $m=1$, $\lambda=1$, $a=0$, $b=W^{\theta}_T(\omega)-W^{\theta}_t(\omega)$, and $\Psi=\Phi_g$ for $\theta\in\Theta$, $t\in[0,T]$ in the notation of Lemma 3.7) show for all $\theta\in\Theta$, $t\in[0,T]$ that
[TABLE]
Furthermore, Lemma 3.6 (with $H=(n+1)(\dim(\mathcal{D}(\Phi_f))-1)-1$ in the notation of Lemma 3.6) ensures that
[TABLE]
This, (78), and Lemma 3.8 (with $d_1=d$, $d_2=1$, $d_3=1$, $f=\mathrm{Id}_{\mathbb{R}}$, $g=g\big(\cdot+W^{\theta}_T(\omega)-W^{\theta}_t(\omega)\big)$, $\alpha=\mathfrak{n}_{(n+1)(\dim(\mathcal{D}(\Phi_f))-1)+1}$, and $\beta=\mathcal{D}(\Phi_g)$ for $\theta\in\Theta$, $t\in[0,T]$ in the notation of Lemma 3.8) show that for all $\theta\in\Theta$, $t\in[0,T]$ it holds that
[TABLE]
Next, the induction hypothesis implies for all $\theta\in\Theta$, $t\in[0,T]$, $l\in[0,n]\cap\mathbb{N}_0$ that
in the notation of Lemma 3.8) prove for all $\eta,\theta\in\Theta$, $t\in[0,T]$ that
[TABLE]
Furthermore, the definition of ⊙ in (33) and
the fact that
[TABLE]
in
the induction hypothesis imply that
[TABLE]
that
[TABLE]
and for all
$l\in[0,n-1]\cap\mathbb{N}_0$ that
[TABLE]
This shows, roughly speaking, that the functions in (80), (90), and (88) can be represented by networks with the same depth (i.e., number of layers): $(n+1)(\dim(\mathcal{D}(\Phi_f))-1)+\dim(\mathcal{D}(\Phi_g))$. Hence, Lemma 3.9