Simultaneous upper and lower bounds of American-style option prices with hedging via neural networks

Ivan Guo; Nicolas Langrené; Jiahao Wu

arXiv:2302.12439·q-fin.CP·April 22, 2025


TL;DR

This paper presents two neural network-based methods to efficiently compute both upper and lower bounds of American-style option prices and derive hedging strategies, avoiding nested simulations and enabling high-dimensional pricing.

Contribution

Introduces two novel neural network approaches for simultaneous upper and lower bound estimation of American options without nested Monte Carlo.

Findings

1. Reduces computational complexity for high-dimensional options
2. Provides effective hedging strategies and variance reduction techniques
3. Demonstrates accurate bounds through numerical experiments

Abstract

In this paper, we introduce two novel methods to solve the American-style option pricing problem and its dual form at the same time using neural networks. Without applying nested Monte Carlo, the first method uses a series of neural networks to simultaneously compute both the lower and upper bounds of the option price, and the second one accomplishes the same goal with one global network. The avoidance of extra simulations and the use of neural networks significantly reduce the computational complexity and allow us to price Bermudan options with frequent exercise opportunities in high dimensions, as illustrated by the provided numerical experiments. As a by-product, these methods also derive a hedging strategy for the option, which can also be used as a control variate for variance reduction.

Tables

Table 1: Pricing a 1D vanilla American put option (with the same parameters as the one in Section 4.1.1). The first row displays results where the weights are randomly initialised at each time. The second row shows the estimates when Variation 1 is applied.

Method	Time (sec)	LB Mean	LB S.D.	UB Mean	UB S.D.	Diff Mean	Diff S.D.
Original	360	4.4738	0.0007	4.4889	0.0005	0.0151	0.0010
Variation 1	135	4.4769	0.0002	4.4877	0.0004	0.0108	0.0005

Table 2: 1D American put option pricing with and without the second martingale term, applied in method I. The networks used in both cases have the same structure: [(50, 50), (50, 30, 30)].

Terms	Training time (sec)	LB Mean	LB S.D.	UB Mean	UB S.D.	Diff Mean	Diff S.D.
1 Term	117	4.4754	0.0013	4.5525	0.0008	0.0771	0.0015
2 Terms	117	4.4772	0.0003	4.4876	0.0004	0.0104	0.0004

Table 3: 1D and 10D American put option pricing with and without Variation 5. The subscripts I and I5 in column one indicate that the methods used are the original method I and method I with Variation 5, respectively. The superscript shows the dimension of the problem. The second column is the total number of free variables trained.

Model	Variables	Time	LB Mean	LB S.D.	UB Mean	UB S.D.	Diff Mean	Diff S.D.
$\Theta_{I}^{1D}$	270150	123	4.4769	0.0002	4.4887	0.0006	0.0117	0.0006
$\Theta_{I5}^{1D}$	268650	117	4.4772	0.0003	4.4876	0.0004	0.0104	0.0004
$\Theta_{I}^{10D}$	2186050	797	26.9205	0.0161	27.4541	0.0742	0.5336	0.0879
$\Theta_{I5}^{10D}$	2139050	784	26.9104	0.0053	27.2567	0.0014	0.3463	0.0047

Table 4: Algorithm variations

Variations	Method	Accuracy	Time	Memory
V1: Warm-start training	I	✓	✓
V2: Train on partial data	II	✗	✓
V3: Train on fresh data	II			✓
V4: Add a second martingale term	I, II	✓	✗
V5: Use two separate networks	I, II	✓	✗
V6: Add sub-steps	I, II	✓	✗

Table 5: 1D American put option pricing. The first column indicates the method used and the second column shows the total number of free variables involved in the training process. The networks used in method I have the structures [(20, 20), (20, 20, 20)], [(50, 50), (50, 30, 30)] and [(75, 65), (75, 50, 50)]. The networks used in method II all have 5 layers, with the same number of neurons in each layer: 50, 75 and 100 neurons for structures 1 to 3, respectively.

Method	Variables	Time	LB Mean	LB S.D.	UB Mean	UB S.D.	Diff Mean	Diff S.D.
I	71150	99	4.4762	0.0007	4.4887	0.0014	0.0125	0.0014
I	268650	117	4.4772	0.0003	4.4876	0.0004	0.0104	0.0004
I	591650	107	4.4769	0.0003	4.4877	0.0003	0.0107	0.0004
II	25953	115	4.4729	0.0029	4.4893	0.0019	0.0164	0.0046
II	46278	108	4.4744	0.0020	4.4889	0.0019	0.0145	0.0035
II	81703	140	4.4763	0.0006	4.4879	0.0007	0.0115	0.0010

Table 6: 5D Bermudan max-call option pricing. All networks used in method I have two layers for the continuation value approximations and three layers for the martingale increments. The numbers of neurons are [(75, 50), (100, 75, 50)]. In method II, there are 5 hidden layers with 75 neurons for both function approximations. The second column indicates the initial learning rates (L.R.) used.

Method	L.R.	Time	LB Mean	LB S.D.	UB Mean	UB S.D.	Diff Mean	Diff S.D.
I	0.015	3656	26.1380	0.0085	26.2114	0.0045	0.0734	0.0127
I	0.01	3943	26.1369	0.0062	26.2064	0.0036	0.0695	0.0094
I	0.005	5995	26.1405	0.0035	26.1975	0.0014	0.0571	0.0046
II	0.01	5860	26.1432	0.0043	26.2299	0.0051	0.0866	0.0076
II	0.005	6281	26.1417	0.0070	26.2262	0.0064	0.0846	0.0124
II	0.001	13832	26.1434	0.0074	26.2205	0.0063	0.0771	0.0134

Table 7: 1D American put option pricing under the Heston model. The second column indicates the network structure. All networks used in both method I and method II have two layers for the continuation value approximations and three layers for the martingale increments. The numbers of neurons for structures 1 and 2 are [(100, 50), (100, 50, 25)] and [(100, 100), (100, 100, 100)], respectively.

Method	Structure	Time	LB Mean	LB S.D.	UB Mean	UB S.D.	Diff Mean	Diff S.D.
I	$\Theta_{I}^{1}$	1405	1.6403	0.0012	1.6490	0.0041	0.0088	0.0035
I	$\Theta_{I}^{2}$	1264	1.6403	0.0015	1.6482	0.0020	0.0079	0.0017
II	$\Theta_{II}^{1}$	3122	1.6402	0.0039	1.6469	0.0016	0.0067	0.0023
II	$\Theta_{II}^{2}$	2876	1.6409	0.0029	1.6476	0.0028	0.0067	0.0030

Equations

d S_{t} = r S_{t}\,dt + \sigma(t, S_{t})\,dW_{t},

V_{t}=\operatorname*{ess\,sup}_{\tau\in\mathcal{T},\,\tau\geq t}\mathbb{E}\!\left[\frac{\beta_{t}Z_{\tau}}{\beta_{\tau}}\,\Big|\,\mathcal{F}_{t}\right],

V_{0} = \sup_{\tau\in\mathcal{T}} \mathbb{E}\!\left[\frac{Z_{\tau}}{\beta_{\tau}}\right].

V_{0} = \inf_{M\in\mathcal{M}^{UI}} \mathbb{E}\!\left[\sup_{t\in[0,T]}\left(\frac{Z_{t}}{\beta_{t}} - M_{t}\right)\right].

\frac{V_{t}}{\beta_{t}} = V_{0} + M_{t}^{*} - A_{t}^{*},

M_{t}^{*} = M_{0}^{*} + \int_{0}^{t} H_{s}\,dW_{s}.

U_{t} = J_{t}^{0}\beta_{t} + \sum_{i=1}^{d} J_{t}^{i} S_{t}^{i}.

\frac{U_{t}}{\beta_{t}} = U_{0} + \int_{0}^{t} J_{u}\,\beta_{u}^{-1}\,\sigma(u, S_{u})\,dW_{u}.

\frac{V_{t}}{\beta_{t}} = V_{0} + \int_{0}^{t} H_{u}\,dW_{u} - A_{t}^{*}

J_{t} = \frac{\beta_{t} H_{t}}{\sigma(t, S_{t})}.

\frac{V_{t_{i+1}}}{\beta_{t_{i+1}}}=\mathbb{E}\!\left[\frac{V_{t_{i+1}}}{\beta_{t_{i+1}}}\,\Big|\,\mathcal{F}_{t_{i}}\right]+\int_{t_{i}}^{t_{i+1}}H_{u}\,dW_{u}.

\tau_{i} = \min\left\{t_{j} \in \{t_{i}, t_{i+1}, \dots, t_{n-1}\} : f(S_{t_{j}}) \geq \Phi(S_{t_{j}})\right\} \land t_{n}.

Y_{t_{i}} = \begin{cases} f(S_{t_{i}}), & \text{if } f(S_{t_{i}}) \geq \Phi(S_{t_{i}}), \\ \beta_{\Delta t}^{-1} Y_{t_{i+1}} - \Psi(S_{t_{i}}) \cdot \Delta W_{t_{i}}, & \text{if } f(S_{t_{i}}) < \Phi(S_{t_{i}}). \end{cases}

X_{t_{i}} = \begin{cases} f(S_{t_{i}}), & \text{if } f(S_{t_{i}}) \geq X_{t_{i+1}} - \Psi(S_{t_{i}}) \cdot \Delta W_{t_{i}}, \\ \beta_{\Delta t}^{-1} X_{t_{i+1}} - \Psi(S_{t_{i}}) \cdot \Delta W_{t_{i}}, & \text{if } f(S_{t_{i}}) < X_{t_{i+1}} - \Psi(S_{t_{i}}) \cdot \Delta W_{t_{i}}. \end{cases}

\min_{\Phi, \Psi} \left(\beta_{\Delta t}^{-1} Y_{t_{i+1}} - \Phi(S_{t_{i}}) - \Psi(S_{t_{i}})\,\Delta W_{t_{i}}\right)^{2}.

A_{L} \circ \sigma_{n_{L-1}} \circ A_{L-1} \circ \dots \circ \sigma_{1} \circ A_{1},

\begin{cases} V_{t} = Z_{T} - \int_{0}^{t} b(s, V_{s}, \pi_{s})\,ds + K_{T} - K_{t} - \int_{t}^{T} H_{s}\,dW_{s}, \\ V_{t} \geq Z_{t}, \quad 0 \leq t \leq T, \\ K_{0} = 0, \text{ and } \int_{0}^{T} (V_{t} - Z_{t})\,dK_{t} = 0. \end{cases}

\epsilon_{1}^{i} = V_{0} + \sum_{t=t_{0}}^{\tau^{i} - \Delta t} \beta_{t}^{-1} H(S_{t}^{i})\,\Delta W_{t}^{i} - \beta_{\tau^{i}}^{-1} Z_{\tau^{i}}^{i},

\epsilon_{2}^{i} = V_{0} + \min_{t^{i} \in \{t_{1}, \dots, t_{n}\}} \left( \sum_{t=t_{0}}^{t^{i} - \Delta t} \beta_{t}^{-1} H(S_{t}^{i})\,\Delta W_{t}^{i} - \beta_{t^{i}}^{-1} Z_{t^{i}}^{i} \right).

d S_{t}^{i} = (r - \delta^{i}) S_{t}^{i}\,dt + \sigma S_{t}^{i}\,dW_{t}^{i}, \quad \text{for } i \in \{1, 2, \dots, d\}

\left(\max_{i \in \{1, 2, \dots, d\}} S_{t}^{i} - K\right)^{+}.

\begin{cases} d S_{t} = r S_{t}\,dt + \sqrt{V_{t}}\,S_{t}\,dW_{S}, \\ d V_{t} = \lambda(\sigma^{2} - V_{t})\,dt + \xi \sqrt{V_{t}}\,dW_{V}. \end{cases}

\Psi^{S}(S_{t_{i}})\,\Delta W_{t_{i}}^{S} + \Psi^{V}(S_{t_{i}})\,\Delta W_{t_{i}}^{V}

Z_{\tau_{i}} = \mathbb{E}[Z_{\tau_{i}}] + \int_{0}^{\tau_{i}} H_{s}\,dW_{s} + \int_{\tau_{i}}^{T} H_{s}\,dW_{s}.

Z_{\tau_{i}} = \mathbb{E}[Z_{\tau_{i}} \mid \mathcal{F}_{\tau_{i}}] = \mathbb{E}[Z_{\tau_{i}}] + \int_{0}^{\tau_{i}} H_{s}\,dW_{s},

\mathbb{E}[Z_{\tau_{i}} \mid \mathcal{F}_{t_{i}}] = \mathbb{E}[Z_{\tau_{i}}] + \int_{0}^{t_{i}} H_{s}\,dW_{s}.

Z_{\tau_{i}} = \mathbb{E}[Z_{\tau_{i}} \mid \mathcal{F}_{t_{i}}] + \int_{t_{i}}^{\tau_{i}} H_{s}\,dW_{s},

\operatorname{Var}(Z_{\tau_{i}}) = \mathbb{E}\!\left[\int_{0}^{t_{i}} H_{s}^{2}\,ds\right] + \mathbb{E}\!\left[\int_{t_{i}}^{\tau_{i}} H_{s}^{2}\,ds\right] = \operatorname{Var}\!\left(\mathbb{E}[Z_{\tau_{i}} \mid \mathcal{F}_{t_{i}}]\right) + \operatorname{Var}\!\left(\int_{t_{i}}^{\tau_{i}} H_{s}\,dW_{s}\right).

\operatorname{Var}(Y_{t_{i}})


Code & Models

Repositories

jiahaowu27/american-option-pricing (PyTorch, official)


Taxonomy

Topics: Stochastic processes and financial applications · Advanced Control Systems Optimization · Reservoir Engineering and Simulation Methods

Full text

Simultaneous upper and lower bounds of American option prices with hedging via neural networks

Ivan Guo
School of Mathematical Sciences, Monash University, Melbourne, Australia
Centre for Quantitative Finance and Investment Strategies, Monash University, Australia
(Ivan Guo’s work was partially supported by the Australian Research Council (Grant DP220103106) and CSIRO Data61 Risklab.)

Nicolas Langrené
BNU-HKBU United International College, Zhuhai, China
(Nicolas Langrené’s work was supported in part by the Guangdong Provincial Key Laboratory of Interdisciplinary Research and Application for Data Science, BNU-HKBU United International College, project code 2022B1212010006, and in part by the UIC Start-up Research Fund UICR0700041-22.)

Jiahao Wu
School of Mathematical Sciences, Monash University, Melbourne, Australia

Abstract

In this paper, we introduce two methods to solve the American-style option pricing problem and its dual form at the same time using neural networks. Without applying nested Monte Carlo, the first method uses a series of neural networks to simultaneously compute both the lower and upper bounds of the option price, and the second one accomplishes the same goal with one global network. The avoidance of extra simulations and the use of neural networks significantly reduce the computational complexity and allow us to price Bermudan options with frequent exercise opportunities in high dimensions, as illustrated by the provided numerical experiments. As a by-product, these methods also derive a hedging strategy for the option, which can also be used as a control variate for variance reduction.

1 Introduction

Pricing American options is a type of optimal control/stopping problem in which the goal is to find the stopping strategy that maximises the option value. Numerically, there have been many attempts based on classical partial differential equation methods and binomial trees [13, 4, 27, 44, 37, 10]. However, when multiple factors affect the value of the option, these methods become computationally expensive, a limitation known as the curse of dimensionality. To circumvent this difficulty, simulation-based methods have been extensively explored [42, 5, 15, 35, 43, 11, 14, 3, 32, 12, 36], among which the Least Squares Monte Carlo (LSMC) method of Longstaff and Schwartz [35] has gained much popularity. In this method, the dynamic programming principle is applied to determine the optimal stopping strategy recursively by comparing the immediate exercise payoff to the continuation value, which is estimated by least-squares regression on a set of Monte Carlo simulations of the underlying asset price model.

Methods directly solving the pricing problem typically generate a candidate optimal stopping strategy and a lower bound on the price, which is more in the interest of the buying party. On the other hand, option sellers would be more interested in an upper bound on the price, which can be obtained from a super-hedging strategy. Haugh and Kogan [26] and Rogers [40] independently explored the duality of the pricing problem, based on which a variety of methods [2, 29, 9, 39, 41] that derive upper bounds by approximating corresponding martingales have been proposed.

In the original work of Longstaff and Schwartz [35], a set of basis functions is used to approximate the continuation values via ordinary least-squares regression in search of the stopping strategy. As the dimension of the problem increases, the number of basis functions increases greatly and the method becomes numerically unstable. In our work, we modify the LSMC algorithm by performing the regressions using neural networks (NNs). Kohler et al. [31] and Lapeyre and Lelong [33] studied similar modifications, but they only explored the modelling of stopping strategies. Other works involving deep learning in option pricing include Han et al. [25], Raissi [38], Chen and Wan [18] and Germain et al. [23], which instead solve the corresponding partial differential equations (PDEs) or backward stochastic differential equations (BSDEs).

The main contribution of our work is the incorporation of the dual formulation of the option price into the modified LSMC method to design algorithms that simultaneously generate both lower and upper bounds of the option price. Becker et al. [7, 8] proposed a similar method to price Bermudan options. The main difference from our proposed method is that they first find a stopping strategy to approximate a lower bound, based on which they then derive an upper bound using nested Monte Carlo. The computational cost of this method can be very high when pricing Bermudan options with frequent exercise opportunities, which is the case when trying to approximate an American option. Similar methods designed by Lokeshwar et al. [34] do not require nested simulations, but the derivation of a biased upper estimate is separate from the determination of the stopping strategy.

In addition, we propose to use one global network instead of a series of networks in the derivation by including the time as an input variable. A global network has been introduced to solve semi-linear PDEs [17] and other control problems [24, 22], but the target values of their loss functions are known when training starts, whereas ours are unavailable initially. This is because, in such stopping problems, the training targets are generated by future optimal stopping strategies, which are outputs (rather than inputs) of the problem. To overcome this difficulty, we alternate between updating the stopping strategies and training the network until satisfactory results are produced.

Another advantage of our method is the derivation of hedging strategies as an immediate by-product. Most methods of generating hedging strategies in the literature either take the first derivative of the approximated option values [3, 12, 28] or approximate the function that represents the difference of option values at different times once the option has been priced [7, 6]. The efficiency of hedging from these methods depends on the accurate differentiation of the estimated continuation value function. Since functions with similar values can have very different derivatives, even satisfactory approximations of the value process can lead to ineffective hedging strategies. In our work, the hedging strategy is computed directly from the dual martingale used in the upper bound estimate rather than by differentiation, and we are able to provide hedging strategies at all times before maturity, not just at the exercise times. Moreover, this martingale can also be used as a control variate to reduce variance, leading to a more accurate lower bound.

This paper is organised in the following order. In Section 2, we explain how we combine the LSMC algorithm with the dual formulation to design our method. Section 3 introduces our algorithms and a number of variations. Section 4 demonstrates numerical results in both low- and high-dimensional settings, followed by some concluding remarks.

2 Problem Formulation

Consider an American option with maturity $T>0$ that can be exercised at any time $t\in(0,T]$ . Let $(\Omega,\mathcal{F},\mathbb{F}=(\mathcal{F}_{t})_{t\in[0,T]},\mathbb{Q})$ be a filtered probability space, where $\mathbb{F}$ is the augmented filtration of a $d$ -dimensional Brownian motion $(W_{t})_{t\in[0,T]}$ , and $\mathbb{Q}$ is the probability measure equivalent to the real-world measure under which all discounted asset prices are martingales.

Define $\beta_{t}=e^{rt}$ as the value of the risk-free account at $t\in[0,T]$ , where the constant $r$ is the risk-free interest rate. The price of the option is based on $d$ risky assets whose value process $(S_{t})_{t\in[0,T]}$ is Markovian and is the solution to the SDE

$$dS_{t}=rS_{t}\,dt+\sigma(t,S_{t})\,dW_{t},$$

where $\sigma:[0,T]\times\mathbb{R}^{d}\to\mathbb{R}^{d\times d}$ is assumed to satisfy sufficient regularity conditions to ensure the well-posedness of the equation.

2.1 Lower bound of the option price

Let $(Z_{t})_{t\in[0,T]}$ denote the $\mathbb{F}$ -adapted right-continuous payoff process of the option satisfying $\mathbb{E}[\sup_{t\in[0,T]}Z_{t}]<\infty$ . Let $\tau:\Omega\to(0,T]$ be a stopping time. Let $\mathcal{T}$ be the set of all stopping times with respect to the filtration $\mathbb{F}$ . Then, the value of the American option at time $t$ is

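$$V_{t}=\operatorname*{ess\,sup}_{\tau\in\mathcal{T},\,\tau\geq t}\mathbb{E}\!\left[\frac{\beta_{t}Z_{\tau}}{\beta_{\tau}}\,\Big|\,\mathcal{F}_{t}\right],$$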

and in particular the value at time zero is

$$V_{0}=\sup_{\tau\in\mathcal{T}}\mathbb{E}\!\left[\frac{Z_{\tau}}{\beta_{\tau}}\right].$$

For any specific stopping strategy $\tau^{*}\in\mathcal{T}$ , we have $V_{0}=\sup_{\tau\in\mathcal{T}}\mathbb{E}[\frac{Z_{\tau}}{\beta_{\tau}}]\geq\mathbb{E}[\frac{Z_{\tau^{*}}}{\beta_{\tau^{*}}}].$ Hence the estimate of the American option price given by any one strategy is a lower bound of the true value.
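To illustrate this inequality, the following Python sketch estimates a lower bound for a Bermudan put by Monte Carlo under a fixed and generally suboptimal stopping rule. The Black-Scholes dynamics, the threshold rule, the function names `simulate_gbm` and `lower_bound_put`, and all parameter values are assumptions made purely for illustration; none of them is taken from the paper.

```python
import numpy as np

def simulate_gbm(s0, r, sigma, T, n_steps, n_paths, seed=0):
    """Simulate risk-neutral geometric Brownian motion on the grid t_0, ..., t_n."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    dw = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
    log_incr = (r - 0.5 * sigma**2) * dt + sigma * dw
    log_paths = np.hstack([np.zeros((n_paths, 1)), np.cumsum(log_incr, axis=1)])
    return s0 * np.exp(log_paths)  # shape (n_paths, n_steps + 1)

def lower_bound_put(paths, strike, r, T, barrier):
    """Mean discounted put payoff when exercising at the first date t_i (i >= 1)
    with S_{t_i} <= barrier, and at maturity t_n if the barrier is never reached."""
    n_paths, n_steps = paths.shape[0], paths.shape[1] - 1
    dt = T / n_steps
    hit = paths[:, 1:] <= barrier                      # exercise dates t_1, ..., t_n
    first_hit = np.where(hit.any(axis=1), hit.argmax(axis=1), n_steps - 1)
    tau_index = first_hit + 1                          # index into the time grid
    s_tau = paths[np.arange(n_paths), tau_index]
    payoff = np.maximum(strike - s_tau, 0.0)
    return float(np.mean(np.exp(-r * tau_index * dt) * payoff))

# Hypothetical parameters, chosen only for the illustration.
paths = simulate_gbm(s0=100.0, r=0.05, sigma=0.2, T=1.0, n_steps=50, n_paths=20_000)
european = lower_bound_put(paths, strike=100.0, r=0.05, T=1.0, barrier=0.0)
threshold = lower_bound_put(paths, strike=100.0, r=0.05, T=1.0, barrier=85.0)
```

Both `european` (never exercise early) and `threshold` are Monte Carlo estimates of lower bounds on the Bermudan price; a better stopping rule would be expected to give a larger value.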

2.2 Upper bound of the option price

Denote by $\mathcal{M}^{UI}$ the set of all uniformly integrable martingales with the initial state $M_{0}=0$ . The American option pricing problem has a dual form:

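$$V_{0}=\inf_{M\in\mathcal{M}^{UI}}\mathbb{E}\!\left[\sup_{t\in[0,T]}\left(\frac{Z_{t}}{\beta_{t}}-M_{t}\right)\right].\tag{1}$$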

Since the discounted option value process $(\frac{V_{t}}{\beta_{t}})_{t\in[0,T]}$ is a supermartingale of class D [30], it has a unique Doob-Meyer decomposition:

$$\frac{V_{t}}{\beta_{t}}=V_{0}+M_{t}^{*}-A_{t}^{*},\tag{2}$$

where $(M^{*}_{t})_{t\in[0,T]}\in\mathcal{M}^{UI}$ , and $(A^{*}_{t})_{t\in[0,T]}$ is a predictable increasing process with $A^{*}_{0}=0$ . Rogers [40] and Haugh and Kogan [26] proved the duality and showed that the infimum is attained at $M=M^{*}$ .

Denote by $\mathcal{M}\subset\mathcal{M}^{UI}$ the set of martingales that are both uniformly integrable and square integrable. We restrict our search for $M^{*}$ to the set $\mathcal{M}$ , noting that the optimal martingales corresponding to the options we price lie in this set. Since $M^{*}\in\mathcal{M}$ and is adapted to the Brownian filtration $\mathbb{F}$ , the Brownian martingale representation theorem states that there exists a predictable process $H$ such that $\mathbb{E}\int_{0}^{T}H^{2}_{s}ds<\infty$ , and

$$M_{t}^{*}=M_{0}^{*}+\int_{0}^{t}H_{s}\,dW_{s}.\tag{3}$$

This allows us to estimate the optimal martingale $M^{*}$ by approximating the process $H$ numerically, and then generate an upper bound of the option price.
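The role of the martingale in the dual form can be sketched numerically: any martingale starting at zero gives an upper-bound estimate, and a better choice gives a tighter one. The Python example below evaluates the dual expression on a discrete grid of exercise dates for two trial martingales. The Black-Scholes put, the parameter values, the function name `dual_upper_bound` and the arbitrary hedge ratio of -0.5 are all assumptions for illustration, not the paper's construction of $M^{*}$.

```python
import numpy as np

def dual_upper_bound(discounted_payoff, martingale):
    """Estimate E[max_i (Z_{t_i}/beta_{t_i} - M_{t_i})] over the exercise dates.
    Both inputs have shape (n_paths, n_dates)."""
    return float(np.mean(np.max(discounted_payoff - martingale, axis=1)))

# Illustrative (assumed) Black-Scholes put with hypothetical parameters.
s0, strike, r, sigma, T, n_steps, n_paths = 100.0, 100.0, 0.05, 0.2, 1.0, 50, 20_000
rng = np.random.default_rng(1)
dt = T / n_steps
t = dt * np.arange(1, n_steps + 1)                      # exercise dates t_1, ..., t_n
w = np.cumsum(rng.normal(0.0, np.sqrt(dt), (n_paths, n_steps)), axis=1)
s = s0 * np.exp((r - 0.5 * sigma**2) * t + sigma * w)   # asset values at t_1, ..., t_n
disc_payoff = np.exp(-r * t) * np.maximum(strike - s, 0.0)

# Two trial martingales with M_0 = 0: the zero martingale, and a static
# position in the discounted stock (the ratio -0.5 is an arbitrary choice).
m_zero = np.zeros_like(disc_payoff)
m_stock = -0.5 * (np.exp(-r * t) * s - s0)

ub_zero = dual_upper_bound(disc_payoff, m_zero)
ub_stock = dual_upper_bound(disc_payoff, m_stock)
lb_maturity = float(np.mean(disc_payoff[:, -1]))        # exercise only at t_n
```

Since the pathwise maximum dominates the payoff at any single date, `ub_zero` is never below `lb_maturity`. A martingale closer to $M^{*}$ may narrow the gap between the two bounds.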

2.3 Hedging Strategy

Consider a measurable adapted process $(J_{t})_{t\in[0,T]}$ with values in $\mathbb{R}^{d+1}$ , where $J^{i}$ is the number of units of the $i$ -th asset held in a portfolio consisting of $d$ risky assets and one risk-free asset. The value of the portfolio at time $t$ is

$$U_{t}=J_{t}^{0}\beta_{t}+\sum_{i=1}^{d}J_{t}^{i}S_{t}^{i}.$$

The process satisfies the condition $\int_{0}^{T}|J_{u}|^{2}du=\sum_{i=1}^{d}\int_{0}^{T}|J_{u}^{i}|^{2}du<\infty$ a.s., and it is a self-financing hedging strategy if

$$\frac{U_{t}}{\beta_{t}}=U_{0}+\int_{0}^{t}J_{u}\,\beta_{u}^{-1}\,\sigma(u,S_{u})\,dW_{u}.\tag{4}$$

Combining the Doob-Meyer decomposition (2) and the Brownian martingale representation (3), we obtain

$$\frac{V_{t}}{\beta_{t}}=V_{0}+\int_{0}^{t}H_{u}\,dW_{u}-A_{t}^{*}.\tag{5}$$

For the portfolio to super-replicate the option, we need $U_{t}\geq Z_{t}$ for all $t\in[0,T]$ . It is well-known that the cheapest such portfolio satisfies $U_{0}=V_{0}$ and $U_{t}\geq V_{t}\geq Z_{t}$ for all $t\in[0,T]$ . Comparing equations (4) and (5), we see that this can be achieved by setting

$$J_{t}=\frac{\beta_{t}H_{t}}{\sigma(t,S_{t})}.$$

Hence, the hedging strategy $J$ can be computed directly from the process $H$ . The process $A^{*}_{t}$ is also the difference $U_{t}-V_{t}$ and can be interpreted as the losses incurred by the buyer if they miss the optimal exercise opportunity.
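As a worked special case, take one risky asset with a Black-Scholes diffusion coefficient $\sigma(t,S_{t})=\bar{\sigma}S_{t}$ , where the constant volatility $\bar{\sigma}$ is notation introduced here for illustration only. Matching the stochastic integrand of the self-financing portfolio, $J_{u}\beta_{u}^{-1}\sigma(u,S_{u})$ , with the integrand $H_{u}$ of the decomposed option value gives:

```latex
J_{u}\,\beta_{u}^{-1}\,\sigma(u,S_{u}) = H_{u}
\quad\Longrightarrow\quad
J_{u} = \frac{\beta_{u} H_{u}}{\sigma(u,S_{u})}
      = \frac{e^{ru}\,H_{u}}{\bar{\sigma}\,S_{u}} .
```

Under this assumption, an approximation of $H$ on a time grid translates directly into a number of units of the asset to hold, without differentiating an estimated value function.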

3 Valuing an American option numerically

From now on, we approximate American options by Bermudan options whose exercise times are restricted to the discrete set $t_{i}=i\cdot\Delta t$ , for $i\in\{1,...,n\}$ , where $\Delta t=\frac{T}{n}$ . By taking the expectation of the discounted option value conditioned on $\mathcal{F}_{t_{i}}$ and applying the Doob-Meyer decomposition we have

$$\frac{V_{t_{i+1}}}{\beta_{t_{i+1}}}=\mathbb{E}\!\left[\frac{V_{t_{i+1}}}{\beta_{t_{i+1}}}\,\Big|\,\mathcal{F}_{t_{i}}\right]+\int_{t_{i}}^{t_{i+1}}H_{u}\,dW_{u}.\tag{6}$$

In this equation, the conditional expectation is the continuation value, and the integral is the martingale increment. Since the stock price process is Markovian, both the conditional expectation and the process $(H_{t_{i}})^{n}_{i=0}$ for $i\in\{0,1,...,n\}$ can be written as functions of the state variables $S_{t_{i}}$ [20, 19].

Let $f(S_{t}):\mathbb{R}^{d}\to\mathbb{R}$ be the payoff of the option at $t\in(0,T]$ . Let $\Phi(S_{t_{i}}):\mathbb{R}^{d}\to\mathbb{R}$ and $\Psi(S_{t_{i}}):\mathbb{R}^{d}\to\mathbb{R}^{d}$ be approximations of the continuation function and the process $H_{t_{i}}$ at $t_{i}$ , respectively. The martingale increment can be approximated by $\Psi(S_{t_{i}})\cdot\Delta W_{{t_{i}}}$ , where $\Delta W_{{t_{i}}}=W_{{t_{i+1}}}-W_{{t_{i}}}$ . We refer to $\Psi(S_{t_{i}})$ as the martingale increment function.

Consider two random processes $(Y_{t_{i}})^{n}_{i=1}$ and $(X_{t_{i}})^{n}_{i=1}$ . They will be updated recursively backward and their expected value will be a lower and an upper bound of the option price, respectively. The rule of updating is as follows.

At $t_{n}=T$ , the option holder has to either exercise the option if it is in the money or let it expire if it is out of the money, so $Y_{t_{n}}=X_{t_{n}}=\max(f(S_{t_{n}}),0)$ .

At each time step $t_{1}\leq t<t_{n}$ , the option holder either exercises the option immediately if the payoff value is higher than the continuation value, or waits until the next exercise point if it is lower. The corresponding stopping time is:

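$$\tau_{i}=\min\left\{t_{j}\in\{t_{i},t_{i+1},...,t_{n-1}\}:f(S_{t_{j}})\geq\Phi(S_{t_{j}})\right\}\land t_{n}.$$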

Based on this policy, we define $Y_{t_{i}}$ and $X_{t_{i}}$ for $i\in\{1,2,...,n-1\}$ as follows:

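$$Y_{t_{i}}=\begin{cases}f(S_{t_{i}}),&\text{if }f(S_{t_{i}})\geq\Phi(S_{t_{i}}),\\ \beta_{\Delta t}^{-1}Y_{t_{i+1}}-\Psi(S_{t_{i}})\cdot\Delta W_{t_{i}},&\text{if }f(S_{t_{i}})<\Phi(S_{t_{i}}),\end{cases}$$

$$X_{t_{i}}=\begin{cases}f(S_{t_{i}}),&\text{if }f(S_{t_{i}})\geq X_{t_{i+1}}-\Psi(S_{t_{i}})\cdot\Delta W_{t_{i}},\\ \beta_{\Delta t}^{-1}X_{t_{i+1}}-\Psi(S_{t_{i}})\cdot\Delta W_{t_{i}},&\text{if }f(S_{t_{i}})<X_{t_{i+1}}-\Psi(S_{t_{i}})\cdot\Delta W_{t_{i}}.\end{cases}$$

Though both $X_{t_{i}}$ and $Y_{t_{i}}$ have the term $\Psi(S_{t_{i}})\cdot\Delta W_{t_{i}}$ in the second case, the subtractions have different meanings. In the update of $X_{t_{i}}$ (option price upper bound), it is the martingale increment we need to deduct based on the duality formulation (1). In the update of $Y_{t_{i}}$ (option price lower bound), it works as a control variate for variance reduction. If the approximation of $H_{t_{i}}$ is perfect, the variance can be cancelled out completely. A proof that this control variate indeed reduces the variance of the estimate is given in Appendix A.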

The processes $X_{t_{i}}$ and $Y_{t_{i}}$ can also be interpreted in the following way. The variable $Y_{t_{i}}$ is a proxy of the buyer’s price, as the two cases correspond to the stopping decision based on comparing the exercise payoff and the continuation value. The variable $X_{t_{i}}$ is a proxy of the seller’s price, as the two cases correspond to whether the seller needs to update their hedging targets based on the comparison of the exercise payoff and the hedging price.
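The update rule for $Y_{t_{i}}$ and $X_{t_{i}}$ can be sketched as a single vectorised backward step over all simulated paths. In the Python sketch below, the function name `backward_step` and its argument names are hypothetical, and the exercise test for $X_{t_{i}}$ is copied literally from the definition as it is displayed in this document.

```python
import numpy as np

def backward_step(payoff, cont_value, mart_incr, y_next, x_next, disc):
    """One backward update from t_{i+1} to t_i for all paths.

    payoff     : f(S_{t_i})
    cont_value : Phi(S_{t_i}), the approximate continuation value
    mart_incr  : Psi(S_{t_i}) . Delta W_{t_i}, the approximate martingale increment
    y_next     : Y_{t_{i+1}},  x_next : X_{t_{i+1}}
    disc       : beta_{Delta t}^{-1}, the one-step discount factor
    """
    # Lower-bound proxy: exercise when the payoff beats the continuation value,
    # otherwise carry the discounted value and subtract the control variate.
    y = np.where(payoff >= cont_value, payoff, disc * y_next - mart_incr)
    # Upper-bound proxy: the comparison uses X_{t_{i+1}} minus the increment,
    # as written in the displayed definition of X_{t_i}.
    x = np.where(payoff >= x_next - mart_incr, payoff, disc * x_next - mart_incr)
    return y, x

# Tiny hand-checkable example with three paths (made-up numbers).
payoff = np.array([5.0, 1.0, 3.0])
cont_value = np.array([3.0, 4.0, 2.5])
mart_incr = np.array([0.5, -0.5, 0.0])
y_next = np.array([2.0, 6.0, 4.0])
x_next = np.array([2.0, 6.0, 4.0])
y, x = backward_step(payoff, cont_value, mart_incr, y_next, x_next, disc=0.9)
```

At maturity both arrays would be initialised with $\max(f(S_{t_{n}}),0)$ , and the step above would then be applied for $i=n-1,...,1$ .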

To approximate the continuation value functions $\Phi$ and the predictable process $\Psi$ , we perform a regression based on equation (6):

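$$\min_{\Phi,\Psi}\left(\beta_{\Delta t}^{-1}Y_{t_{i+1}}-\Phi(S_{t_{i}})-\Psi(S_{t_{i}})\,\Delta W_{t_{i}}\right)^{2}.$$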
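Before turning to the network parametrisation, the structure of this joint regression can be sketched with a much simpler function class. The Python example below fits $\Phi$ and $\Psi$ simultaneously by ordinary least squares on a polynomial basis, so that the target is explained by $\Phi(S)+\Psi(S)\,\Delta W$ . The polynomial basis is a stand-in chosen here for brevity, not the neural networks used in the paper, and the function name `fit_phi_psi` and the synthetic data are assumptions.

```python
import numpy as np

def fit_phi_psi(s, dw, target, degree=2):
    """Least-squares fit of target ~ Phi(s) + Psi(s) * dw for one asset,
    with Phi and Psi both polynomials of the given degree (an assumed basis)."""
    basis = np.vander(s, degree + 1, increasing=True)    # columns 1, s, ..., s^degree
    design = np.hstack([basis, basis * dw[:, None]])     # regressors for Phi and Psi
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    a, c = coef[: degree + 1], coef[degree + 1 :]

    def phi(x):
        return np.vander(np.atleast_1d(x), degree + 1, increasing=True) @ a

    def psi(x):
        return np.vander(np.atleast_1d(x), degree + 1, increasing=True) @ c

    return phi, psi

# Synthetic check: the target is built from known Phi and Psi, so the fit
# should recover them up to numerical error.
rng = np.random.default_rng(2)
s = rng.uniform(0.5, 1.5, size=500)
dw = rng.normal(0.0, 0.1, size=500)
target = (1.0 + 2.0 * s) + (3.0 - s**2) * dw
phi, psi = fit_phi_psi(s, dw, target)
```

In the setting of the text, `target` would be the discounted value $\beta_{\Delta t}^{-1}Y_{t_{i+1}}$ and `dw` the Brownian increment over $[t_{i},t_{i+1}]$ .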

In this work, we use fully-connected feedforward neural networks to perform these regressions, denoted as $NN^{\Theta}$ where $\Theta$ describes the structure of a network, for instance, $\Theta=(L,\mathbf{n})$ represents a network with $L$ layers, and each layer $l$ has $\mathbf{n}_{l}$ neurons. In particular, $\mathbf{n}_{1}$ and $\mathbf{n}_{L}$ are the number of inputs and the number of outputs respectively. Each network has the following form

$$A_{L}\circ\sigma_{n_{L-1}}\circ A_{L-1}\circ\dots\circ\sigma_{1}\circ A_{1},$$

where $A_{l}(x)=\omega_{l}^{T}x+\beta_{l},\text{ for }x\in\mathbb{R}^{n_{l}}$ , $\omega_{l}\in\mathbb{R}^{n_{l}\times n_{l+1}}$ , $\beta_{l}\in\mathbb{R}^{n_{l+1}}$ and $\sigma_{l}$ is the activation function applied after the affine transformation $A_{l}$ from layer $l$ to layer $l+1$ .
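The composition of affine maps and activations can be written out in a few lines. The following NumPy sketch implements only the forward pass; the ReLU activation, the random initialisation, the layer sizes and the convention that the first output is $\Phi$ and the remaining $d$ outputs are $\Psi$ are assumptions made for illustration and are not specified in the text.

```python
import numpy as np

def init_network(layer_sizes, seed=0):
    """Create (weights, bias) pairs; weights of layer l have shape (n_l, n_{l+1})."""
    rng = np.random.default_rng(seed)
    return [
        (rng.normal(0.0, 1.0 / np.sqrt(n_in), (n_in, n_out)), np.zeros(n_out))
        for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
    ]

def forward(params, x):
    """Apply the alternating affine maps and activations to a batch x of
    shape (batch, n_1); no activation follows the last affine map."""
    for weights, bias in params[:-1]:
        x = np.maximum(x @ weights + bias, 0.0)   # affine map, then ReLU (assumed)
    weights, bias = params[-1]
    return x @ weights + bias

d = 3                                             # hypothetical number of assets
params = init_network([d, 50, 50, 1 + d])
out = forward(params, np.ones((5, d)))
phi_hat, psi_hat = out[:, 0], out[:, 1:]          # assumed split of the outputs
```

Each row of `x @ weights + bias` equals $\omega_{l}^{T}x+\beta_{l}$ for the corresponding input, matching the definition of $A_{l}$ above.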

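Remark. El Karoui et al. [21] showed that pricing American options is related to reflected BSDEs, the solution of which is a $\mathcal{F}_{t}$ -measurable tuple $(V_{t},H_{t},K_{t})$ for $t\in[0,T]$ with values in ( $\mathbb{R}$ , $\mathbb{R}^{n}$ , $\mathbb{R}_{+}$ ), and satisfies:

$$\begin{cases}V_{t}=Z_{T}-\int_{0}^{t}b(s,V_{s},\pi_{s})\,ds+K_{T}-K_{t}-\int_{t}^{T}H_{s}\,dW_{s},\\ V_{t}\geq Z_{t},\quad 0\leq t\leq T,\\ K_{0}=0,\text{ and }\int_{0}^{T}(V_{t}-Z_{t})\,dK_{t}=0.\end{cases}$$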

Our work can be easily extended to solve this type of BSDE. The processes $V$ and $H$ here have the same meaning as we have defined before, and our work generates numerical solutions to them. The process $K$ can be seen as the non-decreasing process $A$ and calculated by a second simulation where we accumulate the gap between the value process and the payoff process. Note we have $b(\cdot,\cdot,\cdot)=0$ in our case. However, if we have a model where $b(\cdot,\cdot,\cdot)\neq 0$ , we can still approximate it by adding one more term to our regression.

We design two algorithms to apply the method described above. One uses a series of neural networks, and the other one uses only one global network. To avoid any confusion, we refer to the algorithm with multiple networks as method I, and the global one as method II. In addition, to improve the algorithms, a number of variations have been introduced.

3.1 Method I: Multiple Neural Networks

In this method, one neural network is used to regress the continuation value and the martingale increment on the current stock prices at time $t_{i}\in\{t_{0},t_{1},t_{2},...,t_{n-1}\}$ . Note that although we perform a regression at $t_{0}$ , we do not make exercise decisions at the initial time. The training of the networks at each time stops once some predetermined stopping criteria are met, which can be a given number of epochs or the stagnation of the validation set loss. The whole process is summarized in Algorithm 1.

During the training, all trained models at each time are saved, then we perform an independent out-of-sample simulation to derive estimates. There are two ways to carry out the second simulation. One is to proceed as in the training, determining the values backward. Alternatively, we can start from the initial time and make decisions forward. This allows us to focus only on the paths that are still in the money at each time and helps us overcome the memory exhaustion problem, as we only need to generate the paths one step at a time instead of the whole path.

3.2 Method II: One Global Neural Network

After pricing a vanilla American option that has $50$ exercise points using method I, we plot $\Phi(S_{t_{i}})$ and $\Psi(S_{t_{i}})$ , for $i\in\{0,1,...,49\}$ , in Figure 1. We can see that the shapes of the continuation functions and the process $H$ appear to evolve consistently and continuously in time.

Based on this observation, we propose a second method where we only use one network for all regressions by including the time $t_{i}$ as an input variable. However, this approach poses additional challenges as it requires target values at all times when we start training the model. In method I, the update of $Y_{t_{i}}$ before the regression provides relatively accurate target values for the training of the corresponding network, but these are not available in method II. To overcome this challenge, we alternate the model training and stopping strategy updates. Initially we set the stopping time to be equal to the maturity, so the target values at $t_{i}\in\{t_{0},t_{1},...,t_{n-1}\}$ become $\beta_{(n-i)\Delta t}^{-1}f(S_{t_{n}})$ . We train the model for a given number of epochs, and then use the trained model to determine a new series of $Y_{t_{i}}$ in the same way as in method I. Once all target values are updated, we train the model again. This training-updating process is repeated until some predefined criterion is met. Only a small number of epochs is carried out between consecutive stopping time updates, since the stopping strategies applied at that stage may not be optimal.

Denote by $\Phi_{\text{II}}(t_{i},S_{t_{i}}):\mathbb{R}_{+}\times\mathbb{R}^{d}\to\mathbb{R}$ and $\Psi_{\text{II}}(t_{i},S_{t_{i}}):\mathbb{R}_{+}\times\mathbb{R}^{d}\to\mathbb{R}^{d}$ the approximations of the continuation functions and the martingale increment functions. Method II is summarized in Algorithm 2.
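The initial training set for the global network, before any stopping strategy has been learned, can be sketched as follows for a single asset. Each sample pairs the input $(t_{i},S_{t_{i}})$ with the terminal payoff discounted from $t_{n}$ back to $t_{i}$ . The function name `initial_training_set`, the row-major layout and the example numbers are assumptions made for illustration.

```python
import numpy as np

def initial_training_set(paths, r, T, payoff):
    """Build inputs (t_i, S_{t_i}) and initial targets for i = 0, ..., n-1,
    assuming every path is stopped at maturity t_n (one asset only)."""
    n_paths, n = paths.shape[0], paths.shape[1] - 1
    dt = T / n
    terminal = payoff(paths[:, -1])                         # f(S_{t_n})
    idx = np.arange(n)                                      # i = 0, ..., n-1
    times = np.broadcast_to(dt * idx, (n_paths, n))
    inputs = np.stack([times, paths[:, :-1]], axis=-1).reshape(-1, 2)
    discount = np.exp(-r * dt * (n - idx))                  # beta^{-1}_{(n-i) dt}
    targets = (discount[None, :] * terminal[:, None]).reshape(-1)
    return inputs, targets

# Two hand-made paths with n = 2 steps and a put payoff with strike 100.
demo_paths = np.array([[100.0, 90.0, 80.0], [100.0, 110.0, 120.0]])
inputs, targets = initial_training_set(
    demo_paths, r=0.05, T=1.0, payoff=lambda s: np.maximum(100.0 - s, 0.0)
)
```

After each round of training, the targets would be overwritten with the updated $Y_{t_{i}}$ values while the inputs stay the same.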

3.3 Algorithm Variations

Aside from the variability of the simulations, there are two other sources of error in our method. One is the time discretisation error caused by approximating the continuous martingale by a discrete-time process. This error is proportional to $\sqrt{\Delta t}$ , the square root of the step size, which can lead to unsatisfactory upper bounds if the option has infrequent exercise opportunities. The other source is the regression error, which can be reduced by using a larger data set and training for a longer time, but these will lead to higher computational costs and memory requirements. To improve our algorithms, we design six different variations, aiming at generating more accurate results, reducing computational cost and overcoming the memory exhaustion problem.

Variation 1: Warm-start training with the network trained one step before

In the original version of method I, we randomly initialise the weights and biases of a network each time a regression starts. Since the shapes of both continuation functions and martingale increment functions at different times are similar, as shown in Figure 1, the parameters of the corresponding networks should resemble one another. Therefore, we can use the parameters of the previously trained network as the initial parameters of the model we are about to train. Table 1 and Figure 2 compare the results with random and non-random initialisation, from which we can see that this variation saves time, and may also offer better results.
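The effect of warm-starting can be seen in a toy setting. The sketch below replaces the neural network with a linear model fitted by plain gradient descent, and the two parameter vectors are hypothetical stand-ins for the regressions at two consecutive exercise times; it only illustrates why starting from nearby parameters needs fewer iterations.

```python
import numpy as np

def fit_gd(X, y, w_init, lr=0.1, tol=1e-6, max_iter=10_000):
    """Least-squares fit by plain gradient descent; returns (weights, iterations used)."""
    w = np.array(w_init, dtype=float)
    for it in range(1, max_iter + 1):
        grad = X.T @ (X @ w - y) / len(y)
        if np.linalg.norm(grad) < tol:
            return w, it
        w -= lr * grad
    return w, max_iter

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
w_next = np.array([1.0, -2.0, 0.5])                # hypothetical parameters at t_{i+1}
w_curr = w_next + np.array([0.05, -0.05, 0.05])    # similar parameters at t_i

# Regression at t_{i+1}, trained first (backward in time).
w_prev_fit, _ = fit_gd(X, X @ w_next, np.zeros(3))

# Regression at t_i: random initialisation versus warm start.
_, cold_iters = fit_gd(X, X @ w_curr, rng.normal(size=3))
_, warm_iters = fit_gd(X, X @ w_curr, w_prev_fit)
```

Comparing `warm_iters` with `cold_iters` shows the saving in this toy problem; for a real network the gain depends on how similar the functions at consecutive times actually are.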

Variation 2: Train on data from parts of the exercise times

Originally in method II, we train the model using all given paths at all times. We propose a modified version where we only train on a portion of the data. The choice of training samples can be either random or based on an equally spaced grid. Assume we have $n=50$ exercise points, and we want to train the model using only the data from half of the exercise times. The first way is to randomly choose $25$ numbers from $\{0,1,2,...,49\}$ , and use only the samples from the chosen times in the training. Alternatively, we can take the points from one step and skip the next one, ending up with data from $t=t_{1},t_{3},...,t_{49}$ . We plot the changes in training time and differences between bounds with an increasing number of times used to train in Figure 3. We can see this modification reduces the computational cost, but also sacrifices the accuracy of the results. This is not unexpected, and our aim is to find a balance between the two.
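The two selection rules described above can be written down directly. In this sketch the helper `training_subset` and the array layout (one column per exercise time) are assumptions for illustration.

```python
import numpy as np

n = 50                                   # number of exercise times, indexed 0, ..., 49
rng = np.random.default_rng(0)

# Rule 1: draw 25 distinct time indices at random.
random_times = np.sort(rng.choice(n, size=n // 2, replace=False))

# Rule 2: keep every other time, i.e. t_1, t_3, ..., t_49.
grid_times = np.arange(1, n, 2)

def training_subset(paths, time_idx):
    """Keep only the columns of `paths` (n_paths, n) at the chosen exercise times."""
    return paths[:, time_idx]

dummy_paths = rng.normal(size=(8, n))    # placeholder data with one column per time
subset = training_subset(dummy_paths, grid_times)
```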

Variation 3: Generate fresh data while training

Since we have to simulate the whole path before the training, the memory requirement can be extremely high, especially for high-dimensional problems. One way that has been used in [16, 1] is to store the random seed used in the simulation instead of the whole path, then recover values at different times based on the seed when needed. This can be applied to method I, but values of the whole path are needed in method II. Hence, we propose a procedure to overcome the memory exhaustion problem that can be applied to it.

In the beginning, we only generate the validation set which is used to check the performance of the model after each update of the stopping strategy. During updates, a number of batches are generated, which are used to train the network for a given number of epochs, and then are deleted. Since the size of the batch is significantly smaller than the whole dataset needed for training, this enables training on a larger number of paths without experiencing the aforementioned memory exhaustion problem.
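A minimal sketch of this procedure, assuming a one-dimensional geometric Brownian motion and hypothetical batch sizes, is given below. The training step is replaced by a placeholder counter; the point is only that the validation set is kept while training batches are generated on demand and never stored together.

```python
import numpy as np

def simulate_paths(n_paths, n_steps, rng, s0=36.0, r=0.06, sigma=0.2, T=1.0):
    """Exact GBM paths on an equally spaced grid; shape (n_paths, n_steps + 1)."""
    dt = T / n_steps
    dw = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
    log_s = np.cumsum((r - 0.5 * sigma**2) * dt + sigma * dw, axis=1)
    return s0 * np.exp(np.hstack([np.zeros((n_paths, 1)), log_s]))

def fresh_batches(n_batches, batch_size, n_steps, rng):
    """Yield freshly simulated batches one at a time; none of them is retained."""
    for _ in range(n_batches):
        yield simulate_paths(batch_size, n_steps, rng)

rng = np.random.default_rng(0)
validation = simulate_paths(2_000, 50, rng)     # generated once and kept throughout

n_seen = 0
for update in range(3):                         # stopping-strategy updates
    for batch in fresh_batches(5, 512, 50, rng):
        n_seen += batch.shape[0]                # placeholder for a training step on `batch`
    # Each batch becomes garbage once the loop moves on; only `validation` persists.
```

At any moment only one batch of 512 paths and the validation set are held in memory, although 7680 training paths are consumed in total.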

Figure 4 shows how the difference between the lower and the upper bounds of the option price changes as training progresses when we use various numbers of batches between updates of the stopping strategy. We can see that there is a slight difference in the speed of convergence when different numbers of batches are used, but there is no definite conclusion on the best number of batches, as several hyperparameters influence this. We can also see that the results may start deteriorating after a period of stagnation. Moreover, we plot these changes using the original version of method II and method I with variation 1, and compare them with the case where we generate $5$ new batches between updates. We can see that the original method II dominates the other two at the early stage, but the results from all three methods converge eventually.

Variation 4: Add a second term for martingale increments approximation

We have mentioned earlier that one source of errors is the time discretisation. To reduce this error, we can add one more term in the regression to better explain the martingale increments. The choice of the term depends on the model, and we choose $\Psi_{2}(S_{t_{i}})(\Delta W_{t_{i}}^{2}-\Delta t)$ in our work. With this choice, the sum of the two martingale terms can be connected to the Milstein scheme. The changes in results caused by variation 4 applied to method I are shown in Table 2. We can see that with the same network structure and training time, adding variation 4 to the algorithm significantly reduces the gap between the lower and the upper bound, and the improvement comes mainly from the better approximation of the upper bound. The lower bound also improves slightly, owing to a better variance reduction.
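The connection to the Milstein scheme can be checked numerically in a toy case. The sketch below assumes a driftless geometric Brownian motion started at a fixed value, so that $\Psi$ and $\Psi_{2}$ reduce to constants, and regresses the exact one-step martingale increment on $\Delta W$ alone and then on both $\Delta W$ and $\Delta W^{2}-\Delta t$ . The parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
s0, sigma, dt, n = 36.0, 0.2, 1.0 / 3.0, 200_000
dw = rng.normal(0.0, np.sqrt(dt), size=n)

# Exact one-step increment of a driftless GBM: a true martingale increment.
increment = s0 * (np.exp(-0.5 * sigma**2 * dt + sigma * dw) - 1.0)

def residual_var(regressors):
    """Least-squares fit of the increment on the given terms; returns (residual variance, coefficients)."""
    A = np.column_stack(regressors)
    coef, *_ = np.linalg.lstsq(A, increment, rcond=None)
    return float(np.var(increment - A @ coef)), coef

var_one, coef_one = residual_var([dw])                 # first-order term only
var_two, coef_two = residual_var([dw, dw**2 - dt])     # with the second term

# Milstein coefficients for GBM, for comparison with coef_two.
milstein = np.array([sigma * s0, 0.5 * sigma**2 * s0])
```

In this setting the fitted coefficients are close to the Milstein coefficients $\sigma S$ and $\frac{1}{2}\sigma^{2}S$ , and the unexplained variance drops once the second term is included.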

Variation 5: Use separate networks for the two functions

In both methods, we have been using the same network to approximate $\Phi(\cdot)$ and $\Psi(\cdot)$ . In other words, we have been using one neural network with multiple outputs. However, martingale increment functions and continuation value functions may have very different complexities, especially in more complicated models with higher dimensions. In these cases, we can use separate networks to approximate these two functions instead. Table 3 shows the test results of this variation. We can see that with a similar number of free parameters, variation 5 can produce more accurate results without sacrificing the training time required.
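To compare architectures at a similar number of free parameters, it helps to count them. The sketch below counts weights and biases of fully connected networks; the layer widths are hypothetical and are not the ones used in Table 3.

```python
def dense_param_count(layer_sizes):
    """Weights plus biases of a fully connected network with the given layer widths."""
    return sum((n_in + 1) * n_out for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))

d = 5  # number of underlying assets

# One shared network with 1 + d outputs: continuation value and d martingale increment terms.
shared = dense_param_count([d, 64, 64, 1 + d])

# Two separate networks: a smaller one for Phi, a larger one for Psi.
separate = dense_param_count([d, 32, 32, 1]) + dense_param_count([d, 56, 56, d])
```

With separate networks the parameter budget can be split unevenly, for instance giving more capacity to the martingale increment function when it is the harder of the two to fit.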

Variation 6: Add sub-steps

Another way to reduce the martingale approximation error caused by the time discretisation is to add substeps between consecutive exercise times, where we do not make stopping decisions but accumulate martingale increments. Figure 5 shows the change of the bounds with an increasing number of substeps using both methods, from which we can see that introducing substeps does improve our estimation of the upper bound. The improvement appears to slow down as the number of substeps increases.
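The bookkeeping behind sub-steps can be sketched as follows. The function names are hypothetical, and treating $t_{0}$ as a grid point flagged in the exercise mask is an assumption of this sketch.

```python
import numpy as np

def substep_grid(T, n_exercise, n_sub):
    """Fine time grid with `n_sub` sub-steps per exercise interval, plus an exercise mask."""
    n_fine = n_exercise * n_sub
    times = np.linspace(0.0, T, n_fine + 1)
    is_exercise = np.arange(n_fine + 1) % n_sub == 0   # stopping decisions only here
    return times, is_exercise

def accumulate_increments(fine_increments, n_sub):
    """Sum sub-step martingale increments (n_paths, n_fine) into one per exercise interval."""
    n_paths, n_fine = fine_increments.shape
    return fine_increments.reshape(n_paths, n_fine // n_sub, n_sub).sum(axis=2)

times, is_exercise = substep_grid(T=3.0, n_exercise=9, n_sub=32)
coarse = accumulate_increments(np.ones((2, 9 * 32)), n_sub=32)
```

Stopping decisions are taken only where `is_exercise` is true, while the martingale increments computed on the fine grid are summed over each exercise interval.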

Discussion

We summarise the contributions each variation can bring to our methods in Table 4. The second column indicates to which method one variation can be applied. There are three aspects one variation can contribute to: the accuracy of the estimates, the training time, and the computational memory required. We use ✓ and ✗ to indicate an improvement and a deterioration respectively. If there is no obvious change due to the variation, we leave it blank. From the table we can see that Variation 1 can improve both accuracy and training time needed, and Variation 3 can help us overcome the memory exhaustion problem without sacrificing the other two aspects. Variation 2 improves the computational speed at the expense of accuracy, while the opposite is true for the last three variations. All variations that can be applied to one method can be used at the same time to combine their effects.

4 Numerical Results

This section illustrates the numerical results generated by both methods we propose. Unless otherwise stated, variations 1, 4, 5 and 6 are applied to method I, and variations 4, 5 and 6 are applied to method II. We consider options with 1 or 5 underlying assets, whose prices follow either a geometric Brownian motion or a Heston model. All trainings use Adam as the optimiser, mean square error as the loss function and ReLU as the activation function. During the training, we cross-validate to lower the chance of over-fitting. In method I, at each time we stop training the corresponding network once the loss of the validation set ceases to decrease for $5$ epochs. In method II, the stopping criterion is that the validation set loss stagnates for more than $8$ updates of the stopping time, and we train for $1$ epoch between updates. The training was performed on an NVIDIA Tesla P100 GPU under the system Xeon-E5-2680-v4 with 64GB memory. The program is written in Python 3.8.5 using TensorFlow 2.4.1.
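The stopping criterion for method I can be expressed as a patience rule. The following is a framework-independent sketch with a hypothetical helper name; it interprets "ceases to decrease" as no strict improvement over the best validation loss seen so far.

```python
def stop_epoch(val_losses, patience=5):
    """Return the 1-based epoch at which training stops, or None if the rule never triggers."""
    best = float("inf")
    stale = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, stale = loss, 0      # strict improvement resets the counter
        else:
            stale += 1
            if stale >= patience:
                return epoch
    return None

losses = [1.0, 0.8, 0.7, 0.71, 0.72, 0.7, 0.75, 0.74, 0.9]
stopped_at = stop_epoch(losses, patience=5)
```

The same helper applies to method II if the sequence holds one validation loss per stopping time update and `patience` is set according to the criterion used there.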

In each subsection, we report statistics (means and standard deviations of the lower bound, the upper bound and their difference) for one type of option using both methods, obtained by repeating the process $10$ times, and plot a histogram to show hedging errors. We plot both the total hedging errors $\epsilon_{1}$ and the worst hedging errors $\epsilon_{2}$ in one graph. Let $\tau^{i}$ be the stopping time for path $i$ . The error for that path at $\tau^{i}$ is defined as

[TABLE]

and the worst error is defined as

[TABLE]

4.1 Options under Black-Scholes Models

Consider American options with $d$ underlying assets, whose prices follow the dynamics

[TABLE]

where the risk-free interest rate $r\in\mathbb{R}$ , the dividend rate $\delta^{i}\in\mathbb{R}$ and the volatility $\sigma^{i}\in\mathbb{R}^{+}$ .
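For readers who want to reproduce the setting, the sketch below simulates such assets exactly on an equally spaced grid. It assumes the standard risk-neutral Black-Scholes dynamics $dS_{t}^{i}=(r-\delta^{i})S_{t}^{i}dt+\sigma^{i}S_{t}^{i}dW_{t}^{i}$ with independent Brownian motions; both the form of the dynamics and the independence are assumptions of this sketch, and the function name is hypothetical.

```python
import numpy as np

def simulate_black_scholes(n_paths, n_steps, s0, r, delta, sigma, T, seed=0):
    """Exact simulation of d independent GBMs; returns shape (n_paths, n_steps + 1, d)."""
    rng = np.random.default_rng(seed)
    s0, delta, sigma = (np.asarray(a, dtype=float) for a in (s0, delta, sigma))
    d = s0.size
    dt = T / n_steps
    dw = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps, d))
    log_incr = (r - delta - 0.5 * sigma**2) * dt + sigma * dw
    log_paths = np.concatenate(
        [np.zeros((n_paths, 1, d)), np.cumsum(log_incr, axis=1)], axis=1
    )
    return s0 * np.exp(log_paths)

paths = simulate_black_scholes(
    n_paths=20_000, n_steps=9, s0=[36.0] * 5, r=0.05,
    delta=[0.1] * 5, sigma=[0.2] * 5, T=3.0,
)
```

Because the log-price increments are sampled exactly, this simulation carries no discretisation error of its own; correlated assets would require correlating the Gaussian draws.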

4.1.1 1D American Put Option

We first price a 1D vanilla American put option with maturity $T=1$ and a strike price $K=40$ whose underlying asset pays no dividend and has the initial price $S_{0}=36$ , $r=0.06$ and $\sigma=0.2$ . The payoff function at $t$ is $f(S_{t},K)=(K-S_{t})^{+}$ . We discretise the time interval $[0,T]$ into $50$ sub-intervals, and use $10^{5}$ paths to price this option. The best result from our trials is a lower bound of 4.4775 and an upper bound of 4.4873 with a difference of 0.0098, generated by Method I. Other statistics from the pricing process and hedging results are shown in Table 5 and Figure 6. From the table we can see that in both methods, the neural network with a larger number of neurons tends to generate better results with a slightly longer training process. Method II has the potential to outperform method I if the number of free variables reaches a similar level, as shown in the last row. However, this can encounter a practical problem, as a large number of free variables requires a significantly larger network, which can exhaust the computing resources of the GPU.
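An independent sanity check for this example is a binomial tree. The sketch below uses the Cox-Ross-Rubinstein tree, which is not one of the methods of this paper, and allows exercise at every tree step, so it approximates the continuously exercisable put rather than the version with $50$ exercise dates.

```python
import numpy as np

def crr_american_put(s0, strike, r, sigma, T, n_steps):
    """American put price on a Cox-Ross-Rubinstein binomial tree."""
    dt = T / n_steps
    u = np.exp(sigma * np.sqrt(dt))
    d = 1.0 / u
    p = (np.exp(r * dt) - d) / (u - d)       # risk-neutral probability of an up move
    disc = np.exp(-r * dt)

    j = np.arange(n_steps + 1)               # number of up moves at maturity
    values = np.maximum(strike - s0 * u**j * d**(n_steps - j), 0.0)

    for step in range(n_steps - 1, -1, -1):
        j = np.arange(step + 1)
        stock = s0 * u**j * d**(step - j)
        cont = disc * (p * values[1:] + (1.0 - p) * values[:-1])
        values = np.maximum(strike - stock, cont)   # exercise if worth more than continuing
    return float(values[0])

tree_price = crr_american_put(s0=36.0, strike=40.0, r=0.06, sigma=0.2, T=1.0, n_steps=1000)
```

The value of `tree_price` can be compared with the lower and upper bounds reported above.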

4.1.2 High Dimensional Bermudan max-call option

Consider an option with $d$ underlying assets under the Black-Scholes setting. We assume there is no correlation between Brownian Motions $W^{i}$ and $W^{j}$ , $i,j\in\{1,2,...,d\}$ , on which each stock price is based. All stocks have the same initial price $S_{0}^{i}=36$ , the dividend rate $\delta^{i}=0.1$ and the volatility $\sigma=0.2$ . There are $n=9$ equally spaced exercise points before and including the maturity $T=3$ . The risk neutral interest rate is $r=0.05$ . The payoff of this option is

[TABLE]

This option has a large step size ${\Delta}t=\frac{1}{3}$ , which poses a challenge for our methods. The change in the results with respect to the number of substeps has been shown in the previous section. Table 6 shows the pricing results with 32 sub-steps using different network structures. The best result from our trials is a lower bound of 26.1433 and an upper bound of 26.1954 with a difference of 0.0521, generated by Method I. The numbers of free variables in methods I and II are 151974 and 83311 respectively, and this difference contributes to the outperformance of method I. As mentioned before, method II has the potential to compete with method I, but the limitation on the size of the network restricts our trials. In addition, it does take longer for method II to complete the training. Figure 7 shows the hedging error by method II with variation 5 and 32 sub-steps.

4.2 American Put Option under Heston Model

Finally, we test our methods under the Heston model, where the volatility itself is also stochastic:

[TABLE]

The particular option we price is the same as the one in Lapeyre and Lelong [33], with a strike price $K=100$ and maturity $T=1$ reached in $10$ steps. The Heston model has the following parameters: risk-free interest rate $r=0.1$ , long-term average standard deviation $\sigma=0.1$ , the variance process $V_{t}$ reverts to $\sigma^{2}$ at the rate $\lambda=2$ , and the volatility of volatility is $\xi=0.2$ . The two Brownian motions $W_{S}$ and $W_{V}$ have a correlation $\rho=-0.3$ . The initial stock price and variance are $S_{0}=100$ and $V_{0}=0.01$ respectively.
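A possible way to simulate this model with sub-steps is sketched below. It assumes the standard Heston dynamics $dS_{t}=rS_{t}dt+\sqrt{V_{t}}S_{t}dW_{S}$ and $dV_{t}=\lambda(\sigma^{2}-V_{t})dt+\xi\sqrt{V_{t}}dW_{V}$ , and uses a full-truncation Euler scheme on the log-price. The discretisation scheme and the function name are assumptions of this sketch and are not confirmed by the text.

```python
import numpy as np

def simulate_heston(n_paths, n_steps, n_sub, s0=100.0, v0=0.01, r=0.1, theta=0.01,
                    lam=2.0, xi=0.2, rho=-0.3, T=1.0, seed=0):
    """Full-truncation Euler scheme; returns (S, V) recorded at the n_steps + 1 exercise times."""
    rng = np.random.default_rng(seed)
    dt = T / (n_steps * n_sub)
    log_s = np.full(n_paths, np.log(s0))
    v = np.full(n_paths, v0)
    s_out = np.empty((n_paths, n_steps + 1))
    v_out = np.empty((n_paths, n_steps + 1))
    s_out[:, 0], v_out[:, 0] = s0, v0

    for k in range(1, n_steps * n_sub + 1):
        z1 = rng.standard_normal(n_paths)
        z2 = rho * z1 + np.sqrt(1.0 - rho**2) * rng.standard_normal(n_paths)
        v_pos = np.maximum(v, 0.0)                      # truncate negative variance
        log_s += (r - 0.5 * v_pos) * dt + np.sqrt(v_pos * dt) * z1
        v += lam * (theta - v_pos) * dt + xi * np.sqrt(v_pos * dt) * z2
        if k % n_sub == 0:                              # record only at exercise times
            s_out[:, k // n_sub] = np.exp(log_s)
            v_out[:, k // n_sub] = np.maximum(v, 0.0)
    return s_out, v_out

s_paths, v_paths = simulate_heston(n_paths=20_000, n_steps=10, n_sub=15)
```

Here `theta` plays the role of $\sigma^{2}$ , the level to which the variance reverts. Increasing `n_sub` refines the simulation between exercise times without adding stopping decisions.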

Since there are two Brownian motions involved in this scenario, we will have

[TABLE]

as our martingale increment. As for the max-call option in Section 4.1.2, the step size $\Delta t=0.1$ is large, so we use substeps in the implementation (variation 6). Figure 8 shows the change in the estimates with an increasing number of substeps. We can see that both lower and upper bounds decrease. This is not only because the martingale approximation improves with a decreasing step size, but also because the Heston model simulation becomes more accurate. Table 7 shows the results from both methods with $15$ substeps using different network structures.

5 Conclusion

We have designed two methods that use artificial neural networks to simultaneously compute both lower and upper bounds of an American option price. Both methods determine the stopping strategy by comparing the immediate exercise payoff and the continuation value. The first method uses a series of networks to approximate the continuation values and martingale increments at each exercise time. The second method applies one global network by adding time as an additional input, and alternates between network training and stopping strategy updates until a stopping criterion is met. From the results shown in Section 4 we can see that they both work efficiently, while the second method offers more flexibility. One advantage of our methods is that nested simulations are avoided, which is a significant computational improvement when pricing American/Bermudan options that have frequent exercise opportunities. Moreover, our methods offer the hedging strategy as a by-product without extra simulations and calculations, which can also be used for variance reduction. Though most numerical results shown in this paper are based on the geometric Brownian motion, the only restriction on applying our methods is the Markovian property of the underlying asset price model. This allows us to extend our methods to more complicated models. Our work can also be extended to solve reflected backward stochastic differential equations.

Appendix A

Recall that the stopping time is defined as $\tau_{i}=\min\{t_{j}\in\{t_{i},t_{i+1},...,t_{n-1}\}:Z_{t_{j}}\geq\Phi(S_{t_{j}})\}\wedge t_{n}$ , and $Y_{t_{i}}=Z_{\tau_{i}}$ . Since $Z_{\tau_{i}}$ is $\mathcal{F}_{t_{n}}$ -measurable, by the martingale representation theorem:

[TABLE]

By taking the expectation of $Z_{\tau_{i}}$ conditioned first on $\mathcal{F}_{\tau_{i}}$ and then on $\mathcal{F}_{t_{i}}$ , we have

[TABLE]

Combining (7) and (8), we obtain

[TABLE]

Based on Itô isometry and the above results, we can see the variance of the payoff at the stopping time is

[TABLE]

Hence, $\int_{0}^{t_{i}}H_{s}dW_{s}$ and $\int_{t_{i}}^{\tau_{i}}H_{s}dW_{s}$ are uncorrelated, and

[TABLE]

Therefore, adding the control variate $\int_{t_{i}}^{\tau_{i}}H_{s}dW_{s}$ in the derivation of $Y_{t_{i}}$ can reduce the variance.
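The variance reduction can be observed in a toy case where the integrand is known in closed form. The sketch below assumes a driftless geometric Brownian motion with $Z=S_{T}$ , $t_{i}=0$ and $\tau_{i}=T$ , for which $H_{s}=\sigma S_{s}$ ; in the methods of this paper $H$ is instead approximated by the network output. The parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
s0, sigma, T, n_steps, n_paths = 36.0, 0.2, 1.0, 50, 50_000
dt = T / n_steps

dw = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
log_s = np.cumsum(-0.5 * sigma**2 * dt + sigma * dw, axis=1)
s = s0 * np.exp(np.hstack([np.zeros((n_paths, 1)), log_s]))   # driftless GBM, r = 0

payoff = s[:, -1]                                    # Z = S_T
hedge = np.sum(sigma * s[:, :-1] * dw, axis=1)       # discretised integral of H dW
controlled = payoff - hedge                          # payoff with the control variate

var_plain = float(payoff.var())
var_controlled = float(controlled.var())
```

The residual variance `var_controlled` is not exactly zero because the stochastic integral is discretised, which is the time discretisation error discussed in Section 3.3.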

Note that we also show that $\int_{\tau_{i}}^{T}H_{s}dW_{s}=0$ . This is in line with the hedging theory as we stop hedging once the stopping time is reached (the option is exercised).

Bibliography

[1] R. Aïd, L. Campi, N. Langrené, and H. Pham. A probabilistic numerical method for optimal multiple switching problems in high dimension. SIAM Journal on Financial Mathematics, 5(1):191–231, 2014.
[2] L. Andersen and M. Broadie. Primal-dual simulation algorithm for pricing multidimensional American options. Management Science, 50(9):1222–1234, 2004.
[3] V. Bally, G. Pagès, and J. Printems. A quantization tree method for pricing and hedging multidimensional American options. Mathematical Finance: An International Journal of Mathematics, Statistics and Financial Economics, 15(1):119–168, 2005.
[4] G. Barone-Adesi and R. E. Whaley. Efficient analytic approximation of American option values. The Journal of Finance, 42(2):301–320, 1987.
[5] J. Barraquand and D. Martineau. Numerical valuation of high dimensional multivariate American securities. Journal of Financial and Quantitative Analysis, 30(3):383–405, 1995.
[6] C. Beck, M. Hutzenthaler, A. Jentzen, and B. Kuckuck. An overview on deep learning-based approximation methods for partial differential equations. Discrete and Continuous Dynamical Systems - Series B, 2022.
[7] S. Becker, P. Cheridito, and A. Jentzen. Deep optimal stopping. Journal of Machine Learning Research, 20(74):1–25, 2019.
[8] S. Becker, P. Cheridito, and A. Jentzen. Pricing and hedging American-style options with deep learning. Journal of Risk and Financial Management, 13(7):158, 2020.