Simultaneous upper and lower bounds of American-style option prices with hedging via neural networks
Ivan Guo, Nicolas Langrené, Jiahao Wu

TL;DR
This paper presents two neural network-based methods to efficiently compute both upper and lower bounds of American-style option prices and derive hedging strategies, avoiding nested simulations and enabling high-dimensional pricing.
Contribution
Introduces two novel neural network approaches for simultaneous upper and lower bound estimation of American options without nested Monte Carlo.
Findings
Reduces computational complexity for high-dimensional options
Provides effective hedging strategies and variance reduction techniques
Demonstrates accurate bounds through numerical experiments
Abstract
In this paper, we introduce two novel methods to solve the American-style option pricing problem and its dual form at the same time using neural networks. Without applying nested Monte Carlo, the first method uses a series of neural networks to simultaneously compute both the lower and upper bounds of the option price, and the second one accomplishes the same goal with one global network. The avoidance of extra simulations and the use of neural networks significantly reduce the computational complexity and allow us to price Bermudan options with frequent exercise opportunities in high dimensions, as illustrated by the provided numerical experiments. As a by-product, these methods also derive a hedging strategy for the option, which can also be used as a control variate for variance reduction.

Table 1: Method I with random initialisation (Original) and with warm-start training (Variation 1)

| | Time (sec) | LB Mean | LB S.D. | UB Mean | UB S.D. | Diff. Mean | Diff. S.D. |
|---|---|---|---|---|---|---|---|
| Original | 360 | 4.4738 | 0.0007 | 4.4889 | 0.0005 | 0.0151 | 0.0010 |
| Variation 1 | 135 | 4.4769 | 0.0002 | 4.4877 | 0.0004 | 0.0108 | 0.0005 |

Table 2: Method I with one and with two martingale terms (Variation 4)

| | Training Time (sec) | LB Mean | LB S.D. | UB Mean | UB S.D. | Diff. Mean | Diff. S.D. |
|---|---|---|---|---|---|---|---|
| 1 Term | 117 | 4.4754 | 0.0013 | 4.5525 | 0.0008 | 0.0771 | 0.0015 |
| 2 Terms | 117 | 4.4772 | 0.0003 | 4.4876 | 0.0004 | 0.0104 | 0.0004 |

Table 3: Results with separate networks for the two functions (Variation 5)

| Variables | Time | LB Mean | LB S.D. | UB Mean | UB S.D. | Diff. Mean | Diff. S.D. |
|---|---|---|---|---|---|---|---|
| 270150 | 123 | 4.4769 | 0.0002 | 4.4887 | 0.0006 | 0.0117 | 0.0006 |
| 268650 | 117 | 4.4772 | 0.0003 | 4.4876 | 0.0004 | 0.0104 | 0.0004 |
| 2186050 | 797 | 26.9205 | 0.0161 | 27.4541 | 0.0742 | 0.5336 | 0.0879 |
| 2139050 | 784 | 26.9104 | 0.0053 | 27.2567 | 0.0014 | 0.3463 | 0.0047 |

Table 4: Contribution of each variation (✓ improvement, ✗ deterioration, blank no obvious change)

| Variations | Method | Accuracy | Time | Memory |
|---|---|---|---|---|
| V1: Warm-start training | I | ✓ | ✓ | |
| V2: Train on partial data | II | ✗ | ✓ | |
| V3: Train on fresh data | II | | | ✓ |
| V4: Add a second martingale term | I, II | ✓ | ✗ | |
| V5: Use two separate networks | I, II | ✓ | ✗ | |
| V6: Add sub-steps | I, II | ✓ | ✗ | |

Table 5: 1D American put option under the Black-Scholes model

| Method | Variables | Time | LB Mean | LB S.D. | UB Mean | UB S.D. | Diff. Mean | Diff. S.D. |
|---|---|---|---|---|---|---|---|---|
| I | 71150 | 99 | 4.4762 | 0.0007 | 4.4887 | 0.0014 | 0.0125 | 0.0014 |
| I | 268650 | 117 | 4.4772 | 0.0003 | 4.4876 | 0.0004 | 0.0104 | 0.0004 |
| I | 591650 | 107 | 4.4769 | 0.0003 | 4.4877 | 0.0003 | 0.0107 | 0.0004 |
| II | 25953 | 115 | 4.4729 | 0.0029 | 4.4893 | 0.0019 | 0.0164 | 0.0046 |
| II | 46278 | 108 | 4.4744 | 0.0020 | 4.4889 | 0.0019 | 0.0145 | 0.0035 |
| II | 81703 | 140 | 4.4763 | 0.0006 | 4.4879 | 0.0007 | 0.0115 | 0.0010 |

Table 6: High-dimensional Bermudan max-call option with 32 sub-steps

| Method | L.R. | Time | LB Mean | LB S.D. | UB Mean | UB S.D. | Diff. Mean | Diff. S.D. |
|---|---|---|---|---|---|---|---|---|
| I | 0.015 | 3656 | 26.1380 | 0.0085 | 26.2114 | 0.0045 | 0.0734 | 0.0127 |
| I | 0.01 | 3943 | 26.1369 | 0.0062 | 26.2064 | 0.0036 | 0.0695 | 0.0094 |
| I | 0.005 | 5995 | 26.1405 | 0.0035 | 26.1975 | 0.0014 | 0.0571 | 0.0046 |
| II | 0.01 | 5860 | 26.1432 | 0.0043 | 26.2299 | 0.0051 | 0.0866 | 0.0076 |
| II | 0.005 | 6281 | 26.1417 | 0.0070 | 26.2262 | 0.0064 | 0.0846 | 0.0124 |
| II | 0.001 | 13832 | 26.1434 | 0.0074 | 26.2205 | 0.0063 | 0.0771 | 0.0134 |

Table 7: American put option under the Heston model

| Method | Structure | Time | LB Mean | LB S.D. | UB Mean | UB S.D. | Diff. Mean | Diff. S.D. |
|---|---|---|---|---|---|---|---|---|
| I | | 1405 | 1.6403 | 0.0012 | 1.6490 | 0.0041 | 0.0088 | 0.0035 |
| I | | 1264 | 1.6403 | 0.0015 | 1.6482 | 0.0020 | 0.0079 | 0.0017 |
| II | | 3122 | 1.6402 | 0.0039 | 1.6469 | 0.0016 | 0.0067 | 0.0023 |
| II | | 2876 | 1.6409 | 0.0029 | 1.6476 | 0.0028 | 0.0067 | 0.0030 |

Simultaneous upper and lower bounds of American option prices with hedging via neural networks
Ivan Guo
School of Mathematical Sciences, Monash University, Melbourne, Australia
Centre for Quantitative Finance and Investment Strategies, Monash University, Australia
Nicolas Langrené
BNU-HKBU United International College, Zhuhai, China
Jiahao Wu
School of Mathematical Sciences, Monash University, Melbourne, Australia
Ivan Guo's work was partially supported by the Australian Research Council (Grant DP220103106) and CSIRO Data61 Risklab. Nicolas Langrené's work was supported in part by the Guangdong Provincial Key Laboratory of Interdisciplinary Research and Application for Data Science, BNU-HKBU United International College, project code 2022B1212010006, and in part by the UIC Start-up Research Fund UICR0700041-22.
1 Introduction
Pricing American options is a type of optimal control/stopping problem in which the goal is to find the stopping strategy that maximises the option value. Numerically, there have been many attempts based on classical partial differential equation methods and binomial trees [13, 4, 27, 44, 37, 10]. However, when there are multiple factors impacting the value of the option, these methods become expensive computationally, a limitation known as the curse of dimensionality. To circumvent this difficulty, simulation-based methods have been extensively explored [42, 5, 15, 35, 43, 11, 14, 3, 32, 12, 36], among which Longstaff and Schwartz [35]’s Least Squares Monte Carlo (LSMC) method has gained much popularity. In this method, the dynamic programming principle is applied to determine the optimal stopping strategy recursively by comparing the immediate exercise payoff to the continuation value estimated by least-squares regression on a set of Monte Carlo simulations of the underlying asset price model.
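As a rough illustration of the LSMC recursion described above, the following sketch prices a Bermudan put by regressing realised cash flows on a polynomial basis. The function name `lsmc_put`, the parameter values and the cubic basis are assumptions made for this example only; they are not taken from the paper.

```python
import numpy as np

def lsmc_put(s0=36.0, strike=40.0, r=0.06, sigma=0.2, maturity=1.0,
             n_steps=50, n_paths=20_000, degree=3, seed=0):
    """Least-squares Monte Carlo estimate for a Bermudan put (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    dt = maturity / n_steps
    disc = np.exp(-r * dt)
    # Exact geometric Brownian motion simulation; paths[:, n] is the price at t_n.
    z = rng.standard_normal((n_paths, n_steps))
    log_inc = (r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z
    log_paths = np.concatenate([np.zeros((n_paths, 1)), np.cumsum(log_inc, axis=1)], axis=1)
    paths = s0 * np.exp(log_paths)

    cash = np.maximum(strike - paths[:, -1], 0.0)      # payoff if held to maturity
    for n in range(n_steps - 1, 0, -1):
        cash *= disc                                    # realised value discounted to t_n
        payoff = np.maximum(strike - paths[:, n], 0.0)
        itm = payoff > 0                                # regress on in-the-money paths only
        if itm.sum() > degree + 1:
            x = paths[itm, n] / strike                  # rescale for numerical stability
            coef = np.polyfit(x, cash[itm], degree)
            continuation = np.polyval(coef, x)
            exercise = payoff[itm] > continuation       # exercise if payoff beats continuation
            idx = np.flatnonzero(itm)[exercise]
            cash[idx] = payoff[idx]
    return disc * cash.mean()                           # discount from t_1 to t_0

price = lsmc_put()
```

Because the stopping rule is estimated and evaluated on the same paths, this in-sample number is only indicative; an independent set of paths would be needed for a clean lower bound.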
Methods directly solving the pricing problem typically generate a candidate optimal stopping strategy and a lower bound on the price, which is more in the interest of the buying party. On the other hand, option sellers would be more interested in an upper bound on the price, which can be obtained from a super-hedging strategy. Haugh and Kogan [26] and Rogers [40] independently explored the duality of the pricing problem, based on which a variety of methods [2, 29, 9, 39, 41] that derive upper bounds by approximating corresponding martingales have been proposed.
In Longstaff and Schwartz [35]’s original work, a set of basis functions is used to approximate the continuation values via ordinary least-squares regression in search of the stopping strategy. As the dimension of the problem increases, the number of basis functions increases greatly and the method becomes numerically unstable. In our work, we modify the LSMC algorithm by performing the regressions using neural networks (NNs). Kohler et al. [31] and Lapeyre and Lelong [33] studied similar modifications, but they only explored the modelling of stopping strategies. Other works involving deep learning in option pricing include Han et al. [25], Raissi [38], Chen and Wan [18], Germain et al. [23] where they solve the corresponding partial differential equations (PDEs) or backward stochastic differential equations (BSDEs) instead.
The main contribution of our work is the incorporation of the dual formulation of the option price into the modified LSMC method to design algorithms that simultaneously generate both lower and upper bounds of the option price. Becker et al. [7, 8] proposed a similar method to price Bermudan options. The main difference to our proposed method is that they first find a stopping strategy to approximate a lower bound, based on which they then derive an upper bound using nested Monte Carlo. The computational cost of this method can be very high in the case of pricing Bermudan options with frequent exercise opportunities, which would be the case when trying to approximate an American option. Similar methods designed by Lokeshwar et al. [34] do not require nested simulations, but the derivation of a biased upper estimate is separate from the determination of the stopping strategy.
In addition, we propose to use one global network instead of a series of networks in the derivation by including the time as an input variable. A global network has been introduced to solve semi-linear PDEs [17] and other control problems [24, 22], but the target values of their loss functions are known when training starts, while ours are unavailable initially. This is because, in such stopping problems, the training targets are generated by future optimal stopping strategies, which are outputs (rather than inputs) of the problem. To overcome this difficulty, we alternate between updating the stopping strategies and training the network until the results are satisfactory.
Another advantage of our method is the derivation of hedging strategies as an immediate by-product. Most methods of generating hedging strategies in the literature either take the first derivative of the approximated option values [3, 12, 28] or, once the option has been priced, approximate the function that represents the difference of option values at different times [7, 6]. The efficiency of hedging from these methods depends on the accurate differentiation of the estimated continuation value function. Since functions with similar values can have very different derivatives, even satisfactory approximations of the value process can lead to ineffective hedging strategies. In our work, the hedging strategy is computed directly from the dual martingale used in the upper bound estimate rather than by differentiation, and we are able to provide hedging strategies at all times before maturity, not just at the exercise times. Moreover, this martingale can also be used as a control variate to reduce variance, leading to a more accurate lower bound.
This paper is organised in the following order. In Section 2, we explain how we combine the LSMC algorithm with the dual formulation to design our method. Section 3 introduces our algorithms and a number of variations. Section 4 demonstrates numerical results in both low- and high-dimensional settings, followed by some concluding remarks.
2 Problem Formulation
Consider an American option with maturity $T$ that can be exercised at any time $t \in [0, T]$. Let $(\Omega, \mathcal{F}, \mathbb{F}, \mathbb{Q})$ be a filtered probability space, where $\mathbb{F} = (\mathcal{F}_t)_{0 \le t \le T}$ is the augmented filtration of a $d$-dimensional Brownian motion $W$, and $\mathbb{Q}$ is the probability measure equivalent to the real-world measure under which all discounted asset prices are martingales.
Define $B_t = e^{rt}$ as the value of the risk-free account at $t$, where the constant $r$ is the risk-free interest rate. The price of the option is based on $d$ risky assets whose value process $X$ is Markovian and is the solution to the SDE
[TABLE]
where is assumed to satisfy sufficient regularity conditions to ensure the well-posedness of the equation.
2.1 Lower bound of the option price
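Let $G = (G_t)_{0 \le t \le T}$ denote the $\mathbb{F}$-adapted right-continuous payoff process of the option, assumed to satisfy a suitable integrability condition. Let $\tau$ be a stopping time, and let $\mathcal{T}_t$ be the set of all stopping times with respect to the filtration $\mathbb{F}$ taking values in $[t, T]$. Then, with $B$ denoting the risk-free account, the value of the American option at time $t$ is
$$V_t = \operatorname*{ess\,sup}_{\tau \in \mathcal{T}_t} \mathbb{E}\left[\frac{B_t}{B_\tau} G_\tau \,\middle|\, \mathcal{F}_t\right],$$
and in particular the value at time zero is
$$V_0 = \sup_{\tau \in \mathcal{T}_0} \mathbb{E}\left[\frac{G_\tau}{B_\tau}\right].$$
For any specific stopping strategy $\tau \in \mathcal{T}_0$, we have $\mathbb{E}\left[G_\tau / B_\tau\right] \le V_0$. Hence the estimate of the American option price given by any one strategy is a lower bound of the true value.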
2.2 Upper bound of the option price
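Denote by $\mathcal{M}_0$ the set of all uniformly integrable martingales $M$ with the initial state $M_0 = 0$. With $G$ the payoff process, $B$ the risk-free account and $V$ the option value, the American option pricing problem has a dual form:
$$V_0 = \inf_{M \in \mathcal{M}_0} \mathbb{E}\left[\sup_{0 \le t \le T}\left(\frac{G_t}{B_t} - M_t\right)\right]. \tag{1}$$
Since the discounted option value process $V_t / B_t$ is a supermartingale of class D [30], it has a unique Doob-Meyer decomposition:
$$\frac{V_t}{B_t} = V_0 + M^*_t - A^*_t, \tag{2}$$
where $M^* \in \mathcal{M}_0$, and $A^*$ is a predictable increasing process with $A^*_0 = 0$. Rogers [40], Haugh and Kogan [26] proved the duality and showed that the infimum is attained at $M = M^*$.
Denote by $\mathcal{M}_0^2$ the set of martingales in $\mathcal{M}_0$ that are also square integrable. We restrict our search for $M^*$ to the set $\mathcal{M}_0^2$, noting that the optimal martingales corresponding to the options we price would lie in this set. Since $M \in \mathcal{M}_0^2$ is adapted to the Brownian filtration $\mathbb{F}$, the Brownian martingale representation theorem states that there exists a predictable process $Z$ such that $\mathbb{E}\left[\int_0^T |Z_s|^2 \, ds\right] < \infty$, and
$$M_t = \int_0^t Z_s \cdot dW_s. \tag{3}$$
This allows us to estimate the optimal martingale $M^*$ by approximating the process $Z$ numerically, and then generate an upper bound of the option price.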
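To make the dual formulation concrete, the sketch below evaluates the pathwise maximum of the discounted payoff minus a candidate martingale on simulated paths of a put. Any martingale started at zero yields an upper bound in expectation; the two candidates here (the zero martingale and a hand-picked multiple of the discounted asset price) and all parameter values are assumptions for illustration, not the martingale constructed in the paper.

```python
import numpy as np

def dual_upper_bound(disc_payoff, martingale):
    """Monte Carlo estimate of E[max_n (discounted payoff at t_n - M_{t_n})].

    Both arrays have shape (n_paths, n_times + 1); martingale[:, 0] must be 0.
    """
    return np.max(disc_payoff - martingale, axis=1).mean()

rng = np.random.default_rng(1)
s0, strike, r, sigma, maturity = 36.0, 40.0, 0.06, 0.2, 1.0   # illustrative values
n_steps, n_paths = 50, 10_000
dt = maturity / n_steps
dw = np.sqrt(dt) * rng.standard_normal((n_paths, n_steps))
w = np.concatenate([np.zeros((n_paths, 1)), np.cumsum(dw, axis=1)], axis=1)
t = dt * np.arange(n_steps + 1)
s = s0 * np.exp((r - 0.5 * sigma**2) * t + sigma * w)
disc_payoff = np.exp(-r * t) * np.maximum(strike - s, 0.0)

# Candidate 1: the zero martingale, a valid but crude choice.
ub_zero = dual_upper_bound(disc_payoff, np.zeros_like(disc_payoff))

# Candidate 2: M_t = -c * (exp(-r t) S_t - S_0). The discounted price of a
# non-dividend asset is a martingale under the pricing measure, so M is too.
c = 0.5
m = -c * (np.exp(-r * t) * s - s0)
ub_hedged = dual_upper_bound(disc_payoff, m)
```

The closer the candidate is to the martingale part of the discounted value process, the tighter the bound, which is why the methods in this paper learn the integrand of the martingale rather than fixing it by hand.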
2.3 Hedging Strategy
Consider a measurable adapted process with values in , where is the number of units of the -th asset held in a portfolio consisting of risky assets and one risk-free asset. The value of the portfolio at time is
[TABLE]
The process satisfies the condition a.s, and it is a self-financing hedging strategy if
[TABLE]
Combining the Doob-Meyer decomposition (2) and the Brownian Martingale representation (3), we obtain
[TABLE]
For the portfolio to super-replicate the option, we need for all . It is well-known that the cheapest such portfolio satisfies and for all . Comparing equations (4) and (5), we see that this can be achieved by setting
[TABLE]
Hence, the hedging strategy can be computed directly from the process . The process is also the difference and can be interpreted as the losses incurred by the buyer if they miss the optimal exercise opportunity.
3 Valuing an American option numerically
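From now on, we approximate American options by Bermudan options whose exercise times are restricted to the discrete set $\{t_n = n \Delta t\}$, for $n = 0, 1, \dots, N$, where $\Delta t = T/N$. By taking the expectation of the discounted option value conditioned on $\mathcal{F}_{t_n}$ and applying the Doob-Meyer decomposition we have
$$\frac{V_{t_{n+1}}}{B_{t_{n+1}}} = \mathbb{E}\left[\frac{V_{t_{n+1}}}{B_{t_{n+1}}} \,\middle|\, \mathcal{F}_{t_n}\right] + \int_{t_n}^{t_{n+1}} Z_s \cdot dW_s, \tag{6}$$
where $V$ is the option value, $B$ the risk-free account and $Z$ the integrand of the martingale representation (3).
In this equation, the conditional expectation is the continuation value, and the integral is the martingale increment. Since the stock price process is Markovian, both the conditional expectation and the process $Z_s$ for $s \in [t_n, t_{n+1}]$ can be written as functions of the state variables [20, 19].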
Let $G_{t_n}$ be the payoff of the option at $t_n$. Let $\Phi_n$ and $\Psi_n$ be approximations of the continuation function and the process $Z$ at $t_n$, respectively. The martingale increment can be approximated by $\Psi_n(X_{t_n}) \Delta W_{t_n}$, where $\Delta W_{t_n} = W_{t_{n+1}} - W_{t_n}$. We refer to $\Psi_n$ as the martingale increment function.
Consider two random processes $L$ and $U$. They will be updated recursively backward and their expected values will be a lower and an upper bound of the option price, respectively. The rule of updating is as follows.
At maturity $t_N = T$, the option holder has to either exercise the option if it is in the money or let it expire if it is out of the money, so $L_{t_N} = U_{t_N} = G_{t_N}$.
At each earlier time step $t_n$, the option holder either exercises the option immediately if the payoff value is higher than the continuation value, or waits until the next exercise point if it is lower. The corresponding stopping time is:
$$\tau_n = \begin{cases} t_n, & \text{if } G_{t_n} > \Phi_n(X_{t_n}), \\ \tau_{n+1}, & \text{otherwise,} \end{cases} \qquad \tau_N = t_N.$$
Based on this policy, we define and for as follows:
[TABLE]
[TABLE]
Though the updates of both $L$ and $U$ subtract the martingale increment in the second case, the subtractions have different meanings. In the update of $U$ (option price upper bound), it is the martingale increment we need to deduct based on the duality formulation (1). In the update of $L$ (option price lower bound), it works as a control variate for variance reduction. If the approximation of the martingale increment is perfect, the variance can be cancelled out completely. A proof that this control variate indeed reduces the variance of the estimate is given in Appendix A.
The processes $L$ and $U$ can also be interpreted in the following way. The variable $L$ is a proxy of the buyer's price, as the two cases correspond to the stopping decision based on comparing the exercise payoff and the continuation value. The variable $U$ is a proxy of the seller's price, as the two cases correspond to whether the seller needs to update their hedging targets based on the comparison of the exercise payoff and the hedging price.
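The two-case updates described above can be sketched as a single backward step applied to all paths at once. The function name `update_bounds`, the one-step discount factor `disc` and the convention that the martingale increment is expressed in time-$t_n$ value units are assumptions of this sketch; the paper's exact discounting convention is not recoverable from the text.

```python
import numpy as np

def update_bounds(payoff, continuation, lower_next, upper_next, mart_inc, disc):
    """One backward step for the lower- and upper-bound processes.

    payoff       : exercise value at t_n
    continuation : estimated continuation value at t_n
    lower_next   : lower-bound process at t_{n+1}
    upper_next   : upper-bound process at t_{n+1}
    mart_inc     : estimated martingale increment over [t_n, t_{n+1}]
    disc         : one-step discount factor (assumed convention)
    """
    hold_lower = disc * lower_next - mart_inc   # continuation with control variate
    hold_upper = disc * upper_next - mart_inc   # hedged continuation for the seller
    # Buyer: exercise when the payoff beats the estimated continuation value.
    lower = np.where(payoff > continuation, payoff, hold_lower)
    # Seller: raise the hedging target whenever the payoff exceeds it.
    upper = np.where(payoff > hold_upper, payoff, hold_upper)
    return lower, upper

# Three hypothetical paths.
payoff = np.array([4.0, 1.0, 0.0])
continuation = np.array([3.5, 2.0, 1.0])
lower_next = np.array([3.0, 2.5, 1.2])
upper_next = np.array([3.2, 2.5, 1.5])
mart_inc = np.array([0.1, -0.2, 0.0])
lower, upper = update_bounds(payoff, continuation, lower_next, upper_next, mart_inc, 0.99)
```

With these rules the upper process dominates the lower one on every path whenever it does so at the next step, so the ordering of the two estimates is preserved through the backward recursion.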
To approximate the continuation value functions and the predictable process , we perform a regression based on equation (6):
[TABLE]
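The regression step can be illustrated without a neural network. The sketch below fits a target of the form "continuation function plus martingale increment function times Brownian increment" by ordinary least squares on a polynomial basis. The polynomial basis stands in for the network, and the names `fit_continuation_and_increment`, `phi` and `psi` are hypothetical.

```python
import numpy as np

def fit_continuation_and_increment(x, dw, target, degree=2):
    """Least-squares fit of target ~ Phi(x) + Psi(x) * dw with polynomial Phi and Psi."""
    basis = np.vander(x, degree + 1, increasing=True)     # columns 1, x, x^2, ...
    design = np.hstack([basis, basis * dw[:, None]])      # Phi terms, then Psi * dw terms
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    alpha, beta = coef[:degree + 1], coef[degree + 1:]

    def phi(y):
        return np.vander(y, degree + 1, increasing=True) @ alpha

    def psi(y):
        return np.vander(y, degree + 1, increasing=True) @ beta

    return phi, psi

# Synthetic check: the target is built from known Phi(x) = 1 + 0.5 x and Psi(x) = 2 x.
rng = np.random.default_rng(0)
x = rng.uniform(0.5, 1.5, 5000)
dw = np.sqrt(0.02) * rng.standard_normal(5000)
target = (1.0 + 0.5 * x) + (2.0 * x) * dw + 0.01 * rng.standard_normal(5000)
phi, psi = fit_continuation_and_increment(x, dw, target)
```

Because the Brownian increment is independent of the current state and has mean zero, the two groups of regressors are nearly orthogonal, so both functions are identified from the same target.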
In this work, we use fully-connected feedforward neural networks to perform these regressions, denoted as where describes the structure of a network, for instance, represents a network with layers, and each layer has neurons. In particular, and are the number of inputs and the number of outputs respectively. Each network has the following form
[TABLE]
where , , and is the activation function applied after the affine transformation from layer to layer .
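A fully-connected feedforward network of this kind amounts to alternating affine maps and activations. The following NumPy sketch shows a forward pass with ReLU hidden layers and two outputs (one for the continuation value, one for the martingale increment function). The layer sizes, the initialisation and the linear output layer are assumptions made for illustration.

```python
import numpy as np

def init_network(layer_sizes, rng):
    """Random weights (scaled normal) and zero biases for each affine layer."""
    return [(rng.standard_normal((m, n)) * np.sqrt(2.0 / m), np.zeros(n))
            for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]

def forward(params, x):
    """Affine map followed by ReLU on hidden layers; the output layer is left linear."""
    h = x
    for i, (w, b) in enumerate(params):
        h = h @ w + b
        if i < len(params) - 1:
            h = np.maximum(h, 0.0)
    return h

rng = np.random.default_rng(0)
params = init_network([1, 32, 32, 2], rng)      # 1 input, two hidden layers, 2 outputs
out = forward(params, np.array([[36.0], [40.0]]))
n_params = sum(w.size + b.size for w, b in params)   # number of free variables
```

Counting weights and biases in this way gives the number of free variables of a given structure, the quantity reported in the "Variables" column of the results tables.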
Remark El Karoui et al. [21] showed that pricing American options is related to reflected BSDEs, the solution of which is a -measurable tuple for with values in (, , ), and satisfies:
[TABLE]
Our work can be easily extended to solve this type of BSDE. The processes and here have the same meaning as we have defined before, and our work generates numerical solutions to them. The process can be seen as the non-decreasing process and calculated by a second simulation where we accumulate the gap between the value process and the payoff process. Note we have in our case. However, if we have a model where , we can still approximate it by adding one more term to our regression.
We design two algorithms to apply the method described above. One uses a series of neural networks, and the other one uses only one global network. To avoid any confusion, we refer to the algorithm with multiple networks as method I, and the global one as method II. In addition, to improve the algorithms, a number of variations have been introduced.
3.1 Method I: Multiple Neural Networks
In this method, one neural network is used at each time $t_n$ to regress the continuation value and the martingale increment on the current stock prices. Note that although we perform a regression at $t_0$, we do not make exercise decisions at the initial time. The training of the network at each time stops once some predetermined stopping criteria are met, which can be a given number of epochs or the stagnation of the validation set loss. The whole process is summarized in Algorithm 1.
During the training, the trained model at each time is saved; we then perform an independent out-of-sample simulation to derive the estimates. There are two ways to carry out this second simulation. One is to proceed as in the training, determining the values backward. Alternatively, we can start from the initial time and make decisions forward. This allows us to focus only on the paths that are still in the money at each time, and helps us overcome the memory exhaustion problem, as we only need to generate the paths one step at a time instead of the whole path.
3.2 Method II: One Global Neural Network
After pricing a vanilla American option using method I, we plot the fitted continuation functions and martingale increment functions across the exercise times (Figure 1). We can see that the shapes of the continuation functions and of the martingale increment functions appear to evolve consistently and continuously in time.
Based on this observation, we propose a second method where we use only one network for all regressions by including the time as an input variable. However, this approach poses additional challenges, as it requires target values at all times when we start training the model. In method I, the backward update before each regression provides relatively accurate target values for the training of the corresponding network, but these are not available in method II. To overcome this challenge, we alternate between model training and stopping strategy updates. Initially we set the stopping time to be equal to the maturity, so the target values at all times are determined by the payoff at maturity. We train the model for a given number of epochs, and then use the trained model to determine a new series of stopping times in the same way as in method I. Once all target values are updated, we train the model again. This training-updating process is repeated until some predefined criterion is met. Only a small number of epochs is carried out between consecutive stopping time updates, since the stopping strategies applied at that stage may not be optimal.
Denote by and the approximations of the continuation functions and the martingale increment functions. Method II is summarized in Algorithm 2.
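The alternation between fitting a global model and updating the stopping strategy can be sketched as follows. A least-squares fit on monomials in time and price stands in for the global network, and the put parameters, the basis and the number of rounds are all assumptions of this example; it is not Algorithm 2 itself.

```python
import numpy as np

def features(t, x, degree=3):
    """Monomials x^i * t^j with i + j <= degree (a stand-in for the global network)."""
    return np.stack([x**i * t**j for i in range(degree + 1)
                     for j in range(degree + 1 - i)], axis=-1)

def targets_from_policy(payoff, exercise, disc):
    """target[:, n] = value at t_n of following the exercise flags from t_{n+1} onwards."""
    value = payoff[:, -1].copy()
    target = np.empty_like(payoff[:, :-1])
    for n in range(payoff.shape[1] - 2, -1, -1):
        target[:, n] = disc * value
        value = np.where(exercise[:, n], payoff[:, n], target[:, n])
    return target

rng = np.random.default_rng(0)
s0, strike, r, sigma, maturity = 36.0, 40.0, 0.06, 0.2, 1.0   # illustrative values
n_steps, n_paths = 10, 20_000
dt = maturity / n_steps
disc = np.exp(-r * dt)
dw = np.sqrt(dt) * rng.standard_normal((n_paths, n_steps))
t = dt * np.arange(n_steps + 1)
w = np.concatenate([np.zeros((n_paths, 1)), np.cumsum(dw, axis=1)], axis=1)
x = np.exp((r - 0.5 * sigma**2) * t + sigma * w) * s0 / strike   # price in units of the strike
payoff = np.maximum(1.0 - x, 0.0)

# Initial policy: never exercise before maturity, so targets come from the final payoff.
exercise = np.zeros((n_paths, n_steps + 1), dtype=bool)
f = features(np.broadcast_to(t[:-1], (n_paths, n_steps)), x[:, :-1])
design = np.concatenate([f, f * dw[..., None]], axis=-1).reshape(n_paths * n_steps, -1)
for _ in range(5):                                   # alternate: targets -> fit -> policy
    target = targets_from_policy(payoff, exercise, disc)
    coef, *_ = np.linalg.lstsq(design, target.reshape(-1), rcond=None)
    phi = f @ coef[: f.shape[-1]]                    # continuation estimate at (t_n, x_n)
    in_the_money = payoff[:, 1:-1] > 0
    exercise[:, 1:-1] = in_the_money & (payoff[:, 1:-1] > phi[:, 1:])   # no decision at t_0
lower = strike * targets_from_policy(payoff, exercise, disc)[:, 0].mean()
```

The first round reproduces the initialisation described above (stopping at maturity everywhere), and each later round refits the single model on targets generated by the latest stopping rule.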
3.3 Algorithm Variations
Aside from the variability of the simulations, there are two other sources of error in our method. One is the time discretisation error caused by approximating the continuous martingale by a discrete-time process. This error is proportional to the step size, which can lead to unsatisfactory upper bounds if the option has infrequent exercise opportunities. The other source is the regression error, which can be reduced by using a larger data set and training for longer, at the price of higher computational costs and memory requirements. To improve our algorithms, we design six different variations, aimed at generating more accurate results, reducing the computational cost and overcoming the memory exhaustion problem.
Variation 1: Warm-start training with the network trained one step before
In the original version of method I, we randomly initialise the weights and biases of a network at each time when the regression starts. Since the shapes of both the continuation functions and the martingale increment functions at different times are similar, as shown in Figure 1, the parameters of the corresponding networks should resemble each other. Therefore, we can use the parameters of the previously trained network as the initial ones of the model we are about to train. Table 1 and Figure 2 demonstrate the change in results between random and non-random initialisation, from which we can see that this variation saves time and may also offer better results.
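The effect of warm-starting can be seen on a toy problem. In the sketch below a linear model trained by gradient descent stands in for a network, two nearby regression targets stand in for consecutive time steps, and a zero vector stands in for a fresh initialisation; all of these are assumptions for illustration.

```python
import numpy as np

def train(theta, x, y, lr=0.1, tol=1e-6, max_iter=10_000):
    """Gradient descent on mean squared error; returns parameters and iterations used."""
    design = np.column_stack([np.ones_like(x), x])
    for it in range(max_iter):
        grad = 2.0 * design.T @ (design @ theta - y) / len(y)
        if np.linalg.norm(grad) < tol:
            return theta, it
        theta = theta - lr * grad
    return theta, max_iter

x = np.linspace(-1.0, 1.0, 201)
y_later = 1.00 + 2.00 * x        # target at the time step trained first
y_earlier = 1.05 + 1.95 * x      # similar target one step earlier

theta_prev, _ = train(np.zeros(2), x, y_later)
_, cold_iters = train(np.zeros(2), x, y_earlier)     # fresh initialisation
_, warm_iters = train(theta_prev, x, y_earlier)      # start from the previous fit
```

Starting from the previous solution places the optimiser close to the new optimum, so the same stopping tolerance is reached in fewer iterations.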
Variation 2: Train on data from parts of the exercise times
Originally in method II, we train the model using all given paths at all times. We propose a modified version where we train on only a portion of the data. The choice of training samples can be either random or based on an equally spaced grid. Suppose we want to train the model using only the data from half of the exercise times. The first way is to choose half of the exercise times at random and use only the samples from the chosen times in the training. Alternatively, we can take the samples from one step and skip the next one, so that we end up with data from every other exercise time. Figure 3 plots the changes in training time and in the difference between the bounds as the number of exercise times used in training increases. We can see that this modification reduces the computational cost, but also sacrifices the accuracy of the results. This is not unexpected, and our aim is to find a balance between the two.
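The two ways of selecting exercise times can be written as a small helper. The function name `choose_training_times`, its arguments and the zero-based indexing are hypothetical choices for this sketch.

```python
import numpy as np

def choose_training_times(n_times, fraction=0.5, mode="grid", rng=None):
    """Zero-based indices of the exercise times whose samples are used for training."""
    k = max(1, int(round(fraction * n_times)))
    if mode == "random":
        rng = np.random.default_rng() if rng is None else rng
        return np.sort(rng.choice(n_times, size=k, replace=False))
    step = n_times / k                                # equally spaced grid
    return np.unique((np.arange(k) * step).astype(int))

grid_idx = choose_training_times(10, 0.5, "grid")     # every other exercise time
rand_idx = choose_training_times(10, 0.5, "random", np.random.default_rng(0))
```

The returned indices can then be used to slice the time axis of the simulated paths before assembling the training set.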
Variation 3: Generate fresh data while training
Since we have to simulate the whole path before the training, the memory requirement can be extremely high, especially for high-dimensional problems. One way that has been used in [16, 1] is to store the random seed used in the simulation instead of the whole path, then recover values at different times based on the seed when needed. This can be applied to method I, but values of the whole path are needed in method II. Hence, we propose a procedure to overcome the memory exhaustion problem that can be applied to it.
In the beginning, we only generate the validation set which is used to check the performance of the model after each update of the stopping strategy. During updates, a number of batches are generated, which are used to train the network for a given number of epochs, and then are deleted. Since the size of the batch is significantly smaller than the whole dataset needed for training, this enables training on a larger number of paths without experiencing the aforementioned memory exhaustion problem.
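The memory pattern of this procedure can be sketched with a toy regression: only the validation set is kept, while training batches are simulated, used once and discarded. The data generator, the linear model and the batch counts are placeholders chosen for this example.

```python
import numpy as np

def simulate_batch(batch_size, rng):
    """Hypothetical data source: a noisy linear target of a simulated state."""
    x = rng.uniform(-1.0, 1.0, batch_size)
    y = 1.0 + 2.0 * x + 0.1 * rng.standard_normal(batch_size)
    return x, y

def mse(theta, x, y):
    return np.mean((theta[0] + theta[1] * x - y) ** 2)

rng = np.random.default_rng(0)
x_val, y_val = simulate_batch(2000, rng)        # only the validation set is stored
theta = np.zeros(2)
val_history = [mse(theta, x_val, y_val)]
for update in range(20):                        # stands for one stopping-strategy update
    for _ in range(5):                          # batches generated between updates
        x, y = simulate_batch(256, rng)         # fresh data, discarded after this step
        resid = theta[0] + theta[1] * x - y
        grad = 2.0 * np.array([resid.mean(), (resid * x).mean()])
        theta -= 0.1 * grad
    val_history.append(mse(theta, x_val, y_val))   # monitor after each update
```

Peak memory is governed by the batch size and the validation set rather than by the total number of paths seen during training.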
Figure 4 shows how the difference between the lower and the upper bounds of the option price changes as training progresses, for various numbers of batches between updates of the stopping strategy. We can see that there is a slight difference in the speed of convergence when different numbers of batches are used, but there is no definite conclusion on the best number of batches, as several hyperparameters influence this. We can also see that the results may start deteriorating after a period of stagnation. Moreover, we plot these changes using the original version of method II and method I with Variation 1, and compare them with the case where we generate new batches between updates. We can see that the original method II dominates the other two at the early stage, but the results from all three methods converge eventually.
Variation 4: Add a second term for Martingale increments approximation
We have mentioned earlier that one source of error is the time discretisation. To reduce this error, we can add one more term to the regression to better explain the martingale increments. The choice of the term depends on the model. With our choice, the sum of the two martingale terms can be connected to the Milstein scheme. The changes in results caused by Variation 4 applied to method I are shown in Table 2. We can see that, with the same network structure and training time, adding Variation 4 to the algorithm significantly reduces the gap between the lower and the upper bound, and the improvement comes mainly from the better approximation of the upper bound. The lower bound also improves slightly, owing to better variance reduction.
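One natural candidate for a second term, suggested by the Milstein connection mentioned above, is the squared Brownian increment minus the step size. This specific form is an assumption of the sketch below, not a quotation from the paper. The snippet builds both regressors and checks numerically that each has mean close to zero and that they are nearly uncorrelated.

```python
import numpy as np

def martingale_features(dw, dt):
    """Two mean-zero regressors per step: dW and the assumed second-order term dW^2 - dt."""
    return np.stack([dw, dw**2 - dt], axis=-1)

rng = np.random.default_rng(0)
dt = 0.1
dw = np.sqrt(dt) * rng.standard_normal(200_000)
feats = martingale_features(dw, dt)
means = feats.mean(axis=0)                 # both should be close to zero
corr = np.corrcoef(feats.T)[0, 1]          # the two terms should be nearly uncorrelated
```

Since both regressors have zero conditional mean, multiplying each by a function of the current state and summing still gives a discrete-time martingale increment, so the dual upper bound remains valid.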
Variation 5: Use separate networks for the two functions
In both methods, we have been using the same network to approximate the continuation function and the martingale increment function. In other words, we have been using one neural network with multiple outputs. However, martingale increment functions and continuation value functions may have very different complexities, especially in more complicated models with higher dimensions. In these cases, we can use separate networks to approximate the two functions instead. Table 3 shows the test results of this variation. We can see that, with a similar number of free parameters, Variation 5 can produce more accurate results without increasing the training time required.
Variation 6: Add sub-steps
Another way to reduce the martingale approximation error caused by the time discretisation is to add sub-steps between consecutive exercise times, where we do not make stopping decisions but accumulate martingale increments. Figure 5 shows the change of the bounds with an increasing number of sub-steps using both methods, from which we can see that introducing sub-steps does improve our estimate of the upper bound. The improvement appears to slow down as the number of sub-steps increases.
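The benefit of sub-steps can be checked on a stochastic integral with a known closed form. The sketch below accumulates the products of an integrand and Brownian increments over sub-steps and compares the result with the exact Itô integral of $W\,dW$. The test integrand and the helper name `accumulate_increment` are assumptions for illustration.

```python
import numpy as np

def accumulate_increment(psi, x_sub, dw_sub):
    """Sum psi(x) * dW over the sub-steps between two consecutive exercise times.

    x_sub  : (n_paths, n_sub) state at the start of each sub-step
    dw_sub : (n_paths, n_sub) Brownian increments over each sub-step
    """
    return np.sum(psi(x_sub) * dw_sub, axis=1)

rng = np.random.default_rng(0)
n_paths, horizon = 5000, 1.0

def mean_sq_error(n_sub):
    dw = np.sqrt(horizon / n_sub) * rng.standard_normal((n_paths, n_sub))
    w_end = dw.sum(axis=1)
    w_start = np.cumsum(dw, axis=1) - dw           # Brownian motion at the start of each sub-step
    approx = accumulate_increment(lambda w: w, w_start, dw)
    exact = 0.5 * (w_end**2 - horizon)             # closed form of the Ito integral of W dW
    return np.mean((approx - exact) ** 2)

err_1, err_64 = mean_sq_error(1), mean_sq_error(64)
```

With a single step the integrand is frozen at its initial value, whereas with many sub-steps the accumulated sum tracks the integral much more closely, which mirrors the improvement in the upper bound reported above.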
Discussion
We summarise the contributions each variation can bring to our methods in Table 4. The second column indicates to which method one variation can be applied. There are three aspects one variation can contribute to: the accuracy of the estimates, the training time, and the computational memory required. We use ✓ and ✗ to indicate an improvement and a deterioration respectively. If there is no obvious change due to the variation, we leave it blank. From the table we can see that Variation 1 can improve both accuracy and training time needed, and Variation 3 can help us overcome the memory exhaustion problem without sacrificing the other two aspects. Variation 2 improves the computational speed at the expense of accuracy, while the opposite is true for the last three variations. All variations that can be applied to one method can be used at the same time to combine their effects.
4 Numerical Results
This section illustrates the numerical results generated by both methods we propose. Unless otherwise stated, Variations 1, 4, 5 and 6 are applied to method I, and Variations 4, 5 and 6 are applied to method II. We consider options with 1 or 5 underlying assets, whose prices follow either a geometric Brownian motion or a Heston model. All trainings use Adam as the optimiser, mean squared error as the loss function and ReLU as the activation function. During the training, we cross-validate to lower the chance of over-fitting. In method I, at each time we stop training the corresponding network once the validation set loss has ceased to decrease for a preset number of epochs. In method II, the stopping criterion is that the validation set loss stagnates for more than a preset number of updates of the stopping time, with a preset number of training epochs between updates. The training was performed on an NVIDIA Tesla P100 GPU on a system with a Xeon E5-2680 v4 CPU and 64GB of memory. The program is written in Python 3.8.5 using TensorFlow 2.4.1.
In each subsection, we demonstrate statistics (means and standard deviations of the lower bound, the upper bound and their difference) for one type of options using two methods by repeating the process times, and plot a histogram to show hedging errors. We plot both the total hedging errors and the worst hedging errors in one graph. Let be the stopping time for path . The error for that path at is defined as
[TABLE]
and the worst error is defined as
[TABLE]
4.1 Options under Black-Scholes Models
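Consider American options with $d$ underlying assets, whose prices follow the dynamics
$$dX^i_t = (r - \delta_i) X^i_t \, dt + \sigma_i X^i_t \, dW^i_t, \qquad i = 1, \dots, d,$$
where $r$ is the risk-free interest rate, $\delta_i$ the dividend rate and $\sigma_i$ the volatility of the $i$-th asset.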
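Paths under these dynamics can be simulated exactly on a time grid. The sketch below assumes independent Brownian motions and returns both the prices and the Brownian increments, since the latter are needed as regressors for the martingale term. The function name `simulate_gbm` and the parameter values in the usage line are illustrative assumptions.

```python
import numpy as np

def simulate_gbm(s0, r, delta, sigma, maturity, n_steps, n_paths, rng):
    """Exact simulation of independent geometric Brownian motions with dividends.

    Returns paths of shape (n_paths, n_steps + 1, d) and the Brownian
    increments of shape (n_paths, n_steps, d).
    """
    s0, delta, sigma = (np.atleast_1d(np.asarray(v, dtype=float))
                        for v in (s0, delta, sigma))
    dt = maturity / n_steps
    dw = np.sqrt(dt) * rng.standard_normal((n_paths, n_steps, s0.size))
    log_inc = (r - delta - 0.5 * sigma**2) * dt + sigma * dw
    log_paths = np.concatenate(
        [np.zeros((n_paths, 1, s0.size)), np.cumsum(log_inc, axis=1)], axis=1)
    return s0 * np.exp(log_paths), dw

rng = np.random.default_rng(0)
paths, dw = simulate_gbm([100.0] * 5, 0.05, 0.1, 0.2, 3.0, 9, 1000, rng)
```

Keeping the increments alongside the paths avoids having to back them out from the prices later, which matters once sub-steps or a second martingale term are added.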
4.1.1 1D American Put Option
We first price a 1D vanilla American put option with maturity $T$ and strike price $K$, whose underlying asset pays no dividend. The payoff function at $t$ is $(K - X_t)^+$. We discretise the time interval into sub-intervals and price the option on simulated paths. The best result from our trials is a lower bound of 4.4775 and an upper bound of 4.4873 with a difference of 0.0098, generated by Method I. Other statistics from the pricing process and hedging results are shown in Table 5 and Figure 6. From the table we can see that, in both methods, a neural network with a larger number of neurons tends to generate better results with a slightly longer training process. Method II has the potential to outperform method I if its number of free variables reaches a similar level, as suggested by the last row. However, this runs into a practical problem: a large number of free variables requires a significantly larger network, which can exhaust the computing resources of the GPU.
4.1.2 High Dimensional Bermudan max-call option
Consider an option with underlying assets under the Black-Scholes setting. We assume there is no correlation between Brownian Motions and , , on which each stock price is based. All stocks have the same initial price , the dividend rate and the volatility . There are equally spaced exercise points before and including the maturity . The risk neutral interest rate is . The payoff of this option is
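$$\left(\max_{1 \le i \le d} X^i_t - K\right)^+,$$
where $K$ denotes the strike price and $d$ the number of underlying assets.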
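For reference, the max-call payoff is a one-line computation on an array of asset prices. The helper name `max_call_payoff` and the numbers in the example are made up for illustration.

```python
import numpy as np

def max_call_payoff(prices, strike):
    """Payoff (max_i S^i - K)^+ for prices of shape (..., d)."""
    return np.maximum(prices.max(axis=-1) - strike, 0.0)

demo = max_call_payoff(np.array([[90.0, 110.0, 95.0],
                                 [80.0, 85.0, 99.0]]), 100.0)
```

The first scenario pays 10 because its best-performing asset ends at 110, while the second pays nothing since no asset exceeds the strike.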
This option has a large step size, which poses a challenge for our methods. The change in the results with respect to the number of sub-steps has been shown in the previous section. Table 6 shows the pricing results with 32 sub-steps using different network structures. The best result from our trials is a lower bound of 26.1433 and an upper bound of 26.1954 with a difference of 0.0521, generated by Method I. The numbers of free variables in methods I and II are 151974 and 83311 respectively, and this difference contributes to the outperformance of method I. As mentioned before, method II has the potential to compete with method I, but the limitation on the size of the network restricts our trials. In addition, method II takes longer to complete the training. Figure 7 shows the hedging error of method II with Variation 5 and 32 sub-steps.
4.2 American Put Option under Heston Model
Finally, we test our methods under the Heston model, where the volatility itself is also stochastic:
[TABLE]
The particular option we price is the same as the one in Lapeyre and Lelong [33], with a strike price and a maturity reached in steps. The Heston model has the following parameters: risk-free interest rate , long-term average standard deviation , the variance process reverts to at the rate , and the volatility of volatility is . The two Brownian motions and have a correlation . The initial stock price and volatility are and respectively.
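For readers who want to reproduce the simulation step, a minimal sketch follows. It assumes the standard Heston dynamics (square-root variance process with mean reversion, correlated with the price shock) and uses a full-truncation log-Euler scheme, which is a choice made for this example; the text does not state which discretisation is used. The function name is hypothetical, and `theta` denotes the long-run level of the variance here, so the parameters may need converting to match the parametrisation above.

```python
import math
import random


def simulate_heston_path(s0, v0, r, kappa, theta, xi, rho, maturity, n_steps, rng):
    """One path of (price, variance) under a full-truncation Euler scheme.

    kappa: mean-reversion rate, theta: long-run variance,
    xi: volatility of volatility, rho: correlation of the two shocks.
    """
    dt = maturity / n_steps
    s, v = s0, v0
    prices, variances = [s], [max(v, 0.0)]
    for _ in range(n_steps):
        z1 = rng.gauss(0.0, 1.0)
        # Second shock built to have correlation rho with the first.
        z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        v_pos = max(v, 0.0)  # truncate the variance inside the coefficients
        s *= math.exp((r - 0.5 * v_pos) * dt + math.sqrt(v_pos * dt) * z1)
        v += kappa * (theta - v_pos) * dt + xi * math.sqrt(v_pos * dt) * z2
        prices.append(s)
        variances.append(max(v, 0.0))
    return prices, variances
```

Unlike the Black-Scholes case, this scheme has a discretisation bias that shrinks as `n_steps` grows, so the number of sub-steps affects the simulated model itself as well as the martingale approximation.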
Since there are two Brownian motions involved in this scenario, we will have
[TABLE]
as our martingale increment. As with the max-call option in section , the step size is large, so we use substeps in the implementation (variation 6). Figure 8 shows how the estimates change as the number of substeps increases. Both the lower and the upper bound decrease. This is not only because the martingale approximation improves as the step size decreases, but also because the Heston model simulation becomes more accurate. Table 7 shows the results from both methods with substeps using different network structures.
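The accumulation over substeps can be sketched as follows, assuming the increment takes the form of a discretised stochastic integral against both Brownian motions, with one pair of integrand values per substep. The function name and the argument layout are hypothetical; in the methods above the integrand values would be network outputs.

```python
def martingale_increment(z_s, z_v, dw_s, dw_v):
    """Sum integrand-times-Brownian-increment terms over the substeps
    between two consecutive exercise dates.

    z_s, z_v   : integrand values for the price and variance Brownian motions
    dw_s, dw_v : the corresponding Brownian increments, one per substep
    """
    if not (len(z_s) == len(z_v) == len(dw_s) == len(dw_v)):
        raise ValueError("all inputs must have one entry per substep")
    return sum(a * d1 + b * d2 for a, b, d1, d2 in zip(z_s, z_v, dw_s, dw_v))


# Two substeps between a pair of exercise dates (toy numbers).
inc = martingale_increment([1.0, 2.0], [0.5, 0.5], [0.25, -0.5], [0.5, 0.25])
print(inc)  # -0.375
```

Each integrand value must depend only on information available at the start of its substep; otherwise the sum is no longer a martingale increment and the upper bound loses its validity.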
5 Conclusion
We have designed two methods that use artificial neural networks to simultaneously compute both lower and upper bounds of an American option price. Both methods determine the stopping strategy by comparing the immediate exercise payoff with the continuation value. The first method uses a series of networks to approximate the continuation values and martingale increments at each exercise time. The second method applies one global network by adding time as an additional input, and alternates between network training and stopping strategy updates until a stopping criterion is met. The results in Section 4 show that both methods work efficiently, while the second offers more flexibility. One advantage of our methods is that nested simulations are avoided, which is a significant computational improvement when pricing American/Bermudan options with frequent exercise opportunities. Moreover, our methods provide the hedging strategy as a by-product without extra simulations or calculations, and this strategy can also be used for variance reduction. Although most numerical results in this paper are based on geometric Brownian motion, the only requirement for applying our methods is that the underlying asset price model is Markovian. This allows us to extend our methods to more complicated models. Our work can also be extended to solve reflected backward stochastic differential equations.
Appendix A
Recall that the stopping time is defined as , and . Since is measurable, by the martingale representation:
[TABLE]
By taking the expectation of conditioned first on and then on , we have
[TABLE]
[TABLE]
Combining (7) and (8), we obtain
[TABLE]
Based on Itô isometry and the above results, we can see the variance of the payoff at the stopping time is
[TABLE]
Hence, and are uncorrelated, and
[TABLE]
Therefore, adding the control variate in the derivation of can reduce the variance.
Note that we also show that . This is consistent with hedging theory, since we stop hedging once the stopping time is reached (that is, once the option is exercised).
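The variance reduction can be checked numerically with a small experiment. The sketch below is built on several assumptions made only for illustration: the asset follows geometric Brownian motion, the hedge ratio is the Black-Scholes delta of a European put (a stand-in for the learned hedging strategy), the stopping rule is a fixed exercise barrier (a stand-in for the learned stopping strategy), and all numerical values are hypothetical. The martingale is accumulated only up to the stopping time, in line with the remark that hedging stops at exercise.

```python
import math
import random


def put_delta(s, strike, r, sigma, tau):
    """Black-Scholes delta of a European put (stand-in hedge ratio)."""
    if tau <= 0.0:
        return -1.0 if s < strike else 0.0
    d1 = (math.log(s / strike) + (r + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    return 0.5 * (1.0 + math.erf(d1 / math.sqrt(2.0))) - 1.0


def sample_variance(xs):
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)


def control_variate_demo(s0=36.0, strike=40.0, r=0.06, sigma=0.2, maturity=1.0,
                         n_steps=50, n_paths=2000, barrier=34.0, seed=0):
    """Return the sample variances of the payoff at the stopping time,
    without and with the martingale control variate."""
    rng = random.Random(seed)
    dt = maturity / n_steps
    plain, hedged = [], []
    for _ in range(n_paths):
        s, m = s0, 0.0
        for i in range(1, n_steps + 1):
            delta = put_delta(s, strike, r, sigma, maturity - (i - 1) * dt)
            s_next = s * math.exp((r - 0.5 * sigma ** 2) * dt
                                  + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0))
            # Martingale increment: hedge ratio times change in discounted price.
            m += delta * (math.exp(-r * i * dt) * s_next
                          - math.exp(-r * (i - 1) * dt) * s)
            s = s_next
            if s <= barrier or i == n_steps:
                break  # exercise: hedging stops at the stopping time
        payoff = math.exp(-r * i * dt) * max(strike - s, 0.0)
        plain.append(payoff)
        hedged.append(payoff - m)
    return sample_variance(plain), sample_variance(hedged)


var_plain, var_hedged = control_variate_demo()
print(var_plain, var_hedged)
```

Both estimators have the same expectation because the stopped martingale has mean zero, so subtracting it changes only the variance of the estimate.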
References
- [1] R. Aïd, L. Campi, N. Langrené, and H. Pham. A probabilistic numerical method for optimal multiple switching problems in high dimension. SIAM Journal on Financial Mathematics, 5(1):191–231, 2014.
- [2] L. Andersen and M. Broadie. Primal-dual simulation algorithm for pricing multidimensional American options. Management Science, 50(9):1222–1234, 2004.
- [3] V. Bally, G. Pagès, and J. Printems. A quantization tree method for pricing and hedging multidimensional American options. Mathematical Finance: An International Journal of Mathematics, Statistics and Financial Economics, 15(1):119–168, 2005.
- [4] G. Barone-Adesi and R. E. Whaley. Efficient analytic approximation of American option values. The Journal of Finance, 42(2):301–320, 1987.
- [5] J. Barraquand and D. Martineau. Numerical valuation of high dimensional multivariate American securities. Journal of Financial and Quantitative Analysis, 30(3):383–405, 1995.
- [6] C. Beck, M. Hutzenthaler, A. Jentzen, and B. Kuckuck. An overview on deep learning-based approximation methods for partial differential equations. Discrete and Continuous Dynamical Systems - Series B, 2022.
- [7] S. Becker, P. Cheridito, and A. Jentzen. Deep optimal stopping. Journal of Machine Learning Research, 20(74):1–25, 2019.
- [8] S. Becker, P. Cheridito, and A. Jentzen. Pricing and hedging American-style options with deep learning. Journal of Risk and Financial Management, 13(7):158, 2020.
