Pricing path-dependent Bermudan options using Wiener chaos expansion: an embarrassingly parallel approach
J\'er\^ome Lelong (DAO)

TL;DR
This paper introduces a parallelizable algorithm for pricing complex Bermudan options using Wiener chaos expansion, overcoming non-Markovian challenges and improving computational efficiency over traditional regression methods.
Contribution
It develops a novel policy iteration method that replaces least squares with Wiener chaos expansion, enabling parallel computation for non-Markovian Bermudan option pricing.
Findings
Allows non-Markovian option pricing
Enables embarrassingly parallel computation
Improves efficiency over traditional methods
Abstract
In this work, we propose a new policy iteration algorithm for pricing Bermudan options when the payoff process cannot be written as a function of a lifted Markov process. Our approach is based on a modification of the well-known Longstaff Schwartz algorithm, in which we basically replace the standard least square regression by a Wiener chaos expansion. Not only does it allow us to deal with a non Markovian setting, but it also breaks the bottleneck induced by the least square regression as the coefficients of the chaos expansion are given by scalar products on the L^2 space and can therefore be approximated by independent Monte Carlo computations. This key feature enables us to provide an embarrassingly parallel algorithm.
| p | M | Price | Variance | Reference price | |
|---|---|---|---|---|---|
| 0.2 | 2 | 1E5 | 10.48 | 7E-4 | 10.4795 |
| 0.2 | 2 | 1E6 | 10.47 | 7E-5 | |
| 0.2 | 3 | 1E5 | 10.48 | 6E-4 | |
| 0.2 | 3 | 1E6 | 10.47 | 6E-5 | |
| 0.25 | 2 | 1E5 | 11.96 | 1E-3 | 11.987 |
| 0.25 | 2 | 1E6 | 11.94 | 2E-4 | |
| 0.25 | 3 | 1E5 | 11.96 | 9E-4 | |
| 0.25 | 3 | 1E6 | 11.96 | 1E-4 |
| T | K | N | p | M | Price | Variance | LS | Dual price |
|---|---|---|---|---|---|---|---|---|
| 3 | 100 | 20 | 2 | 5E4 | 4.01793 | 0.00039 | 4.07 | 4.3 |
| 3 | 100 | 20 | 2 | 1E5 | 4.00769 | 0.00028 | ||
| 3 | 100 | 20 | 2 | 1E6 | 3.99801 | 2.15E-05 | ||
| 3 | 100 | 20 | 3 | 5E4 | 4.2544 | 0.00041 | ||
| 3 | 100 | 20 | 3 | 1E5 | 4.1965 | 0.00024 | ||
| 3 | 100 | 20 | 3 | 1E6 | 4.06587 | 2.19E-05 | ||
| 3 | 90 | 20 | 2 | 5E4 | 1.29423 | 0.00013 | 1.32 | 1.47 |
| 3 | 90 | 20 | 2 | 1E5 | 1.27274 | 0.00011 | ||
| 3 | 90 | 20 | 2 | 1E6 | 1.25166 | 2.242E-05 | ||
| 3 | 90 | 20 | 3 | 5E4 | 1.52426 | 8.84E-05 | ||
| 3 | 90 | 20 | 3 | 1E5 | 1.49847 | 0.00010 | ||
| 3 | 90 | 20 | 3 | 1E6 | 1.31845 | 2.72E-05 |
| T | K | N | p | M | Price | Variance | HW |
|---|---|---|---|---|---|---|---|
| 1 | 45 | 20 | 2 | 1E6 | 8.55 | 1E-4 | 8.55 |
| 1 | 45 | 20 | 3 | 1E6 | 8.47 | 1E-4 | |
| 1 | 45 | 20 | 3 | 1E7 | 8.61 | 3E-6 | |
| 1 | 50 | 20 | 2 | 1E6 | 4.81 | 1E-4 | 4.89 |
| 1 | 50 | 20 | 3 | 1E6 | 4.7 | 1E-4 | |
| 1 | 50 | 20 | 3 | 1E7 | 4.79 | 4E-6 | |
| 2 | 45 | 20 | 2 | 1E6 | 10.63 | 2E-4 | 10.62 |
| 2 | 45 | 20 | 3 | 1E6 | 10.46 | 2E-4 | |
| 2 | 45 | 20 | 3 | 1E7 | 10.66 | 6E-6 | |
| 2 | 50 | 20 | 2 | 1E6 | 7.28 | 2E-4 | 7.33 |
| 2 | 50 | 20 | 3 | 1E6 | 7.24 | 2E-4 | |
| 2 | 50 | 20 | 3 | 1E7 | 7.29 | 7E-6 |
| p | M | Price | Variance | LS-Price | Dual price | |
|---|---|---|---|---|---|---|
| 0.02 | 2 | 1E5 | 3.53118 | 8.97E-06 | 3.531 | 3.76 |
| 0.02 | 2 | 1E6 | 3.53863 | 9.7E-07 | ||
| 0.02 | 3 | 1E5 | 3.45177 | 7.05E-06 | ||
| 0.02 | 3 | 1E6 | 3.52758 | 7.12E-07 | ||
| 0.04 | 2 | 1E5 | 4.30318 | 1.7E-04 | 4.268 | 4.52 |
| 0.04 | 2 | 1E6 | 4.31781 | 8.82E-07 | ||
| 0.04 | 3 | 1E5 | 4.18467 | 1.31E-04 | ||
| 0.04 | 3 | 1E6 | 4.30239 | 1.10E-06 |
| p | M | Price | Variance |
|---|---|---|---|
| 2 | 5E4 | 6.62011 | 7.5E-4 |
| 2 | 1E5 | 6.67733 | 2.5E-4 |
| 2 | 1E6 | 6.74565 | 2.00E-05 |
| 3 | 5E4 | 6.28484 | 4.2E-4 |
| 3 | 1E5 | 6.36383 | 3.1E-4 |
| 3 | 1E6 | 6.65446 | 8.02E-06 |
| d | p | M | Price | Variance |
|---|---|---|---|---|
| 1 | 2 | 1E5 | 1.71756 | 4.68E-05 |
| 1 | 2 | 1E6 | 1.69802 | 7.68E-06 |
| 1 | 2 | 1E7 | 1.69699 | 4.37E-07 |
| 1 | 3 | 1E5 | 1.73389 | 8.43E-05 |
| 1 | 3 | 1E6 | 1.72354 | 6.63E-06 |
| 1 | 3 | 1E7 | 1.72274 | 8.53E-07 |
| #Procs | Time (sec.) | Efficiency |
|---|---|---|
| 1 | 4768 | 1 |
| 2 | 2402 | 0.99 |
| 4 | 1234 | 0.97 |
| 16 | 353 | 0.84 |
| 32 | 173 | 0.86 |
| 64 | 89 | 0.84 |
| 128 | 47 | 0.79 |
| 256 | 24 | 0.76 |
| 512 | 14 | 0.68 |
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Pricing path-dependent Bermudan options using Wiener chaos expansion: an embarrassingly parallel approach††thanks: The High Performance Computations presented in this paper were performed using the Froggy platform of the CIMENT infrastructure (https://ciment.ujf-grenoble.fr), which is supported by the Rhône-Alpes region (GRANT CPER07_13 CIRA) and the Equip@Meso project (reference ANR-10-EQPX-29-01) of the programme Investissements d’Avenir supervised by the Agence Nationale pour la Recherche.
Jérôme Lelong Univ. Grenoble Alpes, CNRS, Grenoble INP, LJK, 38000 Grenoble, France.
email: [email protected]
Abstract
In this work, we propose a new policy iteration algorithm for pricing Bermudan options when the payoff process cannot be written as a function of a lifted Markov process. Our approach is based on a modification of the well-known Longstaff Schwartz algorithm, in which we basically replace the standard least square regression by a Wiener chaos expansion. Not only does it allow us to deal with a non Markovian setting, but it also breaks the bottleneck induced by the least square regression as the coefficients of the chaos expansion are given by scalar products on the space and can therefore be approximated by independent Monte Carlo computations. This key feature enables us to propose an embarrassingly parallel algorithm to efficiently handle non Markovian payoff.
**Key words: path-dependent Bermudan options, optimal stopping, regression methods, high performance computing, Wiener chaos expansion.
**
AMS subject classification: 62L20, 62L15, 91G60, 65Y05, 60H07
1 Introduction
We fix some finite time horizon and a filtered probability space , where is supposed to be the natural augmented filtration of a dimensional Brownian motion . On this space, we consider an adapted process with values in modeling a –dimensional underlying asset, with . The number of assets can be strictly smaller than the dimension of the Brownian motion to encompass the case of stochastic volatility models or stochastic interest rates. We assume that is a risk neutral measure. We consider a Bermudan option with exercising dates and discounted payoff if exercised at time . We assume that the discrete time payoff process is adapted to the filtration and satisfies . This framework naturally encompasses the case of path-dependent options, ie. when the payoff process writes for any .
Standard arbitrage pricing theory defines the discounted value of the Bermudan option at times by
[TABLE]
Solving this backward recursion known as the dynamic programming principle has been a challenging problem for years and various approaches have been proposed to approximate its solution. The real difficulty lies in the computation of the conditional expectation at each time step of the recursion. If we were to classify the different approaches, we could say that there are regression based approaches (see Carriere (1996); Tsitsiklis and Roy (2001) and quantization approaches (see Bally and Pages (2003); Bronstein et al. (2013)). We refer to Bouchard and Warin (2012) and Pagès (2018) for a survey of the different techniques to price Bermudan options.
Among all the available algorithms to compute using the dynamic programming principle, the one proposed by Longstaff and Schwartz (2001) has the favour of practitioners. Their approach is based on iteratively selecting the optimal policy. Let be the smallest optimal policy after time , then
[TABLE]
All these methods based on the dynamic programming principle either as value iteration (1) or policy iteration (2) require a Markovian setting to be implemented such that the conditional expectation knowing the whole past can be replaced by the conditional expectation knowing only the value of a Markov process at the current date. The theory of the Snell envelope states that the sequence also satisfies
[TABLE]
When the discounted payoff process writes , for any , where is an adapted Markov process, the conditional expectation involved in (2) simplifies into
[TABLE]
where solves the following minimization problem
[TABLE]
with being the set of all measurable functions such that . The real challenge comes from properly approximating the space by a finite dimensional vector space: one typically uses polynomials or local bases. In both cases, to ensure a decent accuracy, the dimension of the approximation of increases exponentially fast with the dimension of . When is a high dimensional process, high performance computing can help but it is well known that solving the least square problem does not scale well and then deteriorates the efficiency of the parallel implementation, see for instance Pagès and Wilbertz (2011); Pagès et al. (2016).
In this work, we target truly path dependent options, i.e. options for which the payoff cannot be written as a function of a Markov process with reasonable size. In this case, (4) does not hold anymore and computing the conditional expectation knowing becomes really challenging. The new idea proposed in this work consists in computing an approximation of for which the conditional expectation knowing is known in a closed form. This will be achieved by using Wiener chaos expansion. Then, we rely on the orthogonality of the chaos expansion to introduce a high degree of parallelism in the algorithm.
In Section 2, we briefly recall the general ideas sustaining Wiener chaos expansion and how it can be used to approximate conditional expectations. Then, we present our algorithm in Section 3 and explain how to efficiently implement it in parallel. Section 4 is devoted to the study of the convergence of the algorithm. We conclude with some numerical experiments in Section 5, which emphasize the impressive scalability of the the parallel implementation and the efficiency of the algorithm for some complex path dependent options.
Notation
In this section, we gather some extensively used notation in the paper
- •
For , . Similarly, for , .
- •
For , . Similarly, for , .
- •
For , we define the set of multi-indices with total degree smaller than by
[TABLE]
- •
For , and we define the set of multi-indices with total degree smaller than and no degree after by
[TABLE]
- •
For , denote the Hermite polynomial.
- •
For , , the multi-variate Hermite polynomials write
[TABLE]
2 Wiener chaos expansion
2.1 General framework
In this section, we briefly recall the principles of Wiener chaos expansion and its basic properties. We refer to Nualart (1998) for theoretical details.
Let be the Hermite polynomial defined by
[TABLE]
They satisfy for all integer , with the convention . We recall that if is a standard random normal vector in , .
It is well-known that every square integrable -measurable random variable admits the following orthonormal decomposition
[TABLE]
where is an orthonormal basis of . We denote by the set of functions such that for all , . For all , we define the Wiener chaos of order by
[TABLE]
We denote the projection of a random variable onto by . Note that the spaces are orthogonal to each other thanks to the properties of the Hermite polynomials.
Consider the indicator functions of the grid defined by with values in defined by
[TABLE]
where denotes the canonical basis of . Based on the definition of , we introduce the truncated Wiener chaos of order up to
[TABLE]
where
[TABLE]
From the orthogonality of the Hermite polynomials, we immediately deduce the following result.
Proposition 2.1**.**
Let be a real valued random variable in . Its projection onto writes
[TABLE]
where
[TABLE]
and the coefficients are obtained as a dot product
[TABLE]
The random variable is called the truncated chaos expansion of order of the random variable . With an obvious abuse of notation, we write, for ,
[TABLE]
We recall the main result concerning the convergence of the truncated chaos expansion (see Theorem 1.1.1 and Proposition 1.1.1 of Nualart (1998))
Proposition 2.2**.**
*Let be a real valued random variable in . Then, converges to in when both and go to infinity. *
The space of truncated Wiener chaos has the key property of being stable by the conditional expectation operator. More precisely, the following result explains how to compute, in a closed form, the conditional expectation of an element of .
Proposition 2.3**.**
Let be a real valued random variable in and let and
[TABLE]
where is the set of multi-indices vanishing after time
[TABLE]
Proof*.*
Taking the conditional expectation in (6) leads to
[TABLE]
Since the Brownian increments after time are independent of and are independent of one another, {\mathbb{E}}\left[\prod_{i=k+1}^{n}\prod_{j=1}^{d}H_{\alpha^{j}_{i}}(G^{j}_{i})\Big{|}{\mathcal{F}}_{t_{k}}\right]=\prod_{i=k+1}^{n}\prod_{j=1}^{d}{\mathbb{E}}\left[H_{\alpha^{j}_{i}}(G^{j}_{i})\right], which is zero as soon as . Hence, the sum in (7) reduces to the sum over the set of multi-indices such that for all and , which is exactly the definition of the set .
Since the sum appearing in is reduced to a sum over the set of multi-indices , it actually only depends on the first increments . One can easily check that is actually given by the chaos expansion of on the first Brownian increments. Hence, computing a conditional expectation simply boils down to dropping the non measurable terms. While it may look like a naive way to proceed, it is indeed correct in this setting. To denote the chaos expansion on the time grid truncated to the first increments, we introduce the notation
[TABLE]
With an obvious abuse of notation, we write for ,
[TABLE]
2.2 Application to the approximation of conditional expectations
In this section, we explain how to use the truncated Wiener chaos expansion of a random variable , to compute its conditional expectation.
Assume that we need samples of the conditional expectations. We sample paths of and thanks to Proposition 2.3 we approximate on the sample path with index by
[TABLE]
where
[TABLE]
Using the strong law of large numbers, we clearly have that for every , converges a.s. to when goes to infinity. Then, we deduce that for any fixed , converges almost surely to when .
Remark 2.4**.**
*Note that we use the same samples to compute the coefficients of the chaos expansion and to approximate . It could have been possible to use different set of samples for the two parts and would have even simplified the theoretical analysis of the algorithm but the price to pay in terms of computational time is prohibitive. Using independent sets of samples would require to simulate new samples of the whole path at each date . *
3 The algorithm
3.1 Description of the algorithm
We aim at solving the dynamic programming equation (2) to obtain . Then, the time price of the Bermudan option writes
[TABLE]
For all , consider a time grid of , such that . We assume that . For , we define such that
[TABLE]
Even though, we do not make the dependency on explicit, it is clear that is an increasing function of .
Now, we introduce some successive approximations of (2). First, we replace the true conditional expectation by the conditional expectation of the truncated Wiener chaos expansion of
[TABLE]
where the ’s are the coefficients of the truncated expansion of
[TABLE]
The standard approach is to sample a bunch of paths of the model along with the corresponding payoff paths , for . We denote by the Brownian path used to sample . Note that is sampled on the finer grid , which enables us to deal with model discretization issues. The vector corresponds to the increments of the Brownian motion on the finer time grid. To compute the ’s on each path, one needs to compute the conditional expectations for . Then, we introduce the final approximation of the backward iteration policy, in which the truncated chaos expansion is computed using a Monte Carlo approximation
[TABLE]
where the are computed as described in Section 2.2. For , the vector is an element of and for every ,
[TABLE]
Then, we finally approximate the time price of the option by
[TABLE]
The pseudo code of our approach corresponds to Algorithm 3.1.
Remark 3.1**.**
From a practical point of view, we advise to consider in the money paths in the chaos expansion as it was already noticed in Longstaff and Schwartz (2001). Hence, the set is replaced by and the coefficients of the chaos expansion are given by
[TABLE]
*This modification does not change the theoretical analysis of the algorithm but improves its numerical behavior. *
Our algorithm is designed as a black box taking as inputs simulations of the Brownian motion and the corresponding payoff process. From a practical point of view, you can design the implementation in such as way that pricing a new product simply amounts to implementing the discretization of the model and the computation of the payoff.
3.2 Comments on the algorithm
The obvious and generic way to deal with truly path-dependent options or non Markovian model using the standard Longstaff Schwartz algorithm would be to consider the whole path as a regressor. It is very much unlikely that one can easily build a set of basis functions which are orthogonal for the law of the discretized path process. Hence, the regression problem would grow exponentially fast and as explained in Benguigui and Baude (2012), parallelism would not help much. Going beyond the Markovian setting requires an orthogonality property, which turns the regression problem into a series of independent inner-products. Of course, it is always possible to pretend everything is Markovian, but then you have no guaranty on the error you are making.
Our algorithm may be related to a regress later method as investigated by Glasserman and Yu (2004a); Balata and Palczewski (2018). At time , a regress later approach is typically composed of two steps: first is decomposed on a set of measurable basis functions, which looks like a least squares approximation of the conditional expectation with respect to . Then, the conditional expectation of each basis function is computed analytically to obtain an approximation of . Our algorithm can also be seen as a two stage method: first we compute the chaos expansion of and then we compute its conditional expectation. Although this way of formulating the algorithm is mathematically correct, it would be totally inefficient to implement it this way. As a matter of fact, taking the conditional expectation of a Wiener chaos expansion simply amounts to dropping non measurable terms and because every coefficient is computed on its own using an inner product, we can directly compute the conditional expectation of the chaos expansion by actually computing a chaos expansion with respect to the Brownian increments up to time only. This more pragmatic way of understanding our algorithm makes it actually closer to a regress now approach.
Closely looking at Algorithm 3.1, it is clear that the central part of the algorithm is the computation of the chaos expansion. Conveniently implementing this step plays a major role in the efficiency of the algorithm. In our C++ implementation, the chaos expansion is performed using the generic multivariate polynomial toolbox from Lelong (2007-2017).
3.3 Complexity analysis
Most of the computational time is spent computing the coefficients of the chaos expansions. Remember that the cardinality of is given by . As the optimal policy is only updated on the in-the-money paths at each time step (see Remark 3.1), the complexity of iteration of the loop on line 3.1 of Algorithm 3.1 is proportional to
[TABLE]
It is worth noting that the complexity decreases when time decreases. The order of the expansion plays a major role in the computational time of the algorithm. So, when the order of the expansion increases from to , the computational time is multiplied by .
3.4 The parallel implementation
The key computational trick of our algorithm is that the chaos coefficients are written as independent expectations and can therefore be parallelized both across and the number of Monte Carlo samples. Simply put, Algorithm 3.1 can be reduced to computing several independent Monte Carlo averages and is therefore very well suited for parallel programming. For a fixed time , there are two ways of introducing parallelism.
- (i)
The coefficients of the truncated Wiener chaos expansion can be computed in parallel. For two multi-indices , the computations of and are independent and can therefore be carried out simultaneously. The update of all the can also be performed in parallel. This approach looks very promising provided that the cardinality of is large enough, at least larger than the number of available computing resources. Note that
[TABLE]
where we recall that when . This approach will be efficient for large enough but will inevitably fail to scale when decreases, ie for smaller dates. 2. (ii)
Alternatively, we can use the number of Monte Carlo samples as the leverage for parallelism. Since the number of samples remains fixed during the whole algorithm, the parallelism will be as efficient for large as for small ones. Assume we have computing resources at our disposal, then each resource handles sample paths and runs the sequential algorithm 3.1 on these paths except that at each time step, a reduction followed by a broadcast are done right before updating the , . In this way, the chaos expansions are computed using the paths. We precisely describe this parallel algorithm in Algorithm 3.2.
We have followed the approach (ii) for our parallel implementation to make sure all the resources are always fully busy, which is the least requirement to ensure a decent scalability. The comparison of Algorithms 3.1 and 3.2 shows that the sequential and parallel algorithms differ very little. We even managed to merge the sequential and parallel implementations into a single code, which is hardly ever feasible especially when using MPI. Each computing resource samples a bunch of paths, on which it updates the optimal stopping policy and contributes to the computation of the ’s. At each time step, we compute an average (a reduction) to get the value of the ’s and then we send (broadcast) the coefficients to every resources. In practice, we actually use the AllReduce111See https://mpitutorial.com/tutorials/mpi-reduce-and-allreduce/ for an explanation of how reduce and broadcast can be efficiently coupled. method from MPI.
It was noted in Benguigui and Baude (2012); Pagès et al. (2016), that using mini-batches to introduce parallelism in least-square Monte Carlo was not convincingly efficient, mainly because the backward induction is essentially sequential. This is due to the regression step itself, which cannot be solved efficiently when Monte Carlo paths are allocated by blocks on each processor. In any parallel implementation of least-squares Monte Carlo, the regression step eventually becomes the bottleneck because of its bad scalability. To circumvent this main issue with least-square Monte Carlo, Pagès and Wilbertz (2011) replaced the regression step by a quantization approach, which allows for natural parallelism. Similarly, Gobet et al. (2016) used stratification to introduce conditional independence between the samples used in each strata. It may look as a mini-batch approach while they have to use new stratified sampling at each time step because they work in a backward stochastic differential setting. Hence, interpreting their approach in terms of mini-batches is not straightforward. In our approach, we rely on the orthogonality of the chaos expansion to replace the regression step by inner products computed by Monte Carlo. Therefore, our approach naturally fits into the mini-batch paradigm with no extra cost.
4 Convergence of the algorithm
In this section, we basically follow the lines of the methodology introduced in Clément et al. (2002). The statements of the convergence results are quite similar even if some assumptions had to be modified to match our framework, but the proofs differ to adapt to the new formulation of the regression step.
There are two independent parts in this section. In Section 4.2, we study the convergence of the algorithm with respect to the chaos expansion when all expectations are assumed to be computed exactly (no Monte Carlo approximation). In Section 4.3, we fix the order and the discretization used in the chaos expansion and we study the convergence with respect to the number of Monte Carlo samples. This is achieved by first proving that the Monte Carlo approximations of the chaos expansion at each time step converge to the true coefficients.
4.1 Notation
To avoid over expanding notation, we simply write instead of in the chaos expansions. At some points, it may be important to make precise which Brownian increments are used in the chaos expansion. To do so, we introduce the notation
[TABLE]
First, it is important to note that the paths for are identically distributed but not independent since the Monte Carlo computation of the chaos expansion coefficients mixes all the paths. We define the vector of the coefficients of the successive expansions and its Monte Carlo approximation .
Now, we recall the notation used by Clément et al. (2002) to study the convergence of the original Longstaff Schwartz approach.
Given a parameter in and vectors in and in , we define the vector field by
[TABLE]
Note that does not depend on the first components of , ie . Moreover,
[TABLE]
For , we also define the functions and by
[TABLE]
Note that and actually only depends on but not on the first components of .
4.2 Chaos approximation of conditional expectations
Proposition 4.1**.**
*For all , in . *
Proof*.*
We proceed by induction. The result is true for as . Assume it holds for (), we will prove it is true for .
[TABLE]
By the induction assumption, the term goes to zero in as both go to infinity. So, we just have to prove that
[TABLE]
converges to zero in .
[TABLE]
Note that . The truncated chaos expansion being an orthogonal projection on the space of random variables measurable with respect to the Brownian increments , we clearly have that
[TABLE]
where the last inequality comes from the orthogonal projection feature of the conditional expectation. Then, the induction assumption for yields that goes to zero in as go to infinity. So, the first term on the r.h.s of (Proof) goes to zero.
As , the second term on the r.h.s of (Proof) goes to zero in thanks to Proposition 2.2. Combining these two results yields the convergence statement of the proposition.
Remark 4.2**.**
*When the discrete time payoff process is measurable for the filtration generated by the discrete time Brownian increments , the result of Proposition 4.1 simplifies to in . There is no need to let go to infinity, it is sufficient to take . From a practical point of view, one should choose in order to monitor the discretization error between the true payoff process and its implementable discretization on a time grid with steps. Then, the parameter has to be considered as being fixed and we actually compute the price of the Bermudan option with payoff process instead of . Therefore, when the model can be exactly sampled, one should choose . *
4.3 Convergence of the Monte Carlo approximation
In the following, we assume that and are fixed and we study the convergence with respect to the number of samples .
4.3.1 Strong law of large numbers
To start, we prove the convergence of the coefficients of the chaos expansions.
Proposition 4.3**.**
*Assume that for every , . Then, for every , converges to a.s. as . *
The proof of Proposition 4.3 based on the following key lemma from Clément et al. (2002). The assumption may look surprising but a very similar assumption was already required in (Clément et al., 2002, Lemma 3.2). This assumption combined with the following lemma proves that the vector field is a.s. continuous w.r.t the expansion coefficients .
Lemma 4.4**.**
For every ,
[TABLE]
where
[TABLE]
Proof* (Proof of Proposition 4.3).*
We proceed by induction. For , the result directly follows from the standard strong law of large numbers. Choose and assume the result holds for , we aim at proving this is true for .
[TABLE]
By the standard strong law of large number, converges a.s. to . Then, it is sufficient to prove that
[TABLE]
Then, using Lemma 4.4, we have
[TABLE]
From the induction assumption for , we have that for , . Then, for any , we have
[TABLE]
where the last equality follows from the strong law of large numbers. As for all , we can let go to [math] to obtain that a.s.
Once the convergence of the expansion is established, we can now study the convergence of to when .
Theorem 4.5**.**
Assume that for every , . Then, for and all ,
[TABLE]
Proof*.*
Note that and by the strong law of large numbers
[TABLE]
Hence, we have to prove that
[TABLE]
For any , and , . Using Lemma 4.4 and that , we have
[TABLE]
Using Proposition 4.3, for all . Then for any ,
[TABLE]
where the last inequality follows from the strong law of large numbers as . We conclude that by letting go to [math] and by using that for every , .
The case proves the strong law of large numbers for the algorithm. Considering that all the paths are actually mixed through the chaos expansion, it is unlikely that the estimators for are unbiased. We recall that and . Then,
[TABLE]
where we have used that all the random variables have the same distribution. Hence, the bias of our estimator is directly linked to the gap between and the true value . Let , then for any , and the corresponding value is the same for and . This means that when increases, the length of increases with the first components remaining unchanged. Therefore, increases with , which suggests that, for a fixed , the bias also increases with . Moreover, it was already noted in Glasserman and Yu (2004b) that for a fixed number of samples , the mean square error on the coefficients of the regression explodes with the number of regressors. In our framework, this means that, for a fixed M, will increase with .
4.3.2 Discussion on the rate of convergence
From Theorem 4.5, we deduce that the standard empirical variance estimator applied to our algorithm converges. For every ,
[TABLE]
The convergence rate analysis carried out in Clément et al. (2002) applies steadily to our approach. Then, under suitable assumptions, the vector
[TABLE]
converges in law to a normal distribution with mean zero. As noted in Clément et al. (2002), determining the asymptotic variance directly from the data generated by a single run of the algorithm is almost impossible. From the proof of the central limit theorem for their algorithm, we have, when goes to infinity, in the sense
[TABLE]
Remember that . By the standard central limit theorem, converges in law to a normal distribution with variance . Then, using the empirical variance of the estimator as a measurement of the algorithm converge actually misses part of the variance since from (12), we know that the empirical variance only takes into account the first term on the r.h.s of (14).
5 Numerical experiments
In this section, we carry out several numerical experiments using our algorithm. In the different tables, the “Price” column corresponds to the value of averaged over independent runs of the algorithm and the “Variance” column is the variance of computed on these independent runs. The first two experiments, which deal with put options, enable us to compare the accuracy of our method with the standard Longstaff Schwartz algorithm using only the in-the-money paths at each time step, whose price is reported in the “LS” column. Then, we consider more sophisticated truly path dependent options for which the use of the standard Longstaff Schwartz algorithm becomes prohibitive because of the well-known curse of dimensionality. In all the examples, we use , ie we do not subdiscretize the grid given by the exercising dates to compute the chaos expansions.
5.1 Examples in the Black Scholes model
The dimensional Black Scholes model writes for
[TABLE]
where is a Brownian motion with values in , is the vector of volatilities, assumed to be deterministic and positive at all times and is the -th row of the matrix defined as a square root of the correlation matrix , given by
[TABLE]
where to ensure that is positive definite.
5.1.1 Assessing the method on the one-dimensional put option
Before investigating more elaborate numerical example, we want to test our method on the Bermudan put option. As standard as this example might be, getting a trustworthy reference price is not an easy task. We rely on prices computed by a convolution method in Lord et al. (2008) and later used as reference prices in Fang and Oosterlee (2009). We report in Table 1 our values compared to the reference prices for two different volatilities. Our prices are already very close the true prices even with a second order expansion . On these examples, we are within of the reference prices.
5.1.2 A put basket option
We consider a put basket option with payoff
[TABLE]
which can be priced using the classical Longstaff Schwartz algorithm and therefore enables us to test the accuracy of our approach in a multidimensional setting. We test our algorithm in dimension and report the results in Table 2 for different numbers of samples and different orders of chaos expansion. The values reported in the “LS” column correspond to the prices computed with the Longstaff Schwartz algorithm with samples and using as regression functions the set of polynomials of total order completed with the payoff function.
We notice that an expansion of order already gives a price fairly close to the “LS” one for a quite reasonable computational time. Increasing to improves the accuracy only when the number of samples is also increased. Indeed, we can see that the prices obtained for and for small values of ( or ) are above the dual price. This clearly happens because is too large compared to , which induces a bias. We refer the reader to the discussion following Theorem 4.5 for more information on this point. Hence, in a brand new setting, we advise to start with and to monitor the variance to fix how many Monte Carlo samples are required . Then, if need be, one can try with keeping in mind that should be increased at the same time. In our example, we basically add an order of magnitude to , when going from to .
5.1.3 Asian option
For this example, we consider a one dimensional Black Scholes model, . We consider an Asian with payoff with and for
[TABLE]
We approximate the continuous time integral by an arithmetic average and we compare our results with the one reported by Hull and White (1993) (in the “HW” column in Table 3), which, despite being quite old, is still considered as a benchmark by many papers investigating American Asian options.
It is known that although the payoff does not seem to be Markovian in dimension , if we augment the state space and consider the pair , then the option becomes Markovian again. Hence, Asian options can serve as a good example to assess the efficiency of our algorithm by considering the non Markovian representation of the Asian option in our method. As in the previous example, we notice that a second order expansion already gives very accurate price, within of the benchmark price computed by Hull and White (1993) using a tree method. Increasing to does not significantly improve the accuracy of the process but does require to increase the number of Monte Carlo samples.
5.1.4 Moving average option
For this example, we consider a one dimensional Black Scholes model, . We consider a moving average option with payoff for with
[TABLE]
where is the length of the averaging window and is a delay.
We approximate the continuous time integral by an arithmetic average and compare our results with the benchmark prices computed by Bernhart et al. (2011). Let and . For every , we approximate by
[TABLE]
The benchmark prices reported in the “LS” column come from Bernhart et al. (2011) and were computed using the standard Longstaff Schwartz algorithm with regression factors at time given by
[TABLE]
This leads to a regression problem with variables, which makes it very CPU demanding. While our approach may also look like a multi variate regression, the main difference lies in the choice of an orthogonal basis function which turns the computation of the coefficients of the regression from a linear system into a bunch of independent Monte Carlo computations. Although this seems a minor change, it is indeed a huge improvement as it breaks the bottleneck of the standard Longstaff Schwartz algorithm and makes it easy to parallelize.
We run two series of tests on the moving average option, which is a typical example of a true path-dependent option in the sense that the size of the underlying Markov process (see (4)) is basically the number of exercising dates. We report in Table 4 the results for the non delayed option, ie and in Table 5 the results for the option with delay. When there is no delay (Table 4), we are able to recover the prices computed with the Longstaff Schwartz method using the full list of regressors. Our results are already very accurate for a chaos expansion of order . To really benefit from a more accurate chaos expansion of order , one also needs to increase the number of samples to cut down the bias. Note the price in the “LS-price” column for . In Bernhart et al. (2011), they did not succeed in computing the price of this option using the Longstaff Schwartz method using the full list of regressors, so they only provided a non Markovian approximation , which is always below the true price. Hence, the value obtained for and does make sense. We also report in the column “Dual price” of Table 4 the upper bound obtained from Lelong (2018). A small gap remains between the lower and upper bounds, but it can be considered as more than acceptable considering the numerical challenges represented by the highly path-dependent products. Since the work by Bernhart et al. (2011), new methods have been developed to handle high dimensional regressions mostly by using machine learning techniques. For instance, we can cite the recent works Becker et al. (2019a, b). It would be very interesting to use these algorithms to build a price comparator for the non-Markovian settings.
5.2 A put option in the Heston model
We start with a put option in the Heston model to assess the accuracy of our algorithm. We recall the definition of the Heston model
[TABLE]
For the put option used in the numerical experiments of Table 6, the Longstaff Schwartz algorithm gives using degree polynomials for the regression and samples. Note that as we only consider in the money paths for the regression step, the payoff function is actually a linear function of the underlying asset — a degree one polynomial. So there is no need to add the payoff function to the regression basis as for more sophisticated options. Obviously, we consider both the asset price and the volatility process as regression factors.
Clearly, we see in the figures of Table 6 that going from to samples does not make any difference on the prices. On the contrary, the prices obtained with are always a little higher, which may look surprising. This is a actually related to the bias phenomenon described at the end of Section 4.3.1. The variance of the coefficients of the chaos expansion is responsible for introducing a bias into the price. To avoid this, one needs to use sufficiently many Monte Carlo samples to compute the chaos expansions. Anyway, the prices obtained with and or are within of the standard Longstaff Schwartz price.
5.3 Scalability of the parallel implementation
The scalability tests were run on a BullX DLC supercomputer containing 3204 cores. The code is written in C++ using the OpenMPI library to handle the communication and the PNL library Lelong (2007-2017) to compute the chaos expansions in a generic way for any order . We report in Table 7 the evolution of the efficiency with respect to the number of resources used. We recall that the efficiency is defined as the ratio between the sequential running time and the product of the parallel running time times the number of resources. Clearly, the efficiency takes values between [math] and and the closer to one, the better. In the example used for the scalability study, we managed to cut down the computational time from an hour and a half to 14 seconds while maintaining the efficiency at almost , which represents an astonishing improvement in terms of scalability. For a fixed size problem, it is well-known that the efficiency eventually decreases to zero when the number of processors go to infinity as every algorithm has a purely sequential part which becomes predominant in the end. Hence, the efficiency value of has to be considered together with the absolute computational time. We refer to Dung Doan et al. (2008); Dung Doan et al. (2010) for experiments on the scalability of different parallel approaches for Bermudan options. Although their framework is a bit different from ours, we can assert that our efficiency proves a very good scalability.
6 Conclusion
In this work, we have presented a new algorithm to price Bermudan option in non Markovian settings: the non Markovian feature can either come from the truly path dependent feature of the option or from the use of rough volatility models for instance. Our algorithm makes it easy to design a generic American option pricer, actually not more difficult than for a European option pricer. Although this may sound a bit ambitious, our algorithm is designed as a black box taking as inputs sample paths of the underlying multi-dimensional Brownian motion and the associated samples of the payoff process, which is basically the same as for European options. The smart design of our algorithm combined with orthogonality feature of the Wiener chaos expansion leads to an embarrassingly parallel algorithm, in which each node samples a bunch of paths, on which it updates the optimal stopping policy. Each node contributes to the computation of the ’s and at each time step, we make a reduction to get the value of the ’s and then a broadcast makes the coefficients available to everyone. The parallel implementation requires very few communications and therefore shows an impressive efficiency.
The methodology developed in this work in a Brownian setting could be adapted to Lévy processes by adding Charlier polynomials to the Hermite polynomials. We refer to Geiss and Labart (2017) for the use of chaos expansions with jumps.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Balata and Palczewski [2018] A. Balata and J. Palczewski. Regress-later Monte-Carlo for optimal inventory control with applications in energy. ar Xiv e-prints , page ar Xiv:1703.06461, Mar 2018.
- 2Bally and Pages [2003] V. Bally and G. Pages. A quantization algorithm for solving multidimensional discrete-time optimal stopping problems. Bernoulli , 9(6):1003–1049, 2003.
- 3Becker et al. [2019 a] S. Becker, P. Cheridito, and A. Jentzen. Deep optimal stopping. Journal of Machine Learning Research , 20(74):1–25, 2019 a.
- 4Becker et al. [2019 b] S. Becker, P. Cheridito, A. Jentzen, and T. Welti. Solving high-dimensional optimal stopping problems using deep learning, 2019 b.
- 5Benguigui and Baude [2012] M. Benguigui and F. Baude. Towards parallel and distributed computing on GPU for American basket option pricing. In 4th IEEE International Conference on Cloud Computing Technology and Science Proceedings , pages 723–728. IEEE, 2012.
- 6Bernhart et al. [2011] M. Bernhart, P. Tankov, and X. Warin. A finite-dimensional approximation for pricing moving average options. SIAM J. Financial Math. , 2(1):989–1013, 2011.
- 7Bouchard and Warin [2012] B. Bouchard and X. Warin. Monte-carlo valuation of American options: facts and new algorithms to improve existing methods. In R. A. Carmona, P. Del Moral, P. Hu, and N. Oudjane, editors, Numerical Methods in Finance , volume 12 of Springer Proceedings in Mathematics , pages 215–255. Springer Berlin Heidelberg, 2012.
- 8Bronstein et al. [2013] A. L. Bronstein, G. Pagès, and J. Portès. Multi-asset American options and parallel quantization. Methodology and Computing in Applied Probability , 15(3):547–561, 2013.
