Simulation-based Value-at-Risk for Nonlinear Portfolios

Junyao Chen; Tony Sit; Hoi Ying Wong

arXiv:1904.09088·stat.ME·April 22, 2019

TL;DR

This paper introduces a simulation-based method for estimating Value-at-Risk in nonlinear portfolios, improving accuracy and convergence over traditional approaches, especially for complex derivatives.

Contribution

It proposes a generic, model selection-enhanced simulation algorithm for VaR estimation applicable to high-dimensional, nonlinear portfolios with American-style derivatives.

Findings

01. Faster convergence of the new VaR estimation method.

02. Effective handling of high-dimensional, nonlinear derivative portfolios.

03. Improved accuracy over traditional delta-normal approaches.

Tables

Table 1. 10-day 95% VaR of Rainbow Option

	Mean	Median	Standard Deviation	Back Testing	Time (in seconds)
LSM	1.78698	1.78664	0.11482	0.0290	18.62
LLSM	1.61693	1.61108	0.10932	0.0442	20.25
Delta-normal^†	1.74591	1.74591	-	0.0329	7.62
Delta-gamma^†	5.00858	5.00858	-	0	66.36
Oracle	1.58089	1.58089	-	0.0483	163,850
Oracle^†	1.56778	1.56778	-	0.0500	3,634

Table 2. 10-day 95% VaR of European Swaption

	Mean	Median	Standard Deviation	Back Testing	Time (in seconds)
SLSM	8.94452	9.02583	1.45876	0.0200	13.07
GLSM	19.8179	19.8070	1.23116	0.0000	17.25
LLSM	7.02251	7.15271	1.62806	0.0505	25.16
Delta-normal	8.88734	8.88922	3.45005	0.0208	285.78
Oracle	7.04185	7.04185	-	0.0500	242,200

Table 3. 10-day 95% VaR of Bermudan Swaption

	Mean	Median	SD	Time (in seconds)
SLSM	8.38623	8.48900	1.59813	195.00
GLSM	21.0271	21.1145	1.51290	226.62
LLSM	5.01065	4.98452	1.86820	270.01
Delta-normal	189.649	7.87119	371.980	8,807.50

Table 4. Value of Bermudan Swaption

Time	$T_{2}$ =year 2			$t_{1}$ =day 10
	Mean	Median	SD	Mean	Median	SD
SLSM	72.975	72.946	0.85212	69.517	69.489	0.81175
GLSM	75.391	72.946	0.86740	71.819	71.707	0.82638
LLSM	73.452	73.426	0.87146	69.971	69.946	0.83018

Table 5. VaR Trend for Increasing Number of Stopping Times

Stopping Times	SLSM	GLSM	LLSM
1	8.79705	19.7843	6.87493
4	8.54725	21.9728	6.06602
6	8.31433	22.8349	5.69590
8	8.17318	22.9505	5.36403
10	8.04154	22.9719	5.08469
12	7.92492	22.7725	4.97697
14	7.81349	22.6823	4.91529
16	7.76497	22.6077	4.84343
18	7.75723	22.5957	4.81491

Table 6. Parameters in the Underlying Model

$S_{i, - 1}$	$μ_{i}$	$σ_{i}$
80.38723	1.1015E-06	0.0085263
42.70244	1.5939E-06	0.0093093
67.57745	3.4755E-06	0.0024763
85.70454	3.8621E-05	0.0021646
58.11831	8.6745E-05	0.0042942
32.29635	7.4338E-05	0.0025601
57.28909	9.0098E-05	0.0044424
68.65604	1.1443E-05	0.0010326
86.43502	7.7736E-05	0.0016128
81.60649	1.2489E-05	0.0013172

Table 7. Parameters in the Correlation Matrix

	$S_{1}$	$S_{2}$	$S_{3}$	$S_{4}$	$S_{5}$	$S_{6}$	$S_{7}$	$S_{8}$	$S_{9}$	$S_{10}$
$S_{1}$	1.00000	0.55000	0.29311	0.28272	0.23681	0.33050	0.34773	0.39159	0.29665	0.23986
$S_{2}$	0.55000	1.00000	0.28613	0.27540	0.37854	0.38001	0.25678	0.32052	0.26683	0.28365
$S_{3}$	0.29311	0.25510	1.00000	0.31191	0.39619	0.32266	0.27440	0.26772	0.39976	0.28598
$S_{4}$	0.28613	0.33050	0.27440	1.00000	0.25510	0.23745	0.22811	0.25273	0.22504	0.35783
$S_{5}$	0.28273	0.38001	0.22811	0.25273	1.00000	0.24183	0.25727	0.29702	0.30817	0.33151
$S_{6}$	0.27540	0.32266	0.25727	0.29702	0.39976	1.00000	0.25681	0.21482	0.32993	0.20017
$S_{7}$	0.31191	0.23745	0.25681	0.21482	0.22504	0.21862	1.00000	0.28263	0.29389	0.24210
$S_{8}$	0.23681	0.24183	0.39159	0.28263	0.30817	0.23986	0.35783	1.00000	0.21862	0.23128
$S_{9}$	0.37854	0.34773	0.32052	0.29665	0.32993	0.28365	0.33151	0.24210	1.00000	0.37021
$S_{10}$	0.39619	0.25678	0.26772	0.26683	0.29389	0.28598	0.20017	0.23128	0.37021	1.00000

Equations

U(t,X)=\sup_{\tau\in\mathcal{T}}\mathbf{E}^{\mathbb{Q}}\left\{f(X_{\tau})\mid\mathscr{F}_{t}\right\},

U_{j}:=\sup_{\tau\in\mathcal{T}_{j,T}}\mathbf{E}^{\mathbb{Q}}\left\{f(T_{\tau},X_{\tau})\mid\mathscr{F}_{j}\right\},

U_{t_{1}}:=\mathbf{E}(U_{1}\mid\mathscr{F}_{t_{1}})=\mathbf{E}\left[\sup_{\tau\in\mathcal{T}_{1,T}}\mathbf{E}\left\{f(T_{\tau},X_{\tau})\mid\mathscr{F}_{1}\right\}\bigg|\mathscr{F}_{t_{1}}\right].

U_{j}:=\operatorname*{ess\,sup}_{\tau\in\mathcal{T}_{j,L}}\mathbf{E}(Z_{\tau}\mid\mathscr{F}_{j}),\quad j=0,1,\dots,L,

U_{j}:=\left\{\begin{array}{ll}Z_{T},&j=L\\ \max\{Z_{j},\mathbf{E}(U_{j+1}\mid\mathscr{F}_{j})\},&0\leq j\leq L-1.\end{array}\right.

\left\{\begin{array}{ll}\tau_{T}=T\\ \tau_{j}=j\mathbf{1}_{\{Z_{j}\geq\mathbf{E}(Z_{\tau_{j+1}}\mid\mathscr{F}_{j})\}}+\tau_{j+1}\mathbf{1}_{\{Z_{j}<\mathbf{E}(Z_{\tau_{j+1}}\mid\mathscr{F}_{j})\}},\quad 0\leq j\leq L-1,\end{array}\right.

\mathbf{E}(Z_{\tau_{j+1}}\mid\mathscr{F}_{j})=\mathbf{E}(Z_{\tau_{j+1}}\mid X_{j})=\lim_{M\to\infty}a_{j}^{[M]}\cdot L^{[M]}(X_{j}),

Z_{\tau_{j+1}}=a_{j}^{[M]}\cdot L^{[M]}(X_{j})+\epsilon_{j},\quad j=1,\dots,L-1,

A_{j}^{[M,N]}=N^{-1}\sum_{i=1}^{N}\{L^{[M]}(X_{j}^{[i]})\}\{L^{[M]}(X_{j}^{[i]})\}^{\top}.

\left\{\begin{array}{ll}\tau_{T}^{[M]}=T\\ \tau_{j}^{[M]}=j\mathbf{1}_{\{Z_{j}\geq a_{j}^{[M]}\cdot L^{[M]}(X_{j})\}}+\tau_{j+1}^{[M]}\mathbf{1}_{\{Z_{j}<a_{j}^{[M]}\cdot L^{[M]}(X_{j})\}},\quad 0\leq j\leq L-1.\end{array}\right.

a_{j}^{[M,N]}:=\operatorname*{arg\,min}_{\alpha\in\mathbb{R}^{M}}\left\{\|Z_{\tau_{j+1}^{[M,N]}}-\alpha\cdot L^{[M]}(X_{j})\|_{2}^{2}+\lambda\|\alpha\|_{1}\right\},\quad j=1,2,\dots,L-1,

a_{j}^{*[M,N]}:=\operatorname*{arg\,min}_{\alpha\in\mathbb{R}^{M}}\left\{\|Z_{\tau_{j+1}^{[M,N]}}-\alpha\cdot L^{[M]}(X_{j})\|_{2}^{2}\right\},\quad j=1,2,\dots,L-1

U_{j}^{[M]}:=\left\{\begin{array}{ll}Z_{T},&j=L,\\ Z_{j}\mathbf{1}_{\{Z_{j}\geq a_{j}^{[M]}\cdot L^{[M]}(X_{j})\}}+U_{j+1}^{[M]}\mathbf{1}_{\{Z_{j}<a_{j}^{[M]}\cdot L^{[M]}(X_{j})\}},&j=1,2,\dots,L-1.\end{array}\right.

\mathbf{E}(Z_{\tau_{j}^{[M,N]}}\mid\mathscr{F}_{j})\to\mathbf{E}(Z_{\tau_{j}}\mid\mathscr{F}_{j})\quad\text{as }M,N\to\infty.

\lim_{M\to\infty}\mathbf{E}(Z_{\tau_{j}^{[M]}}\mid\mathscr{F}_{j})=\mathbf{E}(Z_{\tau_{j}}\mid\mathscr{F}_{j}).

\lim_{\substack{M\to\infty\\ N\to\infty}}\mathbf{E}(Z_{\tau_{j}^{[M,N]}}\mid\mathscr{F}_{j})=\lim_{M\to\infty}\lim_{N\to\infty}\mathbf{E}(Z_{\tau_{j}^{[M,N]}}\mid\mathscr{F}_{j})=\lim_{M\to\infty}\mathbf{E}(Z_{\tau_{j}^{[M]}}\mid\mathscr{F}_{j})=\mathbf{E}(Z_{\tau_{j}}\mid\mathscr{F}_{j}).

U_{j}^{*[M_{1},N]}\xrightarrow{a.s.}U_{j}\quad\text{and}\quad U_{j}^{[M,N]}\xrightarrow{a.s.}U_{j}\quad\text{as }N\to\infty.

\mathrm{VaR}_{j}^{[M,N]}\to\mathrm{VaR}_{j}^{[M]}\quad\text{as }N\to\infty,

\mathrm{VaR}_{j}^{[M,N]}

\mathrm{VaR}_{j}^{[M]}

Z_{\tau_{1}^{[M]}}=a_{t_{1}}^{[M]}\cdot L^{[M]}(X_{t_{1}})+\epsilon_{t_{1}},

\bar{W}\triangleq

\bar{W}^{*}\triangleq

g_{N}(u,w)\leq p_{0,N}(w),\quad\bigg|\frac{\partial}{\partial u}g_{N}(u,w)\bigg|\leq p_{1,N}(w),\quad\bigg|\frac{\partial^{2}}{\partial u^{2}}g_{N}(u,w)\bigg|\leq p_{2,N}(w).

\sup_{N}\int_{-\infty}^{\infty}|w|^{r}p_{i,N}(w)\,dw<\infty\quad\text{for }i=0,1,2\text{ and }0\leq r\leq 4.

\mathrm{VaR}_{t_{1}}^{[M,N]}-\mathrm{VaR}_{t_{1}}^{[M]}

\mathrm{VaR}_{t_{1}}^{*[M,N]}-\mathrm{VaR}_{t_{1}}^{[M]}

100\max\left(\min_{i}\frac{S_{iT}}{S_{i0}}-K,0\right),

A(T_{i})=1000\cdot\mathbf{E}^{\mathbb{Q}^{i}}\left[\left\{\sum_{j=i+1}^{20}D(T_{i},T_{j})\delta_{j}(L_{j}(T_{i})-K)\right\}^{+}\,\bigg|\,\mathscr{F}_{i}\right],

\hat{a}_{n}^{m}:=\operatorname*{arg\,min}_{\alpha\in\mathbb{R}^{P}}\left\{\sum_{i=1}^{n}(y_{i}^{[M]}-x_{i}^{\top}\alpha)^{2}+\lambda\|\alpha\|_{1}\right\}^{2}.

\hat{a}_{n}^{m}

=\operatorname*{arg\,min}_{\alpha\in\mathbb{R}^{P}}\big\{\sum_{i=1}^{n}(y_{i}^{m}-y_{i}+\varepsilon_{i}+x_{i}^{\top}(a-\alpha))^{2}+\lambda\|\alpha\|_{1}\big\}.

Full text

Simulation-based Value-at-Risk for Nonlinear Portfolios

Junyao Chen, Tony Sit and Hoi Ying Wong

Department of Statistics, The Chinese University of Hong Kong

Abstract

Value-at-risk (VaR) has been playing the role of a standard risk measure since its introduction. In practice, the delta-normal approach is usually adopted to approximate the VaR of portfolios with option positions. Its effectiveness, however, substantially diminishes when the portfolios concerned involve a high dimension of derivative positions with nonlinear payoffs; the lack of closed-form pricing solutions for these potentially highly correlated, American-style derivatives further complicates the problem. This paper proposes a generic simulation-based algorithm for VaR estimation that can be easily applied to any existing procedures. Our proposal leverages cross-sectional information and applies variable selection techniques to simplify the existing simulation framework. Asymptotic properties of the new approach demonstrate faster convergence due to the additional model selection component introduced. We have also conducted numerical studies that verify the effectiveness of our approach in comparison with some existing strategies.

Keywords: Value-at-Risk, least-squares Monte Carlo, American-type derivatives, high dimensional portfolios

Introduction

One of the everyday challenges that financial institutions face is the re-evaluation of the values and/or risk levels of their portfolios that mature some time in the future, which can generally be expressed in the form of

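$$U(t,X)=\sup_{\tau\in\mathcal{T}}\mathbf{E}^{\mathbb{Q}}\left\{f(X_{\tau})\mid\mathscr{F}_{t}\right\},\tag{1}$$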

where $t~(t>0)$ denotes the time, $f$ is a deterministic payoff function evaluated at the underlying asset value $X_{t}$ , $\mathbb{Q}$ denotes a risk-neutral probability measure with respect to $\mathbb{P}$ and $\mathcal{T}$ is a family of stopping times. The filtration up to time $t$ is denoted as $\mathscr{F}_{t}$ . More importantly, based on these valuations, financial institutions need to calculate regulatory capital in order to fulfill the requirements specified in Basel II for the banking industry (BIS, 2013) or Solvency II for the insurance industry. Computation of regulatory capital is closely related to Value-at-Risk (VaR), a fundamental quantity upon which other risk measures, including the coherent expected shortfall (Artzner et al., 1999), are developed. Readers may refer to Kou et al. (2013) and Kou and Peng (2016), among others, for further discussion. The main focus of this paper is to propose a more effective method for estimating VaRs.

While high-dimensional portfolios, or derivatives with a large number of underlying assets, are common, a substantial portion of securities traded are derivatives with nonlinear payoffs; this renders first-order, or even second-order, approximations insufficient for risk estimation. Evaluation of (1) and the corresponding risk measures hence becomes a non-trivial task. Given that analytic solutions of (1) are hard to obtain in most cases, simulation is generally the only feasible resort; see Chan and Wong (2015); Glasserman (2003); Hong et al. (2014) amongst others. Despite their simplicity, simulation-based procedures may not be feasible because of their heavy computational burden. Although there have been new solutions for improving the computational efficiency (see, for instance, Gramacy and Ludkovski, 2015), extensions to high-dimensional settings are not entirely straightforward. To evaluate a $t_{1}$ -day VaR with a particular statistical model chosen, one may carry out nested simulations.

An optimal allocation of computational effort to each layer (Broadie et al., 2011), or simply a reduction in the number of simulated trials, can also be applied as a more computationally economical alternative. However, curtailment of trials in either layer may lead to potentially substantial estimation bias and instability, as pointed out in Bauer et al. (2012).

In view of the aforementioned difficulties, current market practice is to calculate VaRs via Greek approximations such as the delta-normal and delta-gamma approximations; see Jorion (2006). Performance of these approaches can sometimes be disappointing. In particular, for portfolios with highly nonlinear payoffs, the first-order approximation is far from sufficient in order to produce acceptably small errors. Besides, since all these Greeks are time-varying, delta-normal and delta-gamma approximations are reasonable only for portfolios with short investment horizons – this can be rather restrictive for insurance companies as the solvency capital ratios (SCR) required involve the one-year VaR valuation. Computation burden also poses a big concern as it increases exponentially with the number of stochastic variables included. Aggregation of huge biases from evaluating the Greeks numerically can also be potentially substantial.

To tackle the above challenges, Bauer et al. (2012) proposed the novel use of the Least-squares Monte Carlo (LSM) approach for VaR computation, based on Longstaff and Schwartz’s (2001) seminal development for pricing American options. This approach, however, suffers from the curse of dimensionality when the number of underlying assets considered grows. The vast number of regressors generates highly volatile or even inconsistent coefficient estimates, which in turn leads to poor VaR estimates.

This paper incorporates the shrinkage idea in least-squares simulation for high-dimensional nonlinear portfolio VaRs. We shall demonstrate our proposal via the least absolute shrinkage and selection operator (LASSO; Tibshirani, 1996), or equivalently the constrained $\ell_{1}$ minimization. Notably, our proposal shares a similar view with Pun and Wong (2016), Chiu et al. (2017) and Pun and Wong (2019), amongst others, in the sense that the introduction of the LASSO penalty enables consistent estimation of the quantities of interest. For instance, Pun and Wong (2016) proved that the estimation errors of a high-dimensional portfolio make the optimal portfolio objective function diverge, while our results demonstrate that, with appropriate shrinkage due to LASSO, the Longstaff and Schwartz (2001) approach can be properly implemented in high-dimensional cases.

Summary of Contributions

In view of the popularity of the regression-based/Longstaff-Schwartz algorithm, our main goal is to study the corresponding convergence properties under the high dimensional setting. More specifically, this work contributes to the literature on the following three aspects:

1. Proper handling of issues due to high-dimensionality: Amongst several works analyzing the asymptotics of the Longstaff-Schwartz algorithm, Clement et al. (2002) provide theoretical justifications for regular cases with $p\ll N$ , where $p$ and $N$ denote the dimension of the regressors and the sample size, respectively. One key assumption for the convergence results is that the model should include all the significant basis functions. Selection of basis functions is typically carried out rather subjectively, and this assumption typically may not hold for products with large numbers of underlying assets. To provide a more objective and systematic alternative, our approach leverages recent elegant results developed for variable selection so that we can consider a substantially larger number of covariates in the regression model without suffering issues due to high-dimensionality. Although various methods have been developed lately for high-dimensional linear regression such as the LASSO (see Tibshirani, 1996), to the best of our knowledge, this is the first attempt to justify both theoretically and numerically how these variable selection tools can be incorporated in the Longstaff-Schwartz framework. The corresponding convergence results for various relevant estimates are also missing from the literature. To this end, we establish the relevant asymptotic results for both valuation and VaR estimation as the number of simulated paths $N$ goes to infinity together with the dimension in the regression model. Thus, for situations in which significant basis functions are not precisely known in advance, which are frequently encountered in various applications, the newly proposed shrinkage procedure, namely LASSO Least-squares Monte Carlo (LLSM), offers a higher chance of selecting influential basis functions in the regression than LSM.

2. Theoretical construction: We also enrich the proof by permitting estimation errors in the least-squares regression instead of assuming ideal estimates as required in Clement et al. (2002). This extension provides a more general treatment of the problem concerned. The framework developed lays the foundation for other possible extensions, including the use of other variable selection methods besides LASSO as well as other risk measures such as expected shortfall (ES).

3. Computational efficiency: On the computational side, with the new variable selection element, the new proposal can handle an extensive number of basis functions based on asset prices and/or other risk factors, and the LASSO component assists in selecting the significant basis functions objectively and systematically. LLSM significantly outperforms nested simulation and the Greek approximations in our numerical studies. The computational efficiency of LLSM is more prominent as the number of underlying stochastic variables increases. Numerical results show that it demands merely an additional $5\%$ (or 20% including cross validation) of the total computation time to incorporate LASSO into the original LSM. The amount of additional computation time required declines as the dimension $p$ grows. The quality of the resulting estimates is, however, dramatically improved; see Section 3.

Organization of the Paper

The remainder of this paper is organized as follows. Section 2 elucidates the LLSM procedure, develops theoretical justifications for convergence results of LLSM and discusses further improvement of the new approach. Section 3 presents numerical studies on several derivatives with American features and nonlinear payoff functions. The performance of LLSM is demonstrated via a comprehensive comparison with existing methods and the oracle approach. Concluding remarks can be found in Section 4, followed by an Appendix that presents the proofs for results discussed in Section 2. Details of our numerical studies, including model specifications, are also included.

Methodology

Our LASSO Least-squares Monte Carlo (LLSM) procedure for a general portfolio with early exercise features targets the $100(1-\alpha)\%$ $t_{1}$ -day VaR over the investment horizon ranging from $T_{0}$ to $T$ , during which stopping times denoted by $T_{1},\ldots,T_{L}=T$ are covered. Notably, VaRs are not necessarily evaluated at stopping times; the LLSM procedure can handle a more generic $t_{1}$ -day VaR with $t_{1}\in(T_{0},T_{1})$ .

Similar to the celebrated Bauer et al. (2012) and Longstaff and Schwartz (2001) approaches, LLSM is formulated as a backward recursive procedure. In its first step, LLSM estimates the conditional expected option value by simulating paths. Based on these paths, regressions are carried out on the resulting option values. In contrast to the existing strategies, LLSM adds a variable selection step which provides an objective procedure for selecting the influential basis functions in the regression models considered. The corresponding regression result provides an approximation for the continuation value, which can be compared to the early exercise value. Option values at different stopping times can then be evaluated for all paths, as can the portfolio value and its VaR. Details of the algorithm for LLSM are summarized in Algorithm 1.

For the remainder of this section, we first introduce all notation needed for our subsequent discussion. As our VaR estimation procedure is developed upon prices evaluated from simulation, we first present the results of valuation in Section 2.2, upon which VaR convergence can then be established; see Section 2.3.

Preliminaries and Notation

The evaluation of $t_{1}$ -day VaR depends on the estimate of portfolio value at $t_{1}$ , which is derived from the portfolio values at stopping times $T_{j}$ for $j=1,\ldots,L$ . To guarantee the convergence of VaR at $t_{1}$ , we therefore first develop the convergence results for product prices at the stopping times $T_{j}$ .

Assume an underlying complete probability space ( $\Omega$ , $\mathscr{F}$ , $\mathbb{P}$ ) and finite time horizon ([math], $T$ ), where $\Omega$ denotes the set of all possible realizations of the stochastic economy from time [math] to $T$ , $\mathscr{F}\triangleq\sigma(\Omega)=\mathscr{F}_{T}$ is the total information filtration accumulated up to $T=T_{L}$ with $T$ as the maximum maturity of all financial products in the portfolio. We discretize the time horizon into intervals $(T_{j-1},T_{j})$ for $j=1,\ldots,L$ with equal length $\Delta t=T_{j}-T_{j-1}$ small enough so that potential exercise dates in the portfolio can be represented by some discrete time points $T_{j}$ . Without loss of generality, we assume $T_{j}$ for $j=1,\ldots,L$ are the associated stopping times. Accordingly, we let $\mathscr{F}_{j}$ denote the information filtration up to time $T_{j}$ . Denote $Z_{j}$ as the adapted payoff process of the portfolio and assume that $Z_{j}$ are square-integrable random variables for all $j$ . At $T_{j}$ , we let $\{X_{j}\in\mathbb{R}^{p_{j}}\mid X_{j}=\big{(}X_{j1},\ldots,X_{jp_{j}}\big{)}^{\top}\}$ be the $p_{j}$ underlying stochastic variables in the portfolio. As implied by our notation, the number of underlying stochastic variables at different $T_{j}$ is not necessarily fixed. One example is a portfolio which consists of interest rate products whose payoffs are functions of forward rates. For simplicity, we assume that $p_{j}\equiv p$ for $j=1,\ldots,L$ , and given $X_{j}$ , there exists a deterministic payoff function $f$ such that $Z_{j}=f(T_{j},X_{j})$ . The function $f$ can be nonlinear and/or discontinuous. Finally, we let $\mathcal{T}_{j,k}$ be the set of all possible stopping times $\{T_{j},..,T_{k}\}$ . Defined as the portfolio value at $T_{j}$ , $U_{j}$ can be expressed in a form of conditional expectation as:

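$$U_{j}:=\sup_{\tau\in\mathcal{T}_{j,T}}\mathbf{E}^{\mathbb{Q}}\left\{f(T_{\tau},X_{\tau})\mid\mathscr{F}_{j}\right\},\tag{2}$$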

where $\mathbb{Q}$ is a risk-neutral measure. In the sequel, the notation $\mathbb{Q}$ will be suppressed for the sake of simplicity. To illustrate the idea more effectively, we assume that there is only one optimal stopping time to be identified. If there is more than one derivative in the portfolio with different optimal stopping times, we may perform similar analysis by separating the portfolio into a linear combination of several elements, each of which has only one optimal stopping time that needs to be studied.

The formulation of the portfolio value $U_{j}$ defined in (2) considers a fairly general setup and covers a wide range of assets in the market. Our goal is to obtain an accurate estimate of the $100(1-\alpha)\%$ $t_{1}$ -day VaR, where $\alpha\in(0,1)$ is typically set to be $0.01$ or $0.05$ . Assume, without loss of generality, that $t_{1}\in(T_{0},T_{1})$ and that $T_{0}$ is the current time point, at which $U_{0}$ is observed and hence constant. If $t_{1}=T_{1}$ , we refer to the VaR as a VaR at a possible stopping time; otherwise we refer to it as a VaR at a non-stopping time. In practice, most of the VaRs considered belong to the latter type.

The $100(1-\alpha)\%$ $t_{1}$ -day VaR is based on the estimation of portfolio value at future time point $t_{1}$ . If $t_{1}=T_{1}$ , $U_{t_{1}}$ can be computed through (2); if $t_{1}\in(T_{0},T_{1})$ , $U_{t_{1}}$ is defined as

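$$U_{t_{1}}:=\mathbf{E}(U_{1}\mid\mathscr{F}_{t_{1}})=\mathbf{E}\left[\sup_{\tau\in\mathcal{T}_{1,T}}\mathbf{E}\left\{f(T_{\tau},X_{\tau})\mid\mathscr{F}_{1}\right\}\bigg|\mathscr{F}_{t_{1}}\right].$$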
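Once simulated values of $U_{t_{1}}$ are available, the VaR itself is an empirical quantile. The sketch below is illustrative only and rests on assumptions not stated in this section: the loss is taken as $U_{0}-U_{t_{1}}$ without discounting, and the function names `empirical_var` and `exceedance_rate` are hypothetical. The second helper mimics a back-testing frequency, which should be close to $\alpha$ for a well-calibrated estimate.

```python
import math

def empirical_var(current_value, simulated_values, alpha=0.05):
    """Empirical 100(1 - alpha)% VaR of the loss current_value - U_t1.

    Assumption: loss is measured without discounting (illustration only).
    """
    losses = sorted(current_value - v for v in simulated_values)
    # Smallest loss level whose empirical CDF is at least 1 - alpha;
    # round() guards against floating-point noise before taking the ceiling.
    rank = math.ceil(round((1.0 - alpha) * len(losses), 9))
    return losses[max(rank - 1, 0)]

def exceedance_rate(current_value, realised_values, var_level):
    """Fraction of realised losses that are strictly larger than the VaR."""
    breaches = sum(1 for v in realised_values if current_value - v > var_level)
    return breaches / len(realised_values)

# Toy data: 100 equally spaced portfolio values, so the losses are 0, 1, ..., 99.
values = [float(v) for v in range(1, 101)]
var_95 = empirical_var(100.0, values, alpha=0.05)
print(var_95, exceedance_rate(100.0, values, var_95))
```

On the toy data the 95% VaR is the 95th smallest loss, and exactly five of the hundred losses exceed it.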

Following classical optimal stopping theory (Neveu, 1975), we introduce the Snell envelope and rewrite (2) as

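$$U_{j}:=\operatorname*{ess\,sup}_{\tau\in\mathcal{T}_{j,L}}\mathbf{E}(Z_{\tau}\mid\mathscr{F}_{j}),\quad j=0,1,\dots,L,$$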

or equivalently as

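$$U_{j}:=\left\{\begin{array}{ll}Z_{T},&j=L\\ \max\{Z_{j},\mathbf{E}(U_{j+1}\mid\mathscr{F}_{j})\},&0\leq j\leq L-1.\end{array}\right.$$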

If we define $\tau_{j}$ as the optimal stopping time after $T_{j}$ , then $\tau_{j}:=\min\{k\geq j\mid U_{k}=Z_{k}\}$ , in which case we can rewrite $U_{j}=\mathbf{E}(Z_{\tau_{j}}\mid\mathscr{F}_{j}),\quad j=0,1,\ldots,L$ .
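The Snell envelope recursion $U_{j}=\max\{Z_{j},\mathbf{E}(U_{j+1}\mid\mathscr{F}_{j})\}$ can be evaluated exactly when the conditional expectation is available in closed form. The following minimal sketch does this for a Bermudan put on a recombining binomial tree. The tree model, the explicit one-step discount factor and the function name `bermudan_put_value` are assumptions made for illustration; the paper instead estimates the conditional expectation by regression on simulated paths.

```python
def bermudan_put_value(s0, strike, up, down, prob_up, discount, steps,
                       allow_early=True):
    """Backward recursion U_j = max(Z_j, E[U_{j+1} | F_j]) on a binomial tree."""

    def payoff(j, k):
        # Put payoff Z_j at the node reached after k up-moves out of j steps.
        return max(strike - s0 * up**k * down**(j - k), 0.0)

    # Terminal condition: the value equals the payoff at the last date.
    values = [payoff(steps, k) for k in range(steps + 1)]
    for j in range(steps - 1, -1, -1):
        # Discounted conditional expectation of next-step values at each node.
        continuation = [
            discount * (prob_up * values[k + 1] + (1.0 - prob_up) * values[k])
            for k in range(j + 1)
        ]
        if allow_early:
            values = [max(payoff(j, k), c) for k, c in enumerate(continuation)]
        else:
            values = continuation  # European-style: exercise at maturity only
    return values[0]

bermudan = bermudan_put_value(100.0, 100.0, 1.1, 0.9, 0.5, 0.99, 5)
european = bermudan_put_value(100.0, 100.0, 1.1, 0.9, 0.5, 0.99, 5,
                              allow_early=False)
print(bermudan, european)
```

Because the envelope takes a maximum at every date, the value with early exercise can never fall below the value without it.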

A backward approach is adopted to determine the optimal stopping time for each path. The rule can be stated by defining the dynamics of $\tau_{j}$ as,

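$$\left\{\begin{array}{ll}\tau_{T}=T\\ \tau_{j}=j\mathbf{1}_{\{Z_{j}\geq\mathbf{E}(Z_{\tau_{j+1}}\mid\mathscr{F}_{j})\}}+\tau_{j+1}\mathbf{1}_{\{Z_{j}<\mathbf{E}(Z_{\tau_{j+1}}\mid\mathscr{F}_{j})\}},\quad 0\leq j\leq L-1,\end{array}\right.$$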

where $\mathbf{1}_{\{\cdot\}}$ denotes the indicator function. Assume there is an $\mathscr{F}_{j}$ -Markov chain $\{X_{j}\}$ , $j=1,\ldots,L$ , such that $Z_{j}=f(j,X_{j})$ for some Borel functions $f(j,\cdot)$ ; then we have $U_{j}=g(T_{j},X_{j})$ for some function $g(j,\cdot)$ and $\mathbf{E}(Z_{\tau_{j+1}}\mid\mathscr{F}_{j})=\mathbf{E}(Z_{\tau_{j+1}}\mid X_{j})$ for $j=0,1,\ldots,L$ . Note that in practice, $X_{0}$ and $U_{0}$ are both deterministic.

Denote $\{L_{m}(X_{j})\}_{m\geq 1}$ as a sequence of measurable real-valued functions that serve as the basis functions in the regression models. To numerically evaluate $\{\mathbf{E}(Z_{\tau_{j}})\}$ , $j=1,2,\ldots,L$ through a Monte Carlo procedure, we can simulate $N$ independent paths of the underlying risk factors of the Markov chain $\{X_{j}\}$ . We define $X_{j}^{[i]}=(X_{j1}^{[i]},\ldots,X_{jp}^{[i]})^{\top}$ as the independent realizations of the underlying stochastic variables at time $j$ for the $i$ -th simulated path and $Z_{j}^{[i]}$ as the associated payoff for $j=1,2,\ldots,L$ ; $i=1,2,\ldots,N$ with $Z_{j}^{[i]}=f(T_{j},X_{j}^{[i]})$ .
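To give a feel for how quickly the number of basis functions $M$ grows with the number of risk factors $p$ , the sketch below builds a multivariate monomial basis. This choice of basis and the name `monomial_basis` are assumptions for illustration; the text above does not prescribe a particular family. The number of monomials of total degree at most $d$ in $p$ variables is $\binom{p+d}{d}$ .

```python
from itertools import combinations_with_replacement
from math import comb, prod

def monomial_basis(x, degree):
    """Return all monomials in the entries of x with total degree <= degree.

    The empty product (degree 0) gives the constant term 1.
    """
    features = []
    for d in range(degree + 1):
        for combo in combinations_with_replacement(range(len(x)), d):
            features.append(prod(x[i] for i in combo))
    return features

print(monomial_basis([2, 3], 2))  # [1, 2, 3, 4, 6, 9]

# Size of the regression problem for p = 10 risk factors.
for degree in (1, 2, 3, 4):
    m = len(monomial_basis([1.0] * 10, degree))
    print(degree, m, m == comb(10 + degree, degree))
```

With ten risk factors, moving from degree 2 to degree 4 takes the basis from 66 to 1001 functions, which is the regime where selecting basis functions by hand becomes impractical.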

In an attempt to approximate the conditional expectation $\mathbf{E}(Z_{\tau_{j+1}}\mid X_{j})$ via a finite number of basis functions of $X_{j}$ , we impose the following two conditions that appear in Clement et al. (2002):

(A1) For $j=t_{1},1,\ldots,L-1$ , the sequence $\{L_{m}(X_{j})\}_{m\geq 1}$ is total in $\mathcal{L}^{2}\{\sigma(X_{j})\}$ , where $\mathcal{L}^{2}\{\sigma(X_{j})\}$ denotes the $\mathcal{L}_{2}$ -space spanned by $\sigma(X_{j})$ .

(A2) For $j=t_{1},1,\ldots,L-1$ , if $\sum_{m=1}^{M}a_{m}L_{m}(X_{j})=0$ a.s., then $a_{m}=0$ for $m=1,\ldots,M$ , where $M$ denotes the number of basis functions included in the model.

Under these two conditions, we can obtain a coefficient vector $a_{j}^{[M]}$ such that

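$$\mathbf{E}(Z_{\tau_{j+1}}\mid\mathscr{F}_{j})=\mathbf{E}(Z_{\tau_{j+1}}\mid X_{j})=\lim_{M\to\infty}a_{j}^{[M]}\cdot L^{[M]}(X_{j}),$$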

where $L^{[M]}(X_{j})=(L_{1}(X_{j}),\ldots,L_{M}(X_{j}))^{\top}$ . To estimate the coefficients $a_{j}^{[M]}$ , we assume

$$Z_{\tau_{j+1}}=a_{j}^{[M]}\cdot L^{[M]}(X_{j})+\epsilon_{j},\quad j=1,\dots,L-1,\tag{4}$$

where $\epsilon_{j}$ is the error term. The vector $a_{j}^{[M]}$ is referred to as the true coefficients in the regression. In line with classical regression analysis, the gram matrix is defined as

$$A_{j}^{[M,N]}=N^{-1}\sum_{i=1}^{N}\{L^{[M]}(X_{j}^{[i]})\}\{L^{[M]}(X_{j}^{[i]})\}^{\top}.\tag{5}$$
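A small exact computation illustrates why the gram matrix $A_{j}^{[M,N]}=N^{-1}\sum_{i}L^{[M]}(X_{j}^{[i]})L^{[M]}(X_{j}^{[i]})^{\top}$ becomes singular when the number of paths $N$ is smaller than the number of basis functions $M$ . The sketch assumes a one-dimensional risk factor and a monomial basis, and the helper names are hypothetical. Rational arithmetic is used so that the determinant is exact rather than subject to rounding.

```python
from fractions import Fraction

def basis(x, m):
    """Monomial basis (1, x, ..., x^(m-1)); an illustrative choice only."""
    return [Fraction(x) ** k for k in range(m)]

def gram_matrix(samples, m):
    """Average of the outer products of the basis vectors over all samples."""
    n = len(samples)
    rows = [basis(x, m) for x in samples]
    return [[sum(r[a] * r[b] for r in rows) / n for b in range(m)]
            for a in range(m)]

def determinant(matrix):
    """Exact determinant by Laplace expansion (adequate for tiny matrices)."""
    if len(matrix) == 1:
        return matrix[0][0]
    total = Fraction(0)
    for col, entry in enumerate(matrix[0]):
        minor = [row[:col] + row[col + 1:] for row in matrix[1:]]
        total += (-1) ** col * entry * determinant(minor)
    return total

few_paths = gram_matrix([1, 2], 3)         # N = 2 < M = 3: rank at most 2
many_paths = gram_matrix([1, 2, 3, 4], 3)  # N = 4 >= M = 3 with distinct points
print(determinant(few_paths))   # exactly 0, so the matrix cannot be inverted
print(determinant(many_paths))  # strictly positive
```

An ordinary least-squares fit needs the inverse of this matrix, which is why a penalized estimator is attractive when $M$ is comparable to or larger than $N$ .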

We also define stopping times $\tau_{j}^{[M]}$ estimated by $M$ basis functions as

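$$\left\{\begin{array}{ll}\tau_{T}^{[M]}=T\\ \tau_{j}^{[M]}=j\mathbf{1}_{\{Z_{j}\geq a_{j}^{[M]}\cdot L^{[M]}(X_{j})\}}+\tau_{j+1}^{[M]}\mathbf{1}_{\{Z_{j}<a_{j}^{[M]}\cdot L^{[M]}(X_{j})\}},\quad 0\leq j\leq L-1.\end{array}\right.$$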

Likewise, $\tau_{j}^{[i,M]}(j=1,\ldots,L)$ is used to denote the estimated stopping time with true coefficients in the regression for the $i$ -th path. The estimated stopping time with LASSO estimated coefficients $a_{j}^{[M,N]}$ for the $i$ -th path is denoted by $\tau_{j}^{[i,M,N]}$ , where $a_{j}^{[M,N]}$ is defined as

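$$a_{j}^{[M,N]}:=\operatorname*{arg\,min}_{\alpha\in\mathbb{R}^{M}}\left\{\|Z_{\tau_{j+1}^{[M,N]}}-\alpha\cdot L^{[M]}(X_{j})\|_{2}^{2}+\lambda\|\alpha\|_{1}\right\},\quad j=1,2,\dots,L-1,$$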

where the penalty $\lambda$ depends on $M$ and $N$ . In the sequel, we suppress the superscript in $\lambda^{[M,N]}$ for clearer presentation. Determining the optimal value for the regularization parameter is vital for ensuring that the model performs well; typically, it is chosen by cross-validation. Our numerical procedure also adopts this approach for selecting a reasonable penalty.
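The LASSO objective $\|y-\alpha\cdot L\|_{2}^{2}+\lambda\|\alpha\|_{1}$ has no closed-form minimizer in general, but each coordinate can be updated in closed form by soft-thresholding. The sketch below uses cyclic coordinate descent, which is one standard way to solve the problem and is not necessarily the solver used by the authors. With the objective scaled as above (no factor $1/2$ ), the threshold for each coordinate is $\lambda/2$ . Function and argument names are hypothetical.

```python
def soft_threshold(value, threshold):
    """Shrink value towards zero by threshold, returning 0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def lasso_coordinate_descent(design, target, penalty, sweeps=200):
    """Minimise ||target - design @ alpha||_2^2 + penalty * ||alpha||_1."""
    m = len(design[0])
    alpha = [0.0] * m
    residual = list(target)  # invariant: residual = target - design @ alpha
    for _ in range(sweeps):
        for k in range(m):
            column = [row[k] for row in design]
            norm_sq = sum(c * c for c in column)
            if norm_sq == 0.0:
                continue
            # Inner product of column k with the residual that excludes it.
            rho = sum(c * (r + c * alpha[k]) for c, r in zip(column, residual))
            new_value = soft_threshold(rho, penalty / 2.0) / norm_sq
            delta = new_value - alpha[k]
            if delta != 0.0:
                residual = [r - c * delta for c, r in zip(column, residual)]
                alpha[k] = new_value
    return alpha

# Orthonormal design: each coefficient is the soft-thresholded target entry.
identity = [[1.0, 0.0], [0.0, 1.0]]
print(lasso_coordinate_descent(identity, [3.0, 0.2], penalty=1.0))  # [2.5, 0.0]
```

In the orthonormal example the small second coefficient is set exactly to zero, which is the variable selection effect the method relies on, while setting `penalty=0.0` recovers the least-squares fit.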

To distinguish LASSO estimators from ordinary least-squares (OLS) estimators, we mark with an asterisk the symbols for all the parameters related to LSM. Accordingly, we have

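$$a_{j}^{*[M,N]}:=\operatorname*{arg\,min}_{\alpha\in\mathbb{R}^{M}}\left\{\|Z_{\tau_{j+1}^{[M,N]}}-\alpha\cdot L^{[M]}(X_{j})\|_{2}^{2}\right\},\quad j=1,2,\dots,L-1$$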

for the LSM approach. Based on the definition of estimated stopping times, we can define the portfolio value in (2) explained by $M$ basis functions with true coefficients as

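$$U_{j}^{[M]}:=\left\{\begin{array}{ll}Z_{T},&j=L,\\ Z_{j}\mathbf{1}_{\{Z_{j}\geq a_{j}^{[M]}\cdot L^{[M]}(X_{j})\}}+U_{j+1}^{[M]}\mathbf{1}_{\{Z_{j}<a_{j}^{[M]}\cdot L^{[M]}(X_{j})\}},&j=1,2,\dots,L-1.\end{array}\right.$$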

If we substitute $a_{j}^{[M,N]}$ for $a_{j}^{[M]}$ in the definition of $U_{j}^{[M]}$ , we obtain $U_{j}^{[M,N]}$ , which is the portfolio value estimated by LLSM with $M$ basis functions and $N$ sample paths.
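The pieces defined in this subsection can be combined into a compact valuation routine. The sketch below is not the paper's Algorithm 1; it is a minimal single-asset illustration under several assumptions: geometric Brownian motion dynamics, a Bermudan put payoff, a monomial basis in moneyness, a fixed penalty instead of cross-validation, and an extra guard that only exercises when the payoff is positive. All function and parameter names are hypothetical. It returns the time- $T_{0}$ value only; a VaR estimate would additionally require the regression at $t_{1}$ .

```python
import math
import random

def soft_threshold(value, threshold):
    return math.copysign(max(abs(value) - threshold, 0.0), value)

def lasso_fit(design, target, penalty, sweeps=100):
    """Coordinate descent for ||target - design @ a||_2^2 + penalty * ||a||_1."""
    m = len(design[0])
    coef = [0.0] * m
    residual = list(target)
    columns = [[row[k] for row in design] for k in range(m)]
    norms = [sum(c * c for c in col) for col in columns]
    for _ in range(sweeps):
        for k, col in enumerate(columns):
            if norms[k] == 0.0:
                continue
            rho = sum(c * r for c, r in zip(col, residual)) + norms[k] * coef[k]
            updated = soft_threshold(rho, penalty / 2.0) / norms[k]
            shift = updated - coef[k]
            if shift:
                residual = [r - c * shift for c, r in zip(col, residual)]
                coef[k] = updated
    return coef

def basis(x, strike, m):
    """Monomials in moneyness x / strike; an illustrative basis only."""
    return [(x / strike) ** k for k in range(m)]

def llsm_bermudan_put(s0, strike, rate, vol, dt, dates, n_paths, m, penalty,
                      seed=0):
    rng = random.Random(seed)
    drift = (rate - 0.5 * vol * vol) * dt
    paths = []
    for _ in range(n_paths):
        level, path = s0, []
        for _ in range(dates):
            level *= math.exp(drift + vol * math.sqrt(dt) * rng.gauss(0.0, 1.0))
            path.append(level)
        paths.append(path)
    discount = math.exp(-rate * dt)
    # cashflow[i]: payoff along path i under the current stopping rule,
    # discounted to the exercise date being processed.
    cashflow = [max(strike - p[-1], 0.0) for p in paths]
    for j in range(dates - 2, -1, -1):
        cashflow = [c * discount for c in cashflow]
        design = [basis(p[j], strike, m) for p in paths]
        coef = lasso_fit(design, cashflow, penalty)
        for i, p in enumerate(paths):
            exercise = max(strike - p[j], 0.0)
            continuation = sum(a * b for a, b in zip(coef, design[i]))
            # Stop when the payoff is at least the fitted continuation value.
            if exercise > 0.0 and exercise >= continuation:
                cashflow[i] = exercise
    return discount * sum(cashflow) / n_paths

price = llsm_bermudan_put(100.0, 100.0, 0.03, 0.2, 0.25, 4, 500, 4, penalty=1.0)
print(price)
```

Keeping realised cash flows along each path, rather than the fitted continuation values, follows the Longstaff and Schwartz (2001) convention; the fitted values are used only to decide whether to stop.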

The following two subsections present the main contribution of this paper. Our first step is to establish the convergence result for valuation in Section 2.2. Building on these consistent estimates of the derivative prices, the corresponding rates of convergence of VaR estimates are discussed in Section 2.3. Despite the fact that techniques for handling high-dimensional data have been actively studied for the past two decades, to the best of our knowledge, there has not yet been any similar development in the pricing and risk measure literature. All the new theorems presented subsequently compare the convergence rates of the traditional LSM and our proposal, LLSM. The benefit of incorporating LASSO in the framework lies in the size of $M$ , the number of basis functions, that can be handled by the model. The performance of traditional methods like LSM can be significantly hindered when the dimension of the covariates grows, which in turn leads to non-invertibility of the associated gram matrix. Selection of basis functions is also conducted in a rather subjective manner. Our main result, Theorem 4, points out that when the number of sample paths is not significantly larger than the number of basis functions considered, the LSM approach can be outperformed by the new proposal.

Convergence Results for Valuation

To prove the convergence of a VaR estimate, we first establish the convergence result for valuation. The ultimate goal of valuation convergence is to prove

[TABLE]

Similar to the treatment adopted in Clement et al. (2002), the convergence (6) can be established based on the two results $\lim_{M\to\infty}U_{j}^{[M]}=U_{j}$ and $\lim_{N\to\infty}U_{j}^{[M,N]}=U_{j}^{[M]}$ for any fixed $M$ . In particular, assuming Condition (A1) is satisfied, for $j=1,2,\ldots,L$ , Clement et al. (2002) show that

[TABLE]

This result ensures that the payoff $U_{j}^{[M]}$ estimated by regression on $M$ basis functions converges to the true payoff $U_{j}$ as the number of basis functions $M$ tends to infinity. It is a consequence of the total property of $L^{2}\{\sigma(X_{j})\}$ .

The next theorem stipulates that, under the same conditions that ensure valuation convergence of LSM, LLSM can achieve the same rate of convergence for valuation at $T_{j}$ for $j=1,\ldots,L-1$ . In other words, if the singularity problem can be solved through increasing $N$ , the introduction of LASSO will not slow down the rate of convergence. Meanwhile, it suggests that under a weaker constraint on the singularity of the gram matrix, the almost sure convergence still holds for $U_{j}^{[M,N]}$ . To examine the convergence of $U_{j}^{[M,N]}$ to $U_{j}^{[M]}$ , three additional conditions are required:

(A3)

For $j=1,2,\ldots,L-1$ , $i=1,2,\ldots,N$ , realizations of $\epsilon_{j}$ in (4) are i.i.d. with zero mean and finite variance.

(A4)

For $j=1,2,\ldots,L-1$ , there exists a non-singular $M\times M$ matrix $C_{j}$ such that the gram matrix $A_{j}^{[M,N]}$ defined in (5) converges to $C_{j}$ as $N\to\infty$ .

(A5)

(Compatibility Condition) Define the active set $S_{0}=\{m;a_{jm}^{[M]}\neq 0,m=1,2,\ldots,M\}$ . The compatibility condition is met for the set $S_{0}$ , if for some $\phi_{0}>0$ and for all $a^{[M]}$ satisfying $\|a^{[M]}_{S_{0}^{c}}\|_{1}\leq 3\|a^{[M]}_{S_{0}}\|_{1}$ , it holds that $\|a^{[M]}_{S_{0}}\|_{1}^{2}\leq\{a^{[M]}\}^{\top}A_{j}^{[M,N]}\{a^{[M]}\}\frac{s_{0}}{\phi_{0}^{2}}$ , where $s_{0}=\text{card}(S_{0})=|S_{0}|$ .

Theorem 1.

Assume for $j=1,2,\ldots,L-1$ , $\Pr\{a_{j}\cdot L^{[M]}(X_{j})=Z_{j}\}=0$ and that Conditions (A1), (A2) and (A3) are satisfied. The LASSO estimators $a_{j}^{[M,N]}$ are obtained under the penalty with $\lambda=\mathcal{O}(\log M/N)$ and $\lambda/N=o(1)$ .

(i)

If Condition (A4) holds, then $U_{j}^{[M,N]}$ converges to $U_{j}^{[M]}$ almost surely.

(ii)

If Condition (A5) holds for the active set, then $U_{j}^{[M,N]}$ converges to $U_{j}^{[M]}$ almost surely also.

Proof.

Details of the proof can be found in Appendix A.1. ∎

Remark 1.

The assumption $\Pr\{a_{j}\cdot L^{[M]}_{j}(X_{j})=Z_{j}\}=0$ is also required in Clement et al. (2002). To see the difference between LSM and LLSM, we observe that Theorem 1 (i) also holds for $U_{j}^{*[M,N]}$ in LSM, but Theorem 1 (ii) does not because without proper regularization, the associated gram matrix of the regression model in LSM will become singular.

Remark 2.

A similar version of Condition (A3) is also imposed in Clement et al. (2002). The definition of $a_{j}^{*[M,N]}$ in (2.11) of Clement et al. (2002) assumes the gram matrix is invertible by default. If we adopt a more general definition of $a_{j}^{*[M,N]}$ that allows estimation error and takes the singularity problem into account, Condition (A4) is necessary for LSM. This condition is, however, rather restrictive since it requires the invertibility of the gram matrix. The almost sure convergence property can still be maintained for the LLSM estimates even if we replace Condition (A4) with a less stringent constraint on the eigenvalues of the gram matrix. The Compatibility Condition (A5) (see also (6.4) of Bühlmann and van de Geer (2011)) is similar to a constraint on the smallest eigenvalue of the gram matrix. This standard LASSO condition is a weaker condition which is implied by Condition (A4). More discussion of the Compatibility Condition can also be found in Bickel et al. (2009); Koltchinskii (2009b) and Koltchinskii (2009a) amongst others.
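The singularity issue behind Condition (A4) can be seen in a small numerical sketch. It assumes the gram matrix takes the usual sample form $N^{-1}\sum_{i}L(X^{[i]})L(X^{[i]})^{\top}$ (the paper's definition (5) is not reproduced here), and the helper names and the cubic basis are illustrative choices only. With fewer sample paths than basis functions, the rank of the gram matrix cannot exceed $N$ , so the matrix is singular and the OLS normal equations have no unique solution.

```python
def gram_matrix(design):
    """Return (1/N) * X^T X for an N-by-M design matrix given as a list of rows."""
    n, m = len(design), len(design[0])
    return [[sum(row[a] * row[b] for row in design) / n for b in range(m)]
            for a in range(m)]


def rank(matrix, tol=1e-10):
    """Numerical rank via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    rows, cols = len(a), len(a[0])
    r = 0
    for c in range(cols):
        pivot = max(range(r, rows), key=lambda i: abs(a[i][c]), default=None)
        if pivot is None or abs(a[pivot][c]) < tol:
            continue  # no usable pivot in this column
        a[r], a[pivot] = a[pivot], a[r]
        for i in range(r + 1, rows):
            factor = a[i][c] / a[r][c]
            for j in range(c, cols):
                a[i][j] -= factor * a[r][j]
        r += 1
        if r == rows:
            break
    return r


def cubic_basis(x):
    """Hypothetical basis (1, x, x^2, x^3), so M = 4."""
    return [1.0, x, x * x, x ** 3]


few_paths = [cubic_basis(x) for x in (0.5, 1.0, 2.0)]             # N = 3 < M = 4
more_paths = [cubic_basis(x) for x in (0.5, 1.0, 1.5, 2.0, 2.5)]  # N = 5 > M = 4
rank_few = rank(gram_matrix(few_paths))    # rank deficient: gram matrix is singular
rank_more = rank(gram_matrix(more_paths))  # full rank: gram matrix is invertible
```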

In Theorem 1, the additional LASSO component allows a substantially larger number of basis functions to be included in the model without corrupting the convergence of the estimated coefficient in the active set; see Bühlmann and van de Geer (2011); Zhao and Yu (2006). We shall also see in Theorem 4 the magnitude of $M$ that ensures convergence under this LASSO framework. Furthermore, the variable selection step in our model reduces the coefficient instability due to multicollinearity.

By (7) and Theorem 1, we can see that the ultimate valuation convergence goal (6) can be achieved almost surely in the following sense:

[TABLE]

One may notice that the above induction may not be as straightforward as it appears because the value of $M$ is restricted by the choice of $N$ . In fact, (6) remains valid for some sufficiently large, yet finite, $M$ , given that the $L^{2}\{\sigma(X_{j})\}$ space is spanned by a finite number of basis functions. When the space $L^{2}\{\sigma(X_{j})\}$ is spanned by a finite number of basis functions $L^{[M]}(X_{j})$ , an approach that can correctly choose all the unknown basis functions spanning $L^{2}\{\sigma(X_{j})\}$ is desirable. If some of the necessary basis functions are excluded, convergence will never be obtained even when $N$ tends to infinity; on the other hand, if unnecessary basis functions are included, the increase in the number of coefficient parameters may degrade the model through numerical instability, eventually resulting in erroneous VaR estimates. The following theorem guarantees that LLSM can include more basis functions in the regression model than LSM for the same rate of convergence of the asset value.

Theorem 2.

Suppose the conditions in Theorem 1 are satisfied and the Irrepresentable Condition in the sense of Zhao and Yu (2006) holds for the active sets, $|S_{0}|=s_{0}<\infty$ for $j=1,\ldots,L-1$ ; see also the Appendix for the definition of the Irrepresentable Condition. If a finite set of $M_{1}$ basis functions is initially included in the regression with $M_{1}$ sufficiently large so that $S_{0}\subseteq S_{0}^{[M_{1}]}$ , then there exists $M\leq M_{1}<\infty$ such that,

[TABLE]

Proof.

Details of the proof can be found in Appendix A.2. ∎

Theorem 2 ensures that, given a suitable penalty $\lambda$ , one can carry out the valuation procedure with a finite number of basis functions and obtain the same convergence result as $N$ increases. Furthermore, the number of basis functions considered in LLSM never exceeds that considered in LSM for the same convergence result based on the same initial set of basis functions. The Irrepresentable Condition is a stronger condition that implies the Compatibility Condition. It depends on the gram matrix and the signs of the true coefficients; see Bühlmann and van de Geer (2011) for more discussion.

The above result also implies that the number of basis functions needed to obtain convergence in LLSM is bounded above by that required by LSM. Fewer basis functions in the regression model imply less estimation error given the same computation budget. Admittedly, there is no guarantee that one can include all the influential basis functions that span $L^{2}\{\sigma(X_{j})\}$ in the regression model. Nonetheless, given the same computation budget $N$ , LLSM enables users to initially include and screen more basis functions; see also Theorem 4.

Convergence Results for VaR

Given the valuation convergence results presented in Section 2.2, we now establish the corresponding convergence properties of the VaR estimate proposed. As discussed earlier, the properties of a $t$ -day VaR with $t$ as a stopping time are different from cases where $t$ is not a stopping time. In this section, we present Theorem 3 which ensures the convergence of VaR at possible stopping times. The specific rates of convergence of VaRs at non-stopping times evaluated via LSM and LLSM are derived in Theorems 6 and 4 respectively.

Theorem 3.

For $j=1,\ldots,L-1$ , if conditions in Theorem 1 (i) are satisfied, then

[TABLE]

where $\text{VaR}_{j}^{[M,N]}$ and $\text{VaR}_{j}^{[M]}$ are defined as,

[TABLE]

Proof.

Details of the proof can be found in Appendix A.3. ∎

Remark 3.

This theorem also holds for $\text{VaR}_{j}^{*[M,N]}$ derived from LSM. A similar convergence result still holds for $\text{VaR}_{j}^{[M,N]}$ if we substitute the Compatibility Condition, a weaker condition, for Condition (A4). This is, however, not true for $\text{VaR}_{j}^{*[M,N]}$ .

Theorem 3 proves the convergence of VaR estimates by LLSM at stopping times. Both $\text{VaR}_{j}^{[M,N]}$ and $\text{VaR}_{j}^{*[M,N]}$ converge at the rate of $\mathcal{O}(N^{-1})$ ; cf. Proposition 3.2 of Bauer et al. (2012). However, in most cases, we need the convergence result for the $t_{1}$ -day VaR with a non-stopping time $t_{1}$ . In a typical setting, for instance, a risk manager has to compute a $10$ -day VaR in order to fulfill the Basel II regulations. In this case, $t_{1}=10$ -day and $t_{1}\notin\mathcal{T}_{0,T}$ ; the convergence of $\text{VaR}^{[M,N]}_{t_{1}}$ to $\text{VaR}^{[M]}_{t_{1}}$ is obviously important. To achieve this, we provide Theorems 6 and 4 which guarantee that, under some mild conditions, VaR estimates by LLSM at non-stopping times converge at a faster rate than the counterparts obtained by LSM. This result explains why LLSM always outperforms LSM when we compute the $95\%$ $10$ -day VaR in our numerical studies.

To handle calculations related to non-stopping time, we write the estimate of $Z_{\tau_{1}}$ as a combination of basis functions, viz.

[TABLE]

where $a_{t_{1}}^{[M]}$ refers to the true coefficients in the regression at $t_{1}$ and $\epsilon_{t_{1}}$ denotes the error term with zero mean and finite variance. Note that $Z_{\tau_{1}^{[M]}}$ serves as the response in the regression, indicating that true coefficients are used in each regression to estimate $\tau_{1}^{[M]}$ . The LASSO estimates are defined correspondingly as $a_{t_{1}}^{[M,N]}=\operatorname*{arg\,min}_{\alpha\in\mathbb{R}^{M}}\left\{\|Z_{\tau_{1}^{[M,N]}}-\alpha\cdot L^{[M]}(X_{t_{1}})\|_{2}^{2}+\lambda\|\alpha\|_{1}\right\}$ , where $Z_{\tau_{1}^{[M,N]}}$ is the response in the regression. The true coefficients in the same regression are denoted by $\tilde{a}_{t_{1}}^{[M,N]}$ . The corresponding OLS estimates, namely $a_{t_{1}}^{*[M,N]}$ and $\tilde{a}_{t_{1}}^{*[M,N]}$ , can be obtained by using $Z_{\tau_{1}^{*[M,N]}}$ in place of $Z_{\tau_{1}^{[M,N]}}$ as the response in the regression.
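The LASSO objective above, $\|Z-\alpha\cdot L^{[M]}(X_{t_{1}})\|_{2}^{2}+\lambda\|\alpha\|_{1}$ , can be minimised by cyclic coordinate descent. The sketch below is one standard way to do so and is not claimed to be the solver used in the paper; the function name, the fixed number of sweeps, and the toy design matrix are assumptions made for illustration.

```python
def soft_threshold(z, t):
    """Soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_coordinate_descent(design, response, lam, sweeps=200):
    """Minimise ||response - design @ alpha||_2^2 + lam * ||alpha||_1.

    `design` is a list of N rows, each holding the M basis-function values
    evaluated on one sample path; `response` holds the N realised payoffs.
    """
    n, m = len(design), len(design[0])
    alpha = [0.0] * m
    residual = list(response)  # equals response - design @ alpha throughout
    col_sq = [sum(row[k] ** 2 for row in design) for k in range(m)]
    for _ in range(sweeps):
        for k in range(m):
            if col_sq[k] == 0.0:
                continue
            # Inner product of column k with the partial residual (column k excluded).
            rho = sum(design[i][k] * residual[i] for i in range(n)) + col_sq[k] * alpha[k]
            new = soft_threshold(rho, lam / 2.0) / col_sq[k]
            delta = new - alpha[k]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= delta * design[i][k]
                alpha[k] = new
    return alpha


# Toy example with two orthogonal columns; the weak second coefficient is zeroed out.
design = [[1.0, 1.0], [1.0, -1.0], [1.0, 1.0], [1.0, -1.0]]
response = [3.1, 2.9, 3.1, 2.9]  # 3 * column 1 + 0.1 * column 2
alpha_hat = lasso_coordinate_descent(design, response, lam=2.0)
```

In this toy case the penalty shrinks the first coefficient from 3 to about 2.75 and sets the second exactly to zero, which illustrates both the variable selection effect and the shrinkage bias of the penalty.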

The pricing error at $t_{1}$ is composed of two components. One is the estimation error that comes from the regression at $t_{1}$ , denoted by $\bigg|(a_{t_{1}}^{[M,N]}-\tilde{a}_{t_{1}}^{[M,N]})\cdot L^{[M]}(X_{t_{1}})\bigg|$ ; the other is the estimation error of $Z_{\tau_{1}^{[M]}}$ , denoted by $\bigg|N^{-1}\sum_{i=1}^{N}(Z_{\tau_{1}^{[i,M,N]}}^{[i]}-Z_{\tau_{1}^{[i,M]}}^{[i]})\bigg|$ , where the superscript $i$ indicates the $i$ th realization of the corresponding random variables. Although both $a_{t_{1}}^{[M]}$ and $\tilde{a}_{t_{1}}^{[M,N]}$ are called true coefficients, different responses are used as dependent variables in the corresponding regressions. Because the definition of $U_{t_{1}}^{[M]}$ is different from that of $U_{j}^{[M]}$ , $j=0,\ldots,L$ , we cannot trivially apply Theorem 3 to the proof of VaR convergence at $t_{1}$ .

To tackle this problem, we define

[TABLE]

as the average pricing error for LASSO and OLS, respectively. We also define $W=\sqrt{N}\bar{W}$ and $W^{*}=\sqrt{N}\bar{W}^{*}$ . Let $g_{N}(\cdot,\cdot),g(\cdot)$ and $g_{N}(\cdot)$ denote the joint pdf of $U_{t_{1}}^{[M]}$ and $W$ , the marginal pdf of $U_{t_{1}}^{[M]}$ and the pdf of $U_{t_{1}}^{[M,N]}$ , respectively. To ensure the VaR convergence for the nested simulation and for LSM, the following condition that imposes some restriction on the distribution of $W$ and $W^{*}$ is required; see Gordy and Juneja (2010) and Bauer et al. (2012).

We say that Condition (A6) holds for random variable $W$ if both of the following are satisfied:

i.

The joint pdf $g_{N}(\cdot,\cdot)$ of $U_{t_{1}}^{[M]}$ and $W$ and its partial derivatives $\frac{\partial}{\partial u}g_{N}(u,w)$ , $\frac{\partial^{2}}{\partial u^{2}}g_{N}(u,w)$ exist for each $N$ and for all sets of $(u,w)$ .

ii.

For $N\geq 1$ , there exist non-negative functions $p_{0,N}(\cdot)$ , $p_{1,N}(\cdot)$ , $p_{2,N}(\cdot)$ such that for all $(u,w)$ ,

[TABLE]

In addition,

[TABLE]

This condition generally holds for large portfolios where there are at least a few positions that have sufficiently smooth payoffs; see Gordy and Juneja (2010). To compare the performance of LLSM and LSM, we introduce Theorem 4 that shows the convergence rate of $\text{VaR}_{t_{1}}^{[M,N]}$ and $\text{VaR}_{t_{1}}^{*[M,N]}$ .

Theorem 4.

If the conditions in Theorem 1 (i) are satisfied and Condition (A6) holds for $W$ and $W^{*}$ , then $\text{VaR}_{t_{1}}^{[M,N]}$ by LLSM and $\text{VaR}_{t_{1}}^{*[M,N]}$ by LSM will converge to $\text{VaR}_{t_{1}}^{[M]}$ in the following sense,

[TABLE]

where $v=\text{VaR}_{t_{1}}^{[M]}-U_{0}$ and $\tilde{v}\in[v-w/\sqrt{N},v]$ . Furthermore, if $N=o\left(\frac{M^{2}\phi_{0}^{2}}{s_{0}\log M}+2M+\frac{s_{0}\log M}{\phi_{0}^{2}}\right)$ , we will have $\frac{\text{VaR}_{t_{1}}^{[M,N]}-\text{VaR}_{t_{1}}^{[M]}}{\text{VaR}_{t_{1}}^{*[M,N]}-\text{VaR}_{t_{1}}^{[M]}}=o(1).$

Proof.

Details of the proof can be found in Appendix A.4. ∎

Remark 4.

Theorem 4 still holds if we substitute the Compatibility Condition for Condition (A4). Note that in this case, $\text{VaR}_{t_{1}}^{[M,N]}$ will still converge whereas $\text{VaR}_{t_{1}}^{*[M,N]}$ will diverge.

As we can see in this theorem, LLSM allows us to include $o(\exp(N))$ basis functions whereas LSM can only handle at most $o(N)$ for convergence. If the gram matrix is non-singular, LLSM yields a faster VaR convergence rate than LSM under the restriction $N=o\left(\frac{M^{2}\phi_{0}^{2}}{s_{0}\log M}+2M+\frac{s_{0}\log M}{\phi_{0}^{2}}\right)$ . Such a growth rate of $N$ can be explained from the following two aspects. Firstly, this choice of $N$ means that the number of sample paths available cannot be infinitely large due to a given computation budget. Under a high-dimensional setting with $M$ large, $N$ can hardly be larger than $\mathcal{O}(M^{2})$ . Secondly, if we have enough resources so that $N>\mathcal{O}(M^{2})$ , the LASSO component may not be necessary given the non-singularity of the gram matrix and abundant sample paths. LASSO is well known for its application in high-dimensional statistics, but bias would arise if we imposed a penalty in the minimization process unnecessarily, that is, when $N$ is sufficiently large and the gram matrix is non-singular.
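The three terms inside the $o(\cdot)$ form a perfect square: writing $c=s_{0}\log M/\phi_{0}^{2}$ , the expression equals $(M/\sqrt{c}+\sqrt{c})^{2}$ , which is dominated by $M^{2}/c$ for large $M$ and is consistent with the $\mathcal{O}(M^{2})$ remark above. The sketch below merely evaluates the expression in both forms; the function names and the sample values of $M$ , $s_{0}$ and $\phi_{0}$ are hypothetical, and a finite evaluation only indicates the order of magnitude since the condition itself is asymptotic.

```python
import math


def sample_bound(m, s0, phi0):
    """Evaluate M^2 * phi0^2 / (s0 * log M) + 2M + s0 * log M / phi0^2."""
    c = s0 * math.log(m) / phi0 ** 2
    return m * m / c + 2.0 * m + c


def sample_bound_as_square(m, s0, phi0):
    """Same quantity written as (M / sqrt(c) + sqrt(c))^2 with c = s0 * log M / phi0^2."""
    c = s0 * math.log(m) / phi0 ** 2
    return (m / math.sqrt(c) + math.sqrt(c)) ** 2


# Hypothetical inputs: 286 basis functions, 5 active coefficients, phi0 = 1.
bound = sample_bound(286, 5, 1.0)
```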

Numerical Studies

Our quantity of interest is the $95\%$ $10$ -day VaR for portfolios with nonlinear payoffs. Back testing is performed to evaluate the performance of different approaches when oracle benchmarks are available. In this section, the penalty used in LASSO is determined by 20-fold cross-validation to minimize the mean cross-validated error given a loss function. We refer to the nested simulation in Gordy and Juneja (2010) as the estimated oracle approach. If, in the inner simulation, a closed form solution is available for evaluating the portfolio at $t_{1}=10$ -day, we define the approach as the true oracle approach. The Greeks involved in the delta-normal approach are computed numerically via the central finite difference method.
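For reference, the central finite difference formulas for delta and gamma, together with the textbook single-factor delta-normal VaR, can be sketched as follows. This is an illustration under simplifying assumptions (one risk factor, zero mean change, normal quantile), not the multi-factor setup of the numerical studies; the function names and the bump size `h` are hypothetical.

```python
from statistics import NormalDist


def central_delta(price, x, h):
    """First derivative of price(x) by central differences."""
    return (price(x + h) - price(x - h)) / (2.0 * h)


def central_gamma(price, x, h):
    """Second derivative of price(x) by central differences."""
    return (price(x + h) - 2.0 * price(x) + price(x - h)) / (h * h)


def delta_normal_var(delta, sigma_move, level=0.95):
    """Single-factor delta-normal VaR: z_level * |delta| * std of the factor change.

    Assumes a zero-mean, normally distributed change in the risk factor.
    """
    return NormalDist().inv_cdf(level) * abs(delta) * sigma_move


# Toy pricing function (assumption): price(x) = x^2, so delta = 2x and gamma = 2.
delta = central_delta(lambda x: x * x, 3.0, 0.1)
gamma = central_gamma(lambda x: x * x, 3.0, 0.1)
var_95 = delta_normal_var(delta, sigma_move=1.0)
```

Each delta requires two revaluations and each gamma a third one, which is why the cost of these approximations grows quickly when every revaluation itself needs a simulation.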

Although we consider VaR estimation of individual products, the idea of VaR evaluation can be extended from a single derivative to a high-dimensional portfolio by including additional risk factors as the underlying stochastic variables in the regression. Common risk factors are simulated once and only one regression is performed at each possible stopping time and at $t_{1}$ to evaluate the value of the whole portfolio. Specifically, to make the results more directly comparable with those presented in Longstaff and Schwartz (2001), we adopted polynomials up to the third order as our basis functions $L(X)$ for all examples. In the following examples, we shall assume that the return series follow multivariate Gaussian distributions. They are constructed in this way so that we can easily benchmark our performance against existing procedures, especially those which rely on the closed-form solutions under such settings. It is noteworthy, however, that our formulation does not require a joint normality assumption for the return series. Because of the non-parametric nature of our estimate, our proposal can be extended to a non-elliptical world fairly easily through the ranking step stated in Step 13 of Algorithm 1.
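To give a sense of how quickly a third-order polynomial basis grows with the number of risk factors, the sketch below enumerates all monomials of total degree at most three, cross terms included. Whether the paper's basis is exactly this set is an assumption; the function names are hypothetical.

```python
from itertools import combinations_with_replacement
from math import comb, prod


def monomial_exponents(n_factors, max_degree=3):
    """Exponent tuples of all monomials in n_factors variables with total degree <= max_degree."""
    exps = []
    for degree in range(max_degree + 1):
        for combo in combinations_with_replacement(range(n_factors), degree):
            e = [0] * n_factors
            for idx in combo:
                e[idx] += 1
            exps.append(tuple(e))
    return exps


def evaluate_basis(x, exps):
    """Evaluate every monomial at the point x (a list of risk-factor values)."""
    return [prod(xi ** p for xi, p in zip(x, e)) for e in exps]


# Ten risk factors, as in a ten-asset example: C(13, 3) candidate basis functions.
n_terms = len(monomial_exponents(10))
values = evaluate_basis([2.0, 3.0], monomial_exponents(2))
```

With ten factors the count is $\binom{13}{3}=286$ , so the number of regression coefficients is already a non-trivial fraction of a typical path budget.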

Rainbow Option

Rainbow options are among the most commonly traded exotic options; their payoff functions depend on more than one underlying risky asset. In this section, we consider a variation of the “call on min” rainbow option with ten stocks as its underlying risky assets. The long side will receive a positive profit if the minimum ratio return of the ten underlying stocks exceeds a predefined strike price. In other words, the payoff at maturity is expressed as

[TABLE]

where $S_{i0}$ denotes the current price for the $i$ th underlying stock. The constant $100$ in the payoff function is arbitrary for illustration to standardize the payoff at maturity. In order to derive a benchmark based on the closed form solution for pricing, we assume the underlying stock prices follow the Black and Scholes (1973) model. The closed form solution is discussed in Johnson (1987). Corresponding details are provided in the Appendix; see Section B2.

The VaR estimates given by different approaches are summarized in Table 1. The strike is selected to ensure the rainbow option is at-the-money, a situation in which delta-normal approximation may face challenges due to non-differentiability at the price that corresponds to unit moneyness. We chose the maturity $T$ to be 270 days in this example. The number of sample paths generated in each approach is $N=10,000$ and the number of paths in the inner layer of the estimated oracle approach is $N_{2}=50,000$ .

Since we obtain only one estimate of VaR in the estimated oracle approach, there is no observation of the standard deviation. Except for the oracle approaches, each methodology is repeated for 500 iterations in order to study the distribution of the VaR estimates. Procedures labelled with † adopt the closed form solution for all the pricing involved.

The computation time indicates the time needed for an approach to obtain one VaR estimate on a computer with an Intel Core i5-5200U CPU at 2.2 GHz and 8 GB of RAM.

As shown in Table 1, only a small amount of additional computation is required to carry out the variable selection, even though 20-fold cross validation is adopted for LLSM. Given our VaR estimates, the back testing procedure was carried out by comparing the estimates with the unrealized P&L’s of the simulated prices evaluated based on the closed form formulas. Percentages of losses that exceed the VaR estimates are reported. According to Table 1, we can see that it is worthwhile to carry out the additional LASSO variable selection procedure since the back testing result moves from $2.90\%$ to $4.42\%$ , much closer to the nominal $5\%$ level of a $95\%$ VaR. For a fair comparison, both Delta-normal and Delta-gamma approaches apply the finite difference method for Greeks calculations. We observe biased estimates for Greeks with higher orders and significantly heavier computational burden as the number of Greeks increases. The back testing results of $3.29\%$ and $0\%$ in the Delta-normal and Delta-gamma approach can be improved to $5.23\%$ and $6.18\%$ respectively if the closed form solution is applied to computing the Greeks. The Delta-gamma approach has poorer performance because of the biases accumulated in repeated numerical approximations of the differentials.
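The back testing statistic described above is the fraction of simulated losses that exceed the VaR estimate. A minimal sketch, with a hypothetical function name and toy losses, is:

```python
def exceedance_rate(losses, var_estimate):
    """Fraction of losses strictly larger than the VaR estimate."""
    return sum(1 for loss in losses if loss > var_estimate) / len(losses)


# Toy losses 0.0, 0.1, ..., 9.9 (illustrative only, not simulated from a pricing model).
simulated_losses = [i / 10 for i in range(100)]
rate = exceedance_rate(simulated_losses, 9.4)
```

For a $95\%$ VaR the target rate is $5\%$ : a rate well below $5\%$ signals an overly conservative (too large) VaR estimate, while a rate above $5\%$ signals an underestimated VaR.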

These results verify that, even with a short horizon, neither first-order nor second-order approximations are sufficient for estimating the VaRs of derivatives with nonlinear payoffs. The discrepancy is even more prominent when the derivatives are nearly at-the-money.

European Swaption

Swaptions are among the most liquidly traded interest rate derivatives in the financial market.

Consider a European payer $20$ NC (“non-call/lock-out” period) $2$ swaption whose underlying swap has a final tenor of 20 years. We adopt the Lognormal Forward LIBOR Model (LFM) as the underlying model for the forward rates in the swaption. The same definitions and calibrations are adopted from Brigo and Mercurio (2007). Denote $L(t,T)$ as the spot interest rate prevailing at time $t$ for the maturity $T$ and $P(t,T)$ as the zero-coupon bond price at time $t$ with payment at maturity $T$ . The forward rates are denoted by $L_{i}(t)\equiv L(t,T_{i-1},T_{i})$ , where $i=1,\ldots,20$ . The forward rate dynamics in the LFM are defined in Proposition 6.3.1 in Brigo and Mercurio (2007).

Given a notional amount of $N=1,000$ and the swap rate $K$ , the payoff to the holder at $T_{i}$ is

[TABLE]

where $i=2,\ldots,20$ , $\delta_{j}=\delta(T_{j-1},T_{j})$ is the discrete time interval, $D(T_{i},T_{j})$ is the discount factor for time period of $(T_{i},T_{j})$ and $\mathbb{Q}^{i}$ is a forward-adjusted measure corresponding to time $T_{i}$ . More details about the model and parameters calibration can be found in the Appendix.

In the numerical study of swaptions in Longstaff and Schwartz (2001), the basis functions are subjectively selected to be a constant, the first three powers of the discounted price of the swaption at $t$ , and the first power of all unmatured zero coupon bond prices with final maturity dates up to and including $T_{20}$ . We refer to LSM with subjectively selected basis functions as SLSM. This method can potentially be unreliable as it performs a subjective a priori variable selection. For general products with a large number of underlying assets across different asset classes, the selection may not be as straightforward as in the case of the swaption.

We denote by GLSM the version of LSM that specifically includes the first three orders of risk factors and the second order of cross terms of these risk factors in the regression model. Note that GLSM does not include cross terms up to third order as in LSM. We allow this loose restriction on the order of basis functions to prevent LSM from failing to obtain OLS coefficient estimates due to over-parameterization.
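A rough count shows how much this restriction shrinks the regression. The sketch assumes that GLSM uses a constant, the first three powers of each factor and the pairwise products $x_{i}x_{j}$ with $i<j$ , and it takes 18 risk factors as the illustrative input; both the reading of the basis and the function names are assumptions rather than the paper's specification.

```python
from math import comb


def glsm_term_count(n_factors):
    """Constant + first three powers of each factor + pairwise products x_i * x_j (i < j)."""
    return 1 + 3 * n_factors + comb(n_factors, 2)


def full_cubic_term_count(n_factors):
    """All monomials of total degree <= 3, including third-order cross terms."""
    return comb(n_factors + 3, 3)


restricted = glsm_term_count(18)   # 1 + 54 + 153 terms
full = full_cubic_term_count(18)   # C(21, 3) terms
```

Under these assumptions the restricted basis has 208 terms against 1330 for the full cubic basis, which makes the OLS regression far less likely to be over-parameterized for a fixed number of paths.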

The swap rate of the underlying swap is determined at $T_{0}$ to guarantee that the swaption is at-the-money. The number of sample paths in each approach is $N=5,000$ . The numbers of paths in the outer layer and inner layer are $N_{1}=30,000$ and $N_{2}=30,000$ respectively. To ensure the estimated oracle approach offers a stable estimation, we have examined and selected different numbers of intensive simulation paths. We choose sufficiently large $N_{1}$ and $N_{2}$ so that no significant change is observed with any further increment. The four approaches other than the oracle approach are repeated $500$ times to get sample statistics. The computation time indicates the mean time needed for carrying out one round of iteration.

As shown in Table 2, the computation time needed for the delta-normal approach is significantly longer than other approaches. This is due to the fact that the best effort available to evaluate the portfolio value at $T_{0}$ is the estimated oracle approach. Nested simulation is required for each shift in each of the $18$ underlying risk factors at $T_{0}$ for the delta-normal approach. The application of the estimated oracle approach is rather limited due to its computational burden: Even for a European swaption, it demands approximately three days to calculate one estimate of VaR.

The standard deviations for the first three methods are close but significantly larger than that obtained from the delta-normal approach. Despite the small standard deviation of the estimates given by the delta-normal approach, it incurs rather large biases which cast doubt on the accuracy of its performance. The boxplot shown in Figure 1 summarizes the distribution of the VaR estimates obtained by the first four approaches. The dots in each approach represent VaR estimates in 500 experiments. The dashed line marks the VaR obtained by the estimated oracle approach.

Among these five methods, GLSM performs worst. The delta-normal approach produces estimates with a smaller bias, but with abnormally small variance. In the $500$ experiments, neither the delta-normal approach nor GLSM produces a VaR estimate that is close to the oracle VaR. For SLSM, the dashed line is located beyond the $25\%$ quantile of the distribution, indicating that this approach still has a small probability of getting an accurate VaR in one experiment. Regarding LLSM, the median of the distribution is closer to the dashed line, indicating that the bias is small. The variance of this approach is also reasonable, in the sense that the dashed line crosses the distribution within the range of the 25% and 75% quantiles.

The performance can be evaluated through the back testing result summarized in Table 2. Consistent with the analysis depicted in Figure 1, GLSM severely overestimates VaR, resulting in a back testing result of [math]. The SLSM and the delta-normal approach have similar biases and similar back testing results of around $2\%$ . Their back testing results are not satisfactory either because the estimated VaRs are too conservative, which consequently requires extra unnecessary capital reserves. LLSM, although it underestimates VaR, performs much better with a back testing result of $5.05\%$ . Overall, LLSM offers the best performance among the four approaches.

Bermudan Swaption

Since LLSM is applicable to portfolios with American features, we extend the previous example to Bermudan swaptions. Consider a Bermudan payer 20 NC 2 swaption. The payoff to the holder at $T_{i}$ , $i=2,\ldots,20$ is defined as (8). Each approach is repeated $100$ times. Since it is not practical to perform nested simulation to derive the oracle initial value, we applied SLSM with a sufficiently large number of paths to determine the initial value of the swaption. Other settings are the same as in the previous study.

As shown in Table 3, the computation time for the delta-normal approach is significantly larger than other approaches due to re-valuations required for each shift in the underlying risk factors. SLSM is used in evaluating the portfolio value at $T_{0}$ in the delta-normal approach since it is the best effort available for swaptions with the Bermudan feature. In some iterations, some of the deltas are especially large, which leads to inflated tails. As we can see in Figure 2, the VaR calculated from the delta-normal approach is heavily right-skewed with a large number of outliers, whereas the VaR from the other four approaches appears to be symmetrically distributed with few outliers. The large standard deviation also indicates that the delta-normal approach lacks statistical efficiency.

In order to further investigate the different performances of the approaches in estimating VaR, we examine valuation performance at the first tenor $T_{2}$ and at $t_{1}$ and present the result in Table 4. The delta-normal approach is excluded as it does not involve pricing the swaption at $t_{1}$ and $T_{2}$ . Table 4 shows that the valuation at $T_{2}$ varies little among different approaches. This can be explained by Theorem 1, as well as the analytical result in Clement et al. (2002). Consistent with the belief that the fitted value of the regression with OLS estimators at $t_{1}$ deteriorates, the valuation of GLSM at $t_{1}$ is significantly different from the other three approaches, which is probably an indication of poor valuation estimates at $t_{1}$ . It is also worth mentioning that, as reported in Table 4, the mean values of the swaption prices due to SLSM are close to those evaluated via LLSM. The variables selected by SLSM are chosen by experts with domain knowledge whereas LLSM can automatically include important variables in the regression model amongst a general pool of (polynomials of) covariates in an objective manner. For complicated/new products which are comprised of a vast number of underlying assets, it can be challenging even for practitioners to decide which covariates should be included in the pricing model; the LLSM procedure, on the other hand, can provide hints about which variables are influential. In addition, although the mean values of the prices due to SLSM and LLSM agree, the corresponding distributions are different, which leads to different tail quantiles and hence different VaR estimates.

The boxplot on the right panel of Figure 2 displays the distribution of VaRs estimated via SLSM, GLSM and LLSM. The difference in the distribution of VaRs based on these four approaches indicates that the model selection component in LLSM indeed has a remarkable impact on the VaR values estimated. While the delta-normal method produces highly volatile VaR estimates in Figure 2, we can also see that the estimate produced by GLSM is substantially higher than that given by LLSM.

It is natural to think that the VaR for vanilla equity options should be larger as the number of available stopping times increases. However, the actual relation between VaR and the number of stopping times is more sophisticated for swaptions because their payoff functions are determined by a large number of dependent underlying forward rate processes. We, therefore, present Table 5 which shows a decreasing VaR trend against the increase in the number of stopping times under our calibrated model. To seek a fair comparison, we adopt the same approach to estimate both the initial value and swaption values at $t_{1}$ in each column. Based on the decreasing trend observed, one may deduce that Bermudan swaption VaRs should be smaller than the oracle VaR of European swaptions. In Table 3, only LLSM produces VaR estimates smaller than the oracle VaR of the European swaption in Table 2. Even though there is no oracle benchmark for the study of the Bermudan swaption, this observation, combined with the possible indication of poor valuation in GLSM and the volatile estimates of the delta-normal approach, suggests that for the Bermudan case, LLSM still outperforms other contenders.

Conclusion

In this paper, we propose the LASSO Least-squares Monte Carlo (LLSM) approach as an extension of the Least-squares Monte Carlo (LSM) method for Value-at-Risk (VaR) evaluation of a portfolio. The introduction of LASSO in LLSM, which serves as a model selection technique, enables the proposal to handle high-dimensional and nonlinear portfolios with American features. While domain knowledge helps practitioners select the influential risk factors with more confidence, LLSM offers an objective alternative which can be helpful especially for evaluating VaRs of new and complicated financial products. In this paper, we have also established the oracle properties of LLSM and developed convergence results for pricing and VaR evaluation. Numerical studies in rainbow options and swaptions show that LLSM outperforms other existing practices such as the delta-normal and delta-gamma approaches and LSM.

Although expected shortfall (ES), as a coherent risk measure (see, for instance, Gourieroux and Jasiak, 2002), will be implemented in Basel III, we would like to emphasize that an accurate, reliable estimate of VaR is an essential intermediate step for a sound ES estimation. Despite the fact that VaR will play a comparatively lesser role in risk management for the banking industry, it should be stressed that Solvency II, which is the current supervisory framework that has been enforced since 2016 for the insurance industry, makes use of VaR to calculate solvency capital requirement (SCR). On the other hand, as discussed in Kou and Peng (2016), the only type of risk measures that satisfy a set of economic axioms for the Choquet expected utility and the statistical property of general elicitability (i.e., there exists an objective function such that minimizing the expected objective function yields the risk measure) is the median shortfall, which is the median of tail loss distribution and is equivalent to the VaR at a higher confidence level. The use of VaR, therefore, does have its merits.

There are several possible extensions to this paper. Firstly, it is plausible to include in our framework historical simulation (HS) or filtered historical simulation (FHS), which are common practices in computing capital requirements in the banking industry; see, for example, Gurrola-Perez and Murphy (2015). Secondly, our discussion on VaR can also be extended to ES. The Dantzig selector (see Candes and Tao, 2007) can also be shown to be another feasible variable selection method. We shall discuss the corresponding treatment in a separate paper. Thirdly, since the bias term dominates the inaccuracy of LLSM, we can reduce the estimation bias via an extra layer of extensive simulation. As the $100(1-\alpha)\%$ $t_{1}$ VaR is directly affected by the estimate of the $\alpha$ smallest $U_{t_{1}}$, a more accurate estimate of the quantile will help improve the performance of LLSM. After getting estimates of $U_{t_{1}}$ for $N$ scenarios, we can perform intensive simulation to obtain a more accurate estimate of the $\alpha$ smallest $U_{t_{1}}$. This can be done by first taking the values of the underlying assets corresponding to the $\alpha$ smallest estimate of $U_{t_{1}}$ as initialization, and then intensively simulating $N_{2}$ sample paths under the $\mathbb{Q}$ measure. A better estimate of the $\alpha$ smallest $U_{t_{1}}$ can be found by averaging the discounted payoffs at maturity. We have obtained promising preliminary results for this so-called Intensive LASSO Least-squares Monte Carlo (ILLSM) approach. Further investigations will be discussed in a separate paper.
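The two-stage idea behind ILLSM can be sketched on a toy problem. The example below is only an illustration under stated assumptions, not the authors' implementation: a single asset following geometric Brownian motion, a European put in place of the paper's portfolios, a one-regressor least-squares fit standing in for the LASSO regression, and hypothetical parameter values throughout. It locates the scenario at the $\alpha$ smallest fitted $U_{t_{1}}$ and then re-simulates $N_{2}$ paths under $\mathbb{Q}$ from that scenario.

```python
import math
import random


def gbm_step(s, drift, vol, dt, rng):
    """Advance a geometric Brownian motion by dt with the given drift."""
    z = rng.gauss(0.0, 1.0)
    return s * math.exp((drift - 0.5 * vol ** 2) * dt + vol * math.sqrt(dt) * z)


def discounted_put(s, strike, rate, vol, tau, rng):
    """One discounted put payoff simulated under Q from asset value s."""
    s_T = gbm_step(s, rate, vol, tau, rng)
    return math.exp(-rate * tau) * max(strike - s_T, 0.0)


rng = random.Random(7)
# Hypothetical parameters chosen for illustration only.
s0, strike, mu, rate, vol = 100.0, 100.0, 0.05, 0.02, 0.2
t1, maturity, alpha = 10 / 250, 0.5, 0.05
tau = maturity - t1
n_outer, n_inner = 2000, 20000  # N scenarios and N_2 intensive paths

# Stage 1: N scenarios under P, one inner path each, crude linear fit of U_{t1}.
s_t1 = [gbm_step(s0, mu, vol, t1, rng) for _ in range(n_outer)]
y = [discounted_put(s, strike, rate, vol, tau, rng) for s in s_t1]
x_bar, y_bar = sum(s_t1) / n_outer, sum(y) / n_outer
slope = (sum((a - x_bar) * (b - y_bar) for a, b in zip(s_t1, y))
         / sum((a - x_bar) ** 2 for a in s_t1))
fitted = [y_bar + slope * (a - x_bar) for a in s_t1]

# Scenario whose fitted value sits at the alpha-quantile of the estimates.
order = sorted(range(n_outer), key=lambda i: fitted[i])
k = order[int(alpha * n_outer)]
s_star, crude = s_t1[k], fitted[k]

# Stage 2: intensive re-simulation under Q starting from s_star.
refined = sum(discounted_put(s_star, strike, rate, vol, tau, rng)
              for _ in range(n_inner)) / n_inner
print(s_star, crude, refined)
```

In this sketch the refinement only replaces the value at the selected scenario; how the selected scenario itself should be updated is part of the further investigation the text defers to a separate paper.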

Acknowledgement

The authors would like to thank the editor, associate editor and the two anonymous referees for their constructive comments that substantially improved the manuscript. The second author is in part financially supported by Hong Kong Research Grant Council research grants ECS-24300514 and GRF-14317716.

Appendix A: Proofs of the convergence results

This appendix contains the proofs for the convergence results discussed in Sections 2.2 and 2.3.

A.1 Proof of Theorem 1

To prove Theorem 1, we need the following four lemmas.

Lemma 1.

Consider a linear regression model $Y=X^{\top}a+\varepsilon.$ If we have $n$ observations, let $y=(y_{1},\ldots,y_{n})^{\top}$ , $y^{m}=(y^{m}_{1},\ldots,y^{m}_{n})^{\top}$ , $x_{i}=(x_{1i},\ldots,x_{pi})^{\top}$ , $x=(x_{1},\ldots,x_{n})$ , $x^{(j)}=(x_{j1},x_{j2},\ldots,x_{jn})^{\top}$ , $a=(a_{1},\ldots,a_{p})^{\top}$ , $\varepsilon=(\varepsilon_{1},\ldots,\varepsilon_{n})^{\top}$ . $x_{i}$ , $y_{i}$ , $y_{i}^{m}$ are realizations of random variables $X$ , $Y$ , $Y^{m}$ , where $i=1,\ldots,n$ . Define

[TABLE]

Assume $\varepsilon_{1},\ldots,\varepsilon_{n}$ are i.i.d. with $E\varepsilon_{1}=0$ , $E|\varepsilon_{1}|<\infty$ , $y_{i}^{[m]}\overset{a.s.}{\to}y_{i}$ as $m\to\infty$ . If there exists a non-singular matrix $C$ such that $\frac{1}{n}\sum_{i=1}^{n}x_{i}x_{i}^{\top}\to C$ as $n\to\infty$ , $\frac{\lambda}{n}\to 0$ , then $\hat{a}_{n}^{m}\overset{a.s.}{\to}a$ as $n\to\infty$ and $m\to\infty$ .

Proof.

Recall that

[TABLE]

Hence, one can write

[TABLE]

Defining $C_{n}=\frac{1}{n}\sum_{i=1}^{n}x_{i}x_{i}^{\top}$, $W_{n}=\frac{1}{n}\sum_{i=1}^{n}x_{i}\varepsilon_{i}$, $V_{n}=\frac{1}{n}\sum_{i=1}^{n}x_{i}(y_{i}^{m}-y_{i})$ and discarding terms that do not involve $u$, we get

[TABLE]

Let $\gamma_{0,n}$ be the smallest eigenvalue of $C_{n}$ and $\gamma_{0}$ the smallest eigenvalue of $C$. Then $\gamma_{0,n}\to\gamma_{0}$ as $n\to\infty$, where $\gamma_{0}>0$. Write $\|u\|=\sqrt{\sum_{j=1}^{p}u_{j}^{2}}=\|u\|_{2}$, that is, the $\ell_{2}$ norm. If we define

[TABLE]

then on the set $\mathscr{T}\cap\mathscr{T}_{2}$ , we have

[TABLE]

It follows that

[TABLE]

Fix $\lambda_{0}\in(0,1)$ , $\varepsilon^{*}\in(0,1)$ . Since $\frac{\lambda}{n}=o(1)$ and by Lemma 3.1 of Chatterjee and Lahiri (2011), $\frac{1}{n}\sum_{i=1}^{n}x_{i}\varepsilon_{i}\overset{p}{\to}0$ , there exists $n_{0}$ such that $\forall n\geq n_{0}$ , $\frac{\lambda}{n}\leq\lambda_{0}$ , $\gamma_{0,n}>\frac{1}{2}\gamma_{0}>0$ . On the set $\mathscr{T}\cap\mathscr{T}_{2}$ , for any $u\in\rm I\!R^{P}$ with $\|u\|>\frac{(6\lambda_{0}+4\varepsilon^{*})\sqrt{p}}{\gamma_{0,n}}$ , it follows that

[TABLE]

Since $f_{n}(0)=0$, it follows that for $n\geq n_{0}$, the minimum of $f_{n}(u)$ cannot be attained in the set $\{u:\enspace\|u\|>\frac{(6\lambda_{0}+4\varepsilon^{*})\sqrt{p}}{\gamma_{0,n}}\}$ whenever $\mathscr{T}\cap\mathscr{T}_{2}$ holds. Hence, $\forall n\geq n_{0}$, $\mathscr{T}\cap\mathscr{T}_{2}$ implies that

[TABLE]

In particular,

[TABLE]

Since $\lambda_{0}$ and $\varepsilon^{*}\in(0,\infty)$ are arbitrary, the proof is completed. ∎
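The role of the condition $\lambda/n\to 0$ in Lemma 1 can be illustrated numerically. The sketch below makes several assumptions that are not taken from the paper: the penalized objective is taken to be $\sum_{i}(y_{i}-x_{i}^{\top}a)^{2}+\lambda\sum_{j}|a_{j}|$, it is minimized by plain coordinate descent, the design and noise are standard normal, and $\lambda=\sqrt{n}$ is one hypothetical choice satisfying $\lambda/n\to 0$. It ignores the approximation of $y_{i}$ by $y_{i}^{m}$.

```python
import random


def soft_threshold(value, threshold):
    """Soft-thresholding operator used in the LASSO coordinate update."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_cd(xs, ys, lam, sweeps=30):
    """Coordinate descent for sum (y - x.a)^2 + lam * sum |a_j|."""
    p = len(xs[0])
    coef = [0.0] * p
    col_sq = [sum(x[j] ** 2 for x in xs) for j in range(p)]
    resid = list(ys)  # residual y - x.coef, with coef = 0 initially
    for _ in range(sweeps):
        for j in range(p):
            # Correlation of column j with the partial residual excluding j.
            rho = sum(x[j] * (r + x[j] * coef[j]) for x, r in zip(xs, resid))
            new = soft_threshold(rho, lam / 2.0) / col_sq[j]
            delta = new - coef[j]
            if delta != 0.0:
                resid = [r - x[j] * delta for x, r in zip(xs, resid)]
                coef[j] = new
    return coef


def max_error(n, rng, true_coef=(1.5, 0.0, -2.0)):
    """Largest coordinate-wise estimation error for a sample of size n."""
    xs = [[rng.gauss(0.0, 1.0) for _ in true_coef] for _ in range(n)]
    ys = [sum(c * v for c, v in zip(true_coef, x)) + rng.gauss(0.0, 1.0)
          for x in xs]
    est = lasso_cd(xs, ys, lam=n ** 0.5)  # lam / n = n ** -0.5 -> 0
    return max(abs(e - c) for e, c in zip(est, true_coef))


rng = random.Random(1)
err_small = max_error(100, rng)
err_large = max_error(5000, rng)
print(err_small, err_large)
```

With this penalty schedule both the shrinkage bias, of order $\lambda/n$, and the sampling error vanish as $n$ grows, which is the behaviour the lemma formalizes.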

Lemma 2.

If, for $k=j,\ldots,L-1$ , $a_{k}^{[M,N]}\overset{a.s.}{\to}a_{k}^{[M]}$ as $N\to\infty$ and $\Pr\{a_{k}^{[M]}\cdot L^{[M]}(X_{k})=Z_{k}\}=0$ , then for $i=1,2,\ldots,N$ , $Z_{\tau_{j}^{[i,M,N]}}^{[i]}\overset{a.s}{\to}Z_{\tau_{j}^{[i,M]}}^{[i]}$ .

Proof.

For $j=L$, $Z_{\tau_{T}^{[i,M,N]}}^{[i]}=Z_{\tau_{T}^{[i,M]}}^{[i]}=Z_{T}^{[i]}$. Proceed by induction on $j$. Assume that for $k=j+1,\ldots,T-1$, $Z_{\tau_{k}^{[i,M,N]}}^{[i]}\overset{a.s}{\to}Z_{\tau_{k}^{[i,M]}}^{[i]}$; we want to prove $Z_{\tau_{j}^{[i,M,N]}}^{[i]}\overset{a.s}{\to}Z_{\tau_{j}^{[i,M]}}^{[i]}$.

[TABLE]

because the first term is finite by induction. The second term is bounded by

[TABLE]

which is also finite as $\Pr\{Z_{j}^{[i]}-a_{j}^{[M]}\cdot L^{[M]}(X_{j}^{[i]})=0\}=0$. Similarly, the third term can be proved to be finite. This completes the induction. Therefore, as $N\to\infty$, $Z_{\tau_{j}^{[i,M,N]}}^{[i]}\overset{a.s}{\to}Z_{\tau_{j}^{[i,M]}}^{[i]}$. ∎

Lemma 3.

Assume for $j=1,2,\ldots,L-1$ , $\Pr\{a_{j}^{[M]}\cdot L^{[M]}(X_{j})=Z_{j}\}=0$ . Furthermore, Conditions (A1)-(A4) are satisfied. Then, for the LASSO estimators $a_{j}^{[M,N]}$ with penalty parameter $\lambda$ such that $\lambda/N=o(1)$ , we have $a_{j}^{[M,N]}\overset{a.s.}{\to}a_{j}^{[M]}$ as $N\to\infty$ .

Proof.

By Lemma 1, for $j=L-1$, $a_{j}^{[M,N]}\overset{a.s.}{\to}a_{j}^{[M]}$. We again proceed by induction on $j$. Assume that for $k=j,\ldots,T-1$, $a_{k}^{[M,N]}\overset{a.s.}{\to}a_{k}^{[M]}$; our goal is to prove that for $k=j-1$ we still have $a_{j-1}^{[M,N]}\overset{a.s.}{\to}a_{j-1}^{[M]}$. By Lemma 1, it suffices to prove that for fixed $i=1,2,\ldots,N$, as $N\to\infty$, $Z_{\tau_{j}^{[i,M,N]}}^{[i]}\overset{a.s.}{\to}Z_{\tau_{j}^{[i,M]}}^{[i]}.$

By definition, one can write

[TABLE]

By considering the following four cases:

(i) If $Z_{j}^{[i]}\geq a_{j}^{[M,N]}\cdot L^{[M]}(X_{j}^{[i]})$ and $Z_{j}^{[i]}\geq a_{j}^{[M]}\cdot L^{[M]}(X_{j}^{[i]})$, $|Z_{\tau_{j}^{[i,M,N]}}^{[i]}-Z_{\tau_{j}^{[i,M]}}^{[i]}|=0;$

(ii) If $Z_{j}^{[i]}<a_{j}^{[M,N]}\cdot L^{[M]}(X_{j}^{[i]})$ and $Z_{j}^{[i]}<a_{j}^{[M]}\cdot L^{[M]}(X_{j}^{[i]})$, $|Z_{\tau_{j}^{[i,M,N]}}^{[i]}-Z_{\tau_{j}^{[i,M]}}^{[i]}|=|Z_{\tau_{j+1}^{[i,M,N]}}^{[i]}-Z_{\tau_{j+1}^{[i,M]}}^{[i]}|;$

(iii) If $a_{j}^{[M]}\cdot L^{[M]}(X_{j}^{[i]})\leq Z_{j}^{[i]}<a_{j}^{[M,N]}\cdot L^{[M]}(X_{j}^{[i]})$, $|Z_{\tau_{j}^{[i,M,N]}}^{[i]}-Z_{\tau_{j}^{[i,M]}}^{[i]}|=|Z_{j}^{[i]}-Z_{\tau_{j+1}^{[i,M,N]}}^{[i]}|;$

(iv) If $a_{j}^{[M,N]}\cdot L^{[M]}(X_{j}^{[i]})\leq Z_{j}^{[i]}<a_{j}^{[M]}\cdot L^{[M]}(X_{j}^{[i]})$, $|Z_{\tau_{j}^{[i,M,N]}}^{[i]}-Z_{\tau_{j}^{[i,M]}}^{[i]}|=|Z_{j}^{[i]}-Z_{\tau_{j+1}^{[i,M]}}^{[i]}|,$

we can write

[TABLE]

By Lemma 2 and $a_{j+1}^{[M,N]}\overset{a.s.}{\to}a_{j+1}^{[M]}$ , $I_{1}<\infty$ .

[TABLE]

Since $a_{j}^{[M,N]}\overset{a.s.}{\to}a_{j}^{[M]}$ , $\Pr\{Z_{j}=a_{j}^{[M]}\cdot L^{[M]}(X_{j})\}=0$ , we conclude that $Z_{\tau_{j}^{[i,M,N]}}^{[i]}\overset{a.s.}{\to}Z_{\tau_{j}^{[i,M]}}^{[i]}$ . This completes the induction. ∎
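The case analysis above turns on the exercise rule that defines the stopped payoff: at date $j$ the payoff $Z_{j}$ is taken if it is at least the estimated continuation value $a_{j}\cdot L(X_{j})$, and otherwise the payoff stopped at $\tau_{j+1}$ is carried backward. The following minimal sketch applies that recursion to a single path. The payoffs, basis values and coefficient vectors are hypothetical numbers chosen for illustration, and list positions stand in for the exercise dates.

```python
def stopped_payoff(payoffs, basis, coefs):
    """Backward recursion for Z at the stopping time along one path.

    payoffs[j] is Z_j at each exercise date (the last entry is maturity),
    basis[j] is the vector L(X_j) and coefs[j] the coefficient vector a_j
    for every date before maturity.
    """
    value = payoffs[-1]  # at maturity the option is always exercised
    for j in range(len(payoffs) - 2, -1, -1):
        continuation = sum(a * b for a, b in zip(coefs[j], basis[j]))
        if payoffs[j] >= continuation:
            value = payoffs[j]  # exercise now
        # otherwise keep the value stopped at a later date
    return value


payoffs = [1.0, 3.0, 2.0]             # Z_j along one hypothetical path
basis = [[1.0, 0.5], [1.0, 0.2]]      # L(X_j) at the two early dates
limit = [[2.0, 0.0], [2.5, 0.0]]      # stands in for the limiting a_j^{[M]}
close = [[2.01, 0.0], [2.49, 0.0]]    # estimate with the same decisions
far = [[2.0, 0.0], [3.5, 0.0]]        # estimate that flips one decision

z_limit = stopped_payoff(payoffs, basis, limit)
z_close = stopped_payoff(payoffs, basis, close)
z_far = stopped_payoff(payoffs, basis, far)
print(z_limit, z_close, z_far)
```

When the estimated and limiting coefficients lead to the same decisions, as in cases (i) and (ii), the stopped payoffs agree at this date; a flipped decision, as in case (iii), produces a gap equal to $|Z_{j}-Z_{\tau_{j+1}}|$, which is why the event $\{Z_{j}=a_{j}^{[M]}\cdot L^{[M]}(X_{j})\}$ is required to have probability zero.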

Lemma 4.

Consider a linear regression model: $Y=X^{\top}a+\epsilon$ . If we have $n$ observations, let $y=(y_{1},\ldots,y_{n})^{\top}$ , $x_{i}=(x_{1i},\ldots,x_{pi})^{\top}$ , $x=(x_{1},\ldots,x_{n})$ , $x^{(j)}=(x_{j1},x_{j2},\ldots,x_{jn})^{\top}$ , $a=(a_{1},\ldots,a_{p})^{\top}$ , $\varepsilon=(\varepsilon_{1},\ldots,\varepsilon_{n})^{\top}$ . We also define

[TABLE]

and denote the true parameters in the regression model by $a$. Assume $\varepsilon_{1},\ldots,\varepsilon_{n}$ are i.i.d. with $E\varepsilon_{1}=0$, $E|\varepsilon_{1}|<\infty$, and $y_{i}^{m}\overset{a.s.}{\to}y_{i}$ as $m\to\infty$. If the compatibility condition holds for $S_{0}$ and $\lambda$ is a suitable penalty parameter satisfying $\lambda/n\to 0$ and $\lambda=\mathcal{O}(\log p/n)$, then $\hat{a}_{n}^{m}\overset{a.s.}{\to}a$ as $n\to\infty$ and $m\to\infty$.

Proof.

The proof is similar to that of Lemma 1. We adopt the same notation as in Lemma 1 and omit some parts of the proof. Again, observing that

[TABLE]

we can write

[TABLE]

Fix $\lambda_{0}\in(0,1)$ , $\varepsilon^{*}\in(0,1)$ . Since $\lambda/n=o(1)$ , there exists $n_{0}$ such that $\forall n\geq n_{0}$ , $\lambda/n\leq\lambda_{0}$ .

On the set $\mathscr{T}\cap\mathscr{T}_{2}$ , $\forall u\in\rm I\!R^{P}$ with $\|u_{S_{0}}\|>\frac{(6\lambda_{0}+4\varepsilon^{*})\sqrt{p}}{\phi_{0}^{2}/s_{0}}$ ,

[TABLE]

Since $f_{n}(0)=0$, it follows that for $n\geq n_{0}$, the minimum of $f_{n}(u)$ cannot be attained in the set $\{u:\|u_{S_{0}}\|>\frac{(6\lambda_{0}+4\varepsilon^{*})\sqrt{p}}{\phi_{0}^{2}/s_{0}}\}$ whenever $\mathscr{T}\cap\mathscr{T}_{2}$ holds. Hence, for $n\geq n_{0}$, $\mathscr{T}\cap\mathscr{T}_{2}$ implies

[TABLE]

Due to the Compatibility Condition, we can write

[TABLE]

because $\|u_{S_{0}^{c}}\|_{1}\leq 3\|u_{S_{0}}\|_{1}$ implies $\|u_{S_{0}^{c}}\|\leq 9\|u_{S_{0}}\|$ . As a result,

[TABLE]

Since $\lambda_{0}$ and $\varepsilon^{*}\in(0,\infty)$ are arbitrary, this completes the proof. ∎

Proof of Theorem 1.

The proof of Theorem 1 (i) can be established based on the preceding Lemmas 1-4. It is equivalent to proving

[TABLE]

By the law of large numbers (LLN), it suffices to prove

[TABLE]

By Lemma 3.1 of Clement et al. (2002), we can write

[TABLE]

Since $a_{j}^{[M,N]}\overset{a.s.}{\to}a_{j}^{[M]}$ for $j=1,\ldots,L-1$, we have, $\forall\varepsilon>0$,

[TABLE]

The last equality follows from the LLN. Letting $\varepsilon\to 0$, we obtain the convergence to zero since, for $j=1,\ldots,L-1$, $\Pr\{a_{j}^{[M]}\cdot L^{[M]}(X_{j})=Z_{j}\}=0$. The proof of Theorem 1 (ii) follows if we substitute Lemma 4 for Lemma 1 in the preceding proof. ∎

A.2 Proof of Theorem 2

To define the Irrepresentable Condition and the relevant active set, we first rewrite the Gram matrix $A_{j}^{[M,N]}$ as $A_{j}$ and let $c_{k,l}$ denote the element in the $k$-th row and $l$-th column of $A_{j}$. Define the submatrices of the Gram matrix $A_{j}$ given an index set $S$ as

[TABLE]

The Irrepresentable Condition and the relevant active set are defined as follows. We say that the Irrepresentable Condition is met for the set $S$ with cardinality $s$ if, for all vectors $u_{S}\in\rm I\!R^{s}$ satisfying $\|u_{S}\|_{\infty}\leq 1$, we have

[TABLE]

In addition, for fixed $j\in\{0,...,T-1\}$, the relevant active set $S_{0}^{\text{relevant}}$ is defined as

[TABLE]

where $S_{0}$ is the active set, $a_{j,m}^{[M]}$ is the $m$ -th element of the true coefficient vector $a_{j}^{[M]}$ .

The following lemma is due to Theorem 7.1 of Bühlmann and van de Geer (2011).

Lemma 5.

Suppose the Irrepresentable Condition holds for $S_{0}$. Then $S_{0}^{\text{relevant}}\subset S_{0}(\lambda)\subset S_{0}$ and for $j=0,...,L-1$,

[TABLE]

where $a_{j}^{[M,N]}$ denotes the LASSO-estimated coefficient vector with penalty $\lambda$ and $S_{0}(\lambda)=\{k:a_{j,k}^{[M,N]}\neq 0\}$.

Proof of Theorem 2.

Our proof skips some steps that are similar to the proof of Theorem 3.1 in Clement et al. (2002). It is equivalent to prove for $j=0,\ldots,L$ ,

[TABLE]

Note that the following induction holds for both $M_{1}$ and $M$ unless otherwise specified. For $j=L$, $\tau_{T}^{[M,N]}=\tau_{T}=T$ and $\mathbf{E}(Z_{\tau_{j}^{[M,N]}}|\mathscr{F}_{j})=\mathbf{E}(Z_{\tau_{j}}|\mathscr{F}_{j})$. Assume that $\lim_{N\to\infty}\mathbf{E}(Z_{\tau_{k}^{[M,N]}}|\mathscr{F}_{k})=\mathbf{E}(Z_{\tau_{k}}|\mathscr{F}_{k})$ holds for $k=j+1$; we want to prove that it also holds for $k=j$.

[TABLE]

and

[TABLE]

The second term in the RHS converges to zero by induction. Next, observe that

[TABLE]

By definition of the projection $P_{j}(\cdot)$ ,

[TABLE]

Therefore, one can write

[TABLE]

As $N\to\infty$ , the first term in the R.H.S. converges to zero by Theorem 7. The second term is zero by Theorem 7 since these $M_{1}$ basis functions span $L^{2}\{\sigma(X_{j})\}$ .

[TABLE]

As $N\to\infty$ , the first term in the R.H.S. converges to zero since Theorem 7 is applicable to any fixed $M$ . The second term is zero by Theorem 7 since these $M_{1}$ basis functions span $L^{2}(\sigma(X_{j}))$ . To prove the convergence for the second term, it suffices to prove

[TABLE]

(i) To prove $U_{j}^{*[M_{1},N]}\overset{a.s}{\to}U_{j}$, it remains to prove as $N\to\infty$,

[TABLE]

(ii) To prove $U_{j}^{[M,N]}\overset{a.s}{\to}U_{j}$, it remains to prove as $N\to\infty$,

[TABLE]

By Condition (A1),

[TABLE]

For (i), $P_{j}^{[M_{1}]}(\mathbf{E}(Z_{\tau_{j+1}}|F_{j}))=(a_{j})_{S_{0}^{[M_{1}]}}\cdot\big(L(X_{j})\big)_{S_{0}^{[M_{1}]}}$. Recall that $S_{0}\subseteq S_{0}^{[M_{1}]}$. For $k\in S_{0}\subseteq S_{0}^{[M_{1}]}$, $a_{j,k}^{[M_{1}]}=a_{j,k}\neq 0$. For $k\in S_{0}^{\mathsf{c}}\setminus(S_{0}^{[M_{1}]})^{\mathsf{c}}$, $a_{j,k}^{[M_{1}]}=a_{j,k}=0$. It follows that $(a_{j})_{S_{0}}\cdot\big(L(X_{j})\big)_{S_{0}}=(a_{j})_{S_{0}^{[M_{1}]}}\cdot\big(L(X_{j})\big)_{S_{0}^{[M_{1}]}}$ and $\big|P_{j}^{[M_{1}]}(\mathbf{E}(Z_{\tau_{j+1}}|F_{j}))-\mathbf{E}(Z_{\tau_{j+1}}|F_{j})\big|=0$.

For (ii), $P_{j}^{[M]}(\mathbf{E}(Z_{\tau_{j+1}}|F_{j}))=(a_{j})_{S_{0}(\lambda)}\cdot\big{(}L(X_{j})\big{)}_{S_{0}(\lambda)}$ . There are $M$ basis functions selected from the initial regression with $M_{1}$ basis functions by LASSO with penalty $\lambda$ where $M\leq M_{1}$ . Define

[TABLE]

Then by Lemma 5, $S_{0}^{\text{relevant}}\subseteq S_{0}(\lambda)\subseteq S_{0}\subseteq S_{0}^{[M_{1}]}$. For $k\in S_{0}(\lambda)\subseteq S_{0}$, $a_{j,k}^{[M]}=a_{j,k}\neq 0$. For $k\in S_{0}\setminus S_{0}(\lambda)$, $a_{j,k}^{[M]}=0$ and $a_{j,k}\neq 0$, where $S_{0}\setminus S_{0}(\lambda)\subseteq S_{0}\setminus S_{0}^{\text{relevant}}=\{k:0<|a_{j,k}^{[M_{1}]}|<\lambda^{(j)}\sup_{\|u_{S_{0}}\|_{\infty}\leq 1}\|\Sigma_{1,1}^{(j)-1}(S_{0})u_{S_{0}}\|_{\infty}/2\}$.

It follows that

[TABLE]

Here $\lambda^{(j)}\sup_{\|u_{S_{0}}\|_{\infty}\leq 1}\|\Sigma_{1,1}^{(j)-1}(S_{0})u_{S_{0}}\|_{\infty}/2\to 0$ as $N\to\infty$, while the remaining term satisfies $|\sum_{k\in S_{0}\setminus S_{0}(\lambda)}L_{k}(X_{j})|\leq\sum_{k\in S_{0}\setminus S_{0}(\lambda)}|L_{k}(X_{j})|<\infty$ since $|S_{0}|=s_{0}<\infty$, $|X_{j}|<\infty$ and $|L_{k}(X_{j})|<\infty$ for all $k\in S_{0}$. The product therefore converges to zero. ∎

A.3 Proof of Theorem 3

Proof of Theorem 3.

We begin the proof by rewriting $\text{VaR}_{j}^{[M,N]}$ , $\text{VaR}_{j}^{[M]}$ as

[TABLE]

where $\alpha^{\prime}$ is a deterministic known constant. By Theorem 1, $U_{t_{1}}^{[M,N]}\overset{a.s.}{\to}U_{t_{1}}^{[M]}$ as $N\to\infty$ . Denote the pdf of $U_{t_{1}}^{[M,N]}$ and $U_{t_{1}}^{[M]}$ as $g_{N}(u)$ and $g(u)$ respectively, then

[TABLE]

where $G_{N}(u)$ and $G(u)$ are the cdfs of $U_{t_{1}}^{[M,N]}$ and $U_{t_{1}}^{[M]}$, respectively. As $U_{t_{1}}^{[M,N]}\overset{a.s.}{\to}U_{t_{1}}^{[M]}$, we have $U_{t_{1}}^{[M,N]}\overset{d}{\to}U_{t_{1}}^{[M]}$, $G_{N}(\text{VaR}_{j}^{[M]})\to G(\text{VaR}_{j}^{[M]})$ and $|\int_{\text{VaR}_{j}^{[M]}}^{\text{VaR}_{j}^{[M,N]}}g_{N}(u)du|\to 0$. We complete the proof by contradiction.

Assume $\text{VaR}_{j}^{[M,N]}\nrightarrow\text{VaR}_{j}^{[M]}$; then $\forall N\in N_{+}$, $\exists\epsilon_{0}>0$ such that $|\text{VaR}_{j}^{[M,N]}-\text{VaR}_{j}^{[M]}|>\epsilon_{0}$. As the support set of the distribution of $\text{VaR}_{j}^{[M,N]}$ is tight, there exists $u_{0}\in[\min(\text{VaR}_{j}^{[M,N]},\text{VaR}_{j}^{[M]}),\max(\text{VaR}_{j}^{[M,N]},\text{VaR}_{j}^{[M]})]$ such that $g_{N}(u_{0})>0$.

If $U_{t_{1}}^{[M,N]}$ is discrete,

[TABLE]

contradiction.

If $U_{t_{1}}^{[M,N]}$ is continuous, $\exists\epsilon_{0}^{*}>0$ such that, $\forall u\in(u_{0}-\epsilon_{0}^{*},u_{0}+\epsilon_{0}^{*})\cap[\min(\text{VaR}_{j}^{[M,N]},\text{VaR}_{j}^{[M]}),\max(\text{VaR}_{j}^{[M,N]},\text{VaR}_{j}^{[M]})]$, $g_{N}(u)>u_{0}^{*}>0$,

[TABLE]

contradiction. Therefore, the assumption $\text{VaR}_{j}^{[M,N]}\nrightarrow\text{VaR}_{j}^{[M]}$ is false, and hence $\text{VaR}_{j}^{[M,N]}\to\text{VaR}_{j}^{[M]}$ as $N\to\infty$. ∎

A.4 Proof of Theorem 4

To prove this theorem, we first introduce the following lemma and its proof.

Lemma 6.

Let $\alpha_{N}\triangleq\Pr\{U_{0}-U_{t_{1}}^{[M,N]}<-\text{VaR}_{t_{1}}^{[M]}\}$ and $\alpha_{N}^{*}\triangleq\Pr\{U_{0}-U_{t_{1}}^{*[M,N]}<-\text{VaR}_{t_{1}}^{[M]}\}$. Assume the conditions in Theorem 1(ii) are satisfied and Condition (A6) holds for $W$ and $W^{*}$ respectively; then

[TABLE]

where $\phi_{0}$ denotes the compatibility constant defined in the Compatibility Condition.

Proof of Lemma 6.

Using Taylor expansion, we can write

[TABLE]

The first term can be written as,

[TABLE]

The last equality follows from Theorem 7.7 in Bühlmann and van de Geer (2011) and Theorem 1. Regarding the second term, we can write,

[TABLE]

It follows that

[TABLE]

Likewise, we have

[TABLE]

∎

Proof of Theorem 4.

By Condition (A5), $U_{t_{1}}^{[M]}$ is continuous. Therefore,

[TABLE]

Similar to the proof of (28) in Gordy and Juneja (2010), we apply Taylor expansion to $\Pr\{U_{t_{1}}^{[M,N]}>v_{1}\}$ in the following equation,

[TABLE]

where ${v}$ is an appropriate value between $v$ and $v_{1}$ .

By Condition (A5), $g_{N}^{\prime}(u)$ is uniformly bounded for all $v$ . By Theorem 6,

[TABLE]

Therefore, we have

[TABLE]

To derive the relation between $g_{N}(v_{0})$ and $g(v)$ , we observe that

[TABLE]

where $\tilde{v}$ lies between $v-w/\sqrt{N}$ and $v$ .

[TABLE]

Likewise, we can prove that

[TABLE]

If $N=o\left(\frac{M^{2}\phi_{0}^{2}}{s_{0}\log M}+2M+\frac{s_{0}\log M}{\phi_{0}^{2}}\right)$ ,

[TABLE]

The desired result thus follows. ∎

Appendix B: Details of Numerical Studies

This section contains the details for the numerical studies discussed in Section 3 including the data, the underlying models and their calibrated parameters.

B1. Settings for Rainbow Options in Section 3.1

To derive a benchmark using an existing closed-form pricing solution, we assume the underlying stock prices follow the Black-Scholes model, where the risk-free rate $r$, the volatility of each underlying stock and the correlation between different underlying stocks remain constant from $T_{0}$ to $T$. Define $\rho_{ij}$ as the correlation between the $i$th and $j$th underlying stocks and $\sigma_{ij}=\sigma_{i}^{2}+\sigma_{j}^{2}-2\rho_{ij}\sigma_{i}\sigma_{j}$ as the covariance. Define

[TABLE]

where $i,j,k=1,...,10$ . Similar to the closed form solution of the “call on min” rainbow option given in Johnson (1987), the option price at any given time $t\in[0,T]$ can be written as

[TABLE]

where $n=10$ , $N_{n}(\cdot)$ is the cumulative distribution function of the $n$ -dimensional standard normal distribution.

The option price at $T_{0}$ can thus be reduced to

[TABLE]

Parameters in the dynamics of the underlying stocks include the risk-free rate $r$ , the volatility $\sigma_{i}$ , the drift $\mu_{i}$ , the current price $S_{i0}$ and the correlation between different stocks $\rho_{ij}$ , where $i,j=1,...,10$ . They are reasonably chosen based on the observation of commonly traded stocks in the market. $500$ daily historical underlying stock prices are simulated assuming Black-Scholes as the underlying model. We set the volatility $\sigma_{1}$ , $\sigma_{2}$ and the correlation between $S_{1}$ , $S_{2}$ relatively large so that $S_{1}$ and $S_{2}$ represent significant variables in the regression.

The starting historical price $S_{i,-1}$ , the daily drift $\mu_{i}$ and the volatility $\sigma_{i}$ are shown in Table 6 while the correlation matrix is presented in Table 7. The numerical results can be found in Section 3.1.
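The simulation of correlated daily prices under the Black-Scholes model described above can be sketched as follows. This is only an illustration under stated assumptions: three assets instead of ten, hypothetical starting prices, daily drifts, daily volatilities and correlations (the paper's values are in its Tables 6 and 7 and are not reproduced here), and a log-return drift of $\mu_{i}-\sigma_{i}^{2}/2$ per day, which is one common convention rather than a detail confirmed by the text.

```python
import math
import random


def cholesky(matrix):
    """Lower-triangular Cholesky factor of a positive definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower


def simulate_prices(start, drift, vol, corr, n_days, rng):
    """Daily correlated GBM prices; drift and vol are per-day quantities."""
    lower = cholesky(corr)
    d = len(start)
    path = [list(start)]
    for _ in range(n_days):
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        # Correlate the independent normals with the Cholesky factor.
        shocks = [sum(lower[i][k] * z[k] for k in range(i + 1))
                  for i in range(d)]
        prev = path[-1]
        path.append([prev[i] * math.exp(drift[i] - 0.5 * vol[i] ** 2
                                        + vol[i] * shocks[i])
                     for i in range(d)])
    return path


# Hypothetical inputs: the first two assets are more volatile and more
# strongly correlated, mirroring the design choice described in the text.
start = [100.0, 80.0, 50.0]
drift = [0.0004, 0.0003, 0.0002]
vol = [0.03, 0.03, 0.01]
corr = [[1.0, 0.8, 0.2],
        [0.8, 1.0, 0.1],
        [0.2, 0.1, 1.0]]

prices = simulate_prices(start, drift, vol, corr, 500, random.Random(3))
print(len(prices), prices[-1])
```

The returned list holds the starting prices followed by 500 simulated daily prices per asset.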

B2. Settings for Rainbow Swaptions in Sections 3.2 and 3.3

Our formulation follows Brigo and Mercurio (2007) [Section 6.3.1] that assumes lognormal distribution of forward rates. The dynamics of forward rates $L_{i}(t)$ under $\mathbb{Q}^{i}$ are, respectively,

[TABLE]

where $Z$ is a Brownian motion under measure $\mathbb{Q}^{i}$, and $Z_{i}$, $Z_{j}$ are the Brownian motions driving different forward rates, with the instantaneous correlation between $L_{i}(t)$ and $L_{j}(t)$ given by $\rho=(\rho_{ij})_{i,j=1,2,...}$. The measure associated with the zero-coupon bond maturing at time $T_{i}$ is denoted by $\mathbb{Q}^{i}$. Note that all equations in the display above admit a unique strong solution if the $\sigma_{j}(\cdot)$ are bounded.

In order to fully specify the forward rate dynamics in the LFM, the instantaneous volatilities and the correlation function have to be determined. A time-homogeneous function is widely adopted to parameterize instantaneous volatilities and correlation. The term “time-homogeneous” here indicates that the function depends on time only through the time left to reach maturity of the underlying swap. In our example, we apply one of the most commonly used parametric forms, namely

[TABLE]

where $\gamma=(\gamma_{1},\gamma_{2},\gamma_{3},\gamma_{4})$ is a parameter set, $\psi_{i}$ is a correction parameter that fits the volatilities more closely to market data. This function has a “humped” shape which can be interpreted descriptively with economic knowledge.

For instantaneous correlation $\rho$ , its parameterized form suggested in Joshi (2003) and Rebonato (2002) is given by

[TABLE]

To calibrate parameters in instantaneous volatility and correlation, we take the market data as input

[TABLE]

of initial annual forward rates and the annual ATM caplet volatility

[TABLE]

where $\sigma^{\text{caplet}}_{i}$ stands for the volatility of the annual caplet resetting at the $i$-th year and paying at the $(i+1)$-th year, for $i=1,2,\cdots,20$.

A recursive calibration algorithm starts by initializing $\gamma_{0}$, $\beta_{0}$ with an appropriate guess. With $\gamma_{0}$, $\beta_{0}$, we can estimate $\widetilde{\psi}_{i}$ for $i=1,...,20$ so as to match the market volatility of the co-terminal caplets by

[TABLE]

Given those $\widetilde{\psi}_{i}$ ’s, re-estimate $\gamma$ , $\beta$ by

[TABLE]

where $\sigma_{i}$ are the Black volatilities for $i$ NC $\alpha$ swaptions and $\hat{\sigma}_{i}(\beta,\gamma;\widetilde{\psi})$ is the model volatility adopted in Rebonato (2002). The corresponding formula approximates the lognormal forward LIBOR model swaption volatility by

[TABLE]

where $w_{j}(0)=\frac{\delta_{j}L(0,T_{2},T_{j})}{\sum_{k=3}^{i}\delta_{k}L(0,T_{2},T_{k})}$ and $S_{2,i}(0)$ is the ATM swap rate for $i$ NC $2$ swaptions. Substituting the functional forms given above for the instantaneous volatility and correlation, $\hat{\sigma}_{i}^{2}(0)$ can be expressed as a function of the parameters $\gamma,\beta,\psi$.

Re-estimating $\gamma,\beta$ can be achieved by solving the minimization problem in formula (10), after which $\psi$ is re-estimated and the two steps are repeated. The iteration stops when either convergence or the maximum number of iterations is reached.

We put a constraint on the calibration of $\psi$ such that $1-0.1\leq\psi_{i}\leq 1+0.1$ for all $i$ . This constraint requires all $\psi_{i}$ to be close to one so that the term structure’s qualitative behavior could be captured in time. The functional form of instantaneous volatility and correlation are constructed to produce a smooth shape for the term structure of volatility at all instants, since the typical erratic behavior of piecewise-constant assumption can be improved by linear/exponential functions. Numerical results are shown in Section 3.2.
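The $\psi$ step of the calibration, together with the bound $0.9\leq\psi_{i}\leq 1.1$, can be sketched as follows. The sketch rests on assumptions that are not confirmed by the text: instantaneous volatility is taken to have the widely used humped form $\psi_{i}\,[(\gamma_{1}+\gamma_{2}\tau)e^{-\gamma_{3}\tau}+\gamma_{4}]$ with $\tau$ the time to expiry (the paper's exact parameterization may order or define the $\gamma$'s differently), and $\psi_{i}$ is chosen so that the model's average variance to expiry matches the squared market caplet volatility. The function names and all numerical inputs are hypothetical.

```python
import math


def hump(tau, gamma):
    """Assumed humped volatility shape as a function of time to expiry."""
    g1, g2, g3, g4 = gamma
    return (g1 + g2 * tau) * math.exp(-g3 * tau) + g4


def integrated_variance(expiry, gamma, steps=2000):
    """Midpoint-rule integral of hump(tau)^2 over [0, expiry]."""
    h = expiry / steps
    return sum(hump((k + 0.5) * h, gamma) ** 2 for k in range(steps)) * h


def calibrate_psi(caplet_vols, expiries, gamma, lower=0.9, upper=1.1):
    """Match each caplet volatility, then clip psi_i to [lower, upper]."""
    psi = []
    for vol, expiry in zip(caplet_vols, expiries):
        raw = vol * math.sqrt(expiry / integrated_variance(expiry, gamma))
        psi.append(min(upper, max(lower, raw)))
    return psi


gamma = (0.05, 0.2, 1.0, 0.1)    # hypothetical shape parameters
expiries = [1.0, 2.0, 3.0]       # hypothetical caplet expiries in years
true_psi = [1.05, 0.95, 1.3]     # the last value violates the bound on purpose

# Synthetic "market" caplet volatilities generated from the assumed model.
market = [p * math.sqrt(integrated_variance(t, gamma) / t)
          for p, t in zip(true_psi, expiries)]

psi = calibrate_psi(market, expiries, gamma)
print(psi)
```

On these synthetic inputs the first two factors are recovered and the third is clipped to the upper bound, which shows how the constraint keeps every $\psi_{i}$ close to one even when an exact caplet fit would ask for more.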

Bibliography

Artzner, P., Delbaen, F., Eber, J., and Heath, D. (1999), “Coherent measures of risk,” Mathematical Finance, 9, 203–28.
Bauer, D., Reuss, A., and Singer, D. (2012), “On the calculation of the solvency capital requirement based on nested simulations,” Astin Bulletin, 42, 453–499.
Bickel, P., Ritov, Y., and Tsybakov, A. (2009), “Simultaneous analysis of Lasso and Dantzig selector,” The Annals of Statistics, 37, 1705–1732.
BIS (2013), “Basel Committee on Banking Supervision, Revisions to the Basel II market risk framework,” BSBS, 158.
Black, F. and Scholes, M. (1973), “The pricing of options and corporate liabilities,” Journal of Political Economy, 81, 637–59.
Brigo, D. and Mercurio, F. (2007), Interest Rate Models, Theory and Practice, Springer.
Broadie, M., Du, Y., and Moallemi, C. C. (2011), “Efficient risk estimation via nested sequential simulation,” Management Science, 57, 1172–94.
Bühlmann, P. and van de Geer, S. (2011), Statistics for High-Dimensional Data: Methods, Theory and Applications, Springer: New York.
