Estimation of Risk Contributions with MCMC

Takaaki Koike; Mihoko Minami

arXiv:1702.03098·q-fin.RM·January 18, 2019


TL;DR

This paper introduces a Markov Chain Monte Carlo (MCMC) based estimator for risk contributions in financial portfolios, improving efficiency and accuracy over traditional methods, especially in high-dimensional, rare-event scenarios.

Contribution

The paper presents a novel MH algorithm-based estimator for VaR contributions that is more sample-efficient and accurate than existing estimators, applicable to complex, high-dimensional risk models.

Findings

1. The MH estimator has smaller bias and mean squared error than existing methods.

2. The method is consistent and asymptotically normal.

3. The method is effective in high-dimensional, inhomogeneous risk models.


Tables

Table 1: Estimates (biases) and standard errors (root mean squared errors; RMSEs) of the four estimators of value-at-risk contributions under four risk models. Each cell in the first four columns reads "estimate (bias)"; each cell in the last three columns reads "standard error (RMSE)". No standard error is reported for the NW estimator.

| Model / component | MC estimate (bias) | NW estimate (bias) | GR estimate (bias) | MH estimate (bias) | MC std. error (RMSE) | GR std. error (RMSE) | MH std. error (RMSE) |
|---|---|---|---|---|---|---|---|
| (1) Pareto + survival Clayton: True AC = (10.708, 10.708, 10.708) | | | | | | | |
| AC₁ | 10.575 (-0.133) | 11.744 (1.036) | 10.745 (0.037) | 10.708 (0.000) | 0.173 (0.218) | 0.008 (0.038) | 0.019 (0.019) |
| AC₂ | 10.138 (-0.571) | 10.547 (-0.161) | 10.635 (-0.074) | 10.724 (0.016) | 0.169 (0.595) | 0.008 (0.074) | 0.020 (0.025) |
| AC₃ | 10.389 (-0.320) | 9.813 (-0.896) | 10.745 (0.037) | 10.693 (-0.016) | 0.178 (0.366) | 0.008 (0.038) | 0.018 (0.024) |
| (2) Pareto + $t$-copula | | | | | | | |
| AC₁ | 6.835 (-0.362) | 8.162 (0.964) | 7.697 (0.499) | 7.339 (-0.121) | 0.238 (0.433) | 0.010 (0.499) | 0.041 (0.132) |
| AC₂ | 8.785 (-0.122) | 8.355 (-0.553) | 8.740 (-0.167) | 8.765 (-0.023) | 0.223 (0.255) | 0.010 (0.168) | 0.028 (0.046) |
| AC₃ | 11.913 (-0.293) | 11.781 (-0.426) | 11.875 (-0.332) | 12.208 (0.144) | 0.134 (0.322) | 0.006 (0.332) | 0.024 (0.148) |
| (3) Student's $t$ + survival Clayton: True AC = (5.647, 5.647, 5.647) | | | | | | | |
| AC₁ | 5.592 (-0.055) | 5.693 (0.046) | 5.662 (0.015) | 5.617 (-0.029) | 0.081 (0.098) | 0.006 (0.016) | 0.018 (0.034) |
| AC₂ | 5.410 (-0.236) | 5.722 (0.076) | 5.642 (-0.005) | 5.665 (0.018) | 0.079 (0.249) | 0.006 (0.007) | 0.019 (0.026) |
| AC₃ | 5.473 (-0.173) | 5.517 (-0.130) | 5.636 (-0.011) | 5.658 (0.011) | 0.082 (0.192) | 0.006 (0.012) | 0.018 (0.021) |
| (4) Student's $t$ + $t$-copula: True AC = (2.996, 3.745, 6.741) | | | | | | | |
| AC₁ | 2.821 (-0.176) | 3.065 (0.069) | 2.997 (0.001) | 2.940 (-0.056) | 0.117 (0.211) | 0.007 (0.007) | 0.036 (0.067) |
| AC₂ | 3.772 (0.027) | 3.560 (-0.185) | 3.742 (-0.004) | 3.792 (0.047) | 0.109 (0.112) | 0.006 (0.007) | 0.033 (0.057) |
| AC₃ | 6.564 (-0.178) | 6.852 (0.110) | 6.745 (0.003) | 6.751 (0.010) | 0.043 (0.183) | 0.002 (0.004) | 0.011 (0.015) |

Equations

$$S=\sum_{j=1}^{d}X_{j},$$

$$F_{\bm{X}}(\bm{x})=C(F_{1}(x_{1}),\dots,F_{d}(x_{d})),\quad\bm{x}=(x_{1},x_{2},\dots,x_{d})\in\mathbb{R}^{d},$$

$$f_{\bm{X}}(\bm{x})=c(F_{1}(x_{1}),\dots,F_{d}(x_{d}))f_{1}(x_{1})\cdots f_{d}(x_{d}),\quad\bm{x}\in\mathbb{R}^{d},$$

$$\varrho(S)=\sum_{j=1}^{d}\text{AC}_{j}.$$

$$\varrho(\bm{u}^{\text{\scriptsize T}}\bm{X})=\sum_{j=1}^{d}u_{j}\frac{\partial\varrho(\bm{u}^{\text{\scriptsize T}}\bm{X})}{\partial u_{j}},\quad\bm{u}\in\Lambda,$$

$$\text{AC}_{j}^{\varrho}:=\left.\frac{\partial\varrho(\bm{u}^{\text{\scriptsize T}}\bm{X})}{\partial u_{j}}\right|_{\bm{u}=\bm{1}_{d}},\quad j=1,2,\dots,d,$$

$$\text{AC}_{j}^{\text{\scriptsize VaR}_{p}}:=\left.\frac{\partial\text{VaR}_{p}(\bm{u}^{\text{\scriptsize T}}\bm{X})}{\partial u_{j}}\right|_{\bm{u}=\bm{1}_{d}}=\mathbb{E}[X_{j}\mid X_{1}+\cdots+X_{d}=\text{VaR}_{p}(S)].$$

$$\text{AC}_{j}^{\text{\scriptsize ES}_{p}}:=\left.\frac{\partial\text{ES}_{p}(\bm{u}^{\text{\scriptsize T}}\bm{X})}{\partial u_{j}}\right|_{\bm{u}=\bm{1}_{d}}=\mathbb{E}[X_{j}\mid X_{1}+\cdots+X_{d}\geq\text{VaR}_{p}(S)]$$

$$\text{AC}_{\delta}=\mathbb{E}[\bm{X}\mid S\in[\text{VaR}_{p}(S)-\delta,\text{VaR}_{p}(S)+\delta]],$$

$$\text{AC}_{\delta}=\frac{\mathbb{E}[\bm{X}1_{[S\in A_{\delta}]}]}{\mathbb{P}(S\in A_{\delta})},\quad\text{where }A_{\delta}=[\text{VaR}_{p}(S)-\delta,\text{VaR}_{p}(S)+\delta].$$

$$\widehat{\text{AC}}^{\text{\scriptsize MC}}_{\delta,N}=\frac{\sum_{n=1}^{N}\bm{X}^{(n)}1_{[S^{(n)}\in A_{\delta}]}}{\sum_{n=1}^{N}1_{[S^{(n)}\in A_{\delta}]}}=\frac{1}{M_{\delta,N}}\sum_{n=1}^{N}\bm{X}^{(n)}1_{[S^{(n)}\in A_{\delta}]},$$

$$\widehat{\text{AC}}^{\text{\scriptsize MC}}_{\delta,N}-\text{AC}=b_{\delta}(N)+b(\delta),$$

$$\widehat{\text{AC}}^{\text{\scriptsize NW}}_{\phi,h,N}=\frac{\sum_{n=1}^{N}\bm{X}^{(n)}\phi\left(\frac{S^{(n)}-\text{VaR}_{p}(S)}{\Delta}\right)}{\sum_{n=1}^{N}\phi\left(\frac{S^{(n)}-\text{VaR}_{p}(S)}{\Delta}\right)},$$

$$\bm{X}=g_{\bm{\beta}}(S)+\bm{\varepsilon},$$

$$\widehat{\text{AC}}^{\text{\scriptsize GR}}_{g_{\bm{\beta}},N}:=g_{\hat{\bm{\beta}}_{N}}(\text{VaR}_{p}(S)).$$

$$\mathbb{E}[\bm{X}\mid S=\text{VaR}_{p}(S)]=\mathbb{E}[\bm{X}]+\frac{\text{Cov}(\bm{X},S)}{\text{Var}(S)}(\text{VaR}_{p}(S)-\mathbb{E}[S]);$$

$$\beta_{0}=\mathbb{E}[\bm{X}]-\frac{\text{Cov}(\bm{X},S)}{\text{Var}(S)}\mathbb{E}[S]\quad\text{and}\quad\beta_{1}=\frac{\text{Cov}(\bm{X},S)}{\text{Var}(S)}.$$

$$\mathbb{P}(\bm{X}^{(n+1)}\in A\mid\bm{X}^{(k)}=\bm{x}^{(k)},\ k\leq n)=\mathbb{P}(\bm{X}^{(n+1)}\in A\mid\bm{X}^{(n)}=\bm{x}^{(n)}),$$

$$\bm{\pi}(\bm{h}):=\int_{E}\bm{h}(\bm{x})\pi(\text{d}\bm{x}).$$

$$\hat{\bm{\pi}}_{N}(\bm{h}):=\frac{1}{N}\sum_{n=1}^{N}\bm{h}(\bm{X}^{(n)}),$$

$$K(\bm{x},\text{d}\bm{y})=k(\bm{x},\bm{y})\text{d}\bm{y}+r(\bm{x})\delta_{\bm{x}}(\bm{y}),$$

$k(\bm{x},\bm{y})$, $\alpha(\bm{x},\bm{y})$, $r(\bm{x})$

$$\alpha_{n}:=\alpha(\bm{X}^{(n)},\bm{X}_{\ast}^{(n)})=\min\left[\frac{\pi(\bm{X}_{\ast}^{(n)})q(\bm{X}_{\ast}^{(n)},\bm{X}^{(n)})}{\pi(\bm{X}^{(n)})q(\bm{X}^{(n)},\bm{X}_{\ast}^{(n)})},1\right].$$

$$\bm{X}^{(n+1)}:=1_{[U\leq\alpha_{n}]}\bm{X}_{\ast}^{(n)}+1_{[U>\alpha_{n}]}\bm{X}^{(n)}.$$

$$\lim_{N\to\infty}\hat{\bm{\pi}}_{N}(\bm{h})=\bm{\pi}(\bm{h})\quad\text{a.s.},$$

$$\sqrt{N}\{\hat{\bm{\pi}}_{N}(\bm{h})-\bm{\pi}(\bm{h})\}\stackrel{d}{\longrightarrow}\mathcal{N}_{d}(\bm{0},{\bf\Sigma}_{\bm{h}})\quad\text{as }N\to\infty,$$

$${\bf\Sigma}_{\bm{h}}:=\text{Var}_{\pi}[\bm{h}(\bm{X}^{(1)})]+2\sum_{k=1}^{\infty}\text{Cov}_{\pi}[\bm{h}(\bm{X}^{(1)}),\bm{h}(\bm{X}^{(k+1)})].$$

$$\hat{{\bf\Sigma}}_{\bm{h},N}=\frac{L_{N}}{B_{N}-1}\sum_{b=1}^{B_{N}}\{\hat{\bm{\pi}}_{N,b}(\bm{h})-\hat{\bm{\pi}}_{N}(\bm{h})\}\{\hat{\bm{\pi}}_{N,b}(\bm{h})-\hat{\bm{\pi}}_{N}(\bm{h})\}^{\text{\scriptsize T}},$$

$$\hat{\bm{\pi}}_{N,b}(\bm{h})=\frac{1}{L_{N}}\sum_{l=(b-1)L_{N}}^{bL_{N}-1}\bm{h}(\bm{X}^{(l)})\quad\text{for }b=1,2,\dots,B_{N}.$$


Taxonomy

Topics: Markov Chains and Monte Carlo Methods · Bayesian Methods and Mixture Models · Statistical Methods and Inference

Full text

Estimation of risk contributions with MCMC

TAKAAKI KOIKE∗† and MIHOKO MINAMI‡

∗Corresponding author. Email: [email protected]

†Department of Statistics and Actuarial Science, University of Waterloo, 200 University Avenue West, Waterloo, ON, N2L 3G1, Canada.

‡Department of Mathematics, Keio University, Hiyoshi, Kohoku-ku, Yokohama, Kanagawa, 3-14-1, Japan.

Abstract

Determining risk contributions of unit exposures to portfolio-wide economic capital is an important task in financial risk management. Computing risk contributions involves difficulties caused by rare-event simulations. In this study, we address the problem of estimating risk contributions when the total risk is measured by value-at-risk (VaR). Our proposed estimator of VaR contributions is based on the Metropolis-Hastings (MH) algorithm, which is one of the most prevalent Markov chain Monte Carlo (MCMC) methods. Unlike existing estimators, our MH-based estimator consists of samples from the conditional loss distribution given a rare event of interest. This feature enhances sample efficiency compared with the crude Monte Carlo method. Moreover, our method is consistent and asymptotically normal, and is widely applicable to various risk models having a joint loss density. Our numerical experiments based on simulation and real-world data demonstrate that in various risk models, even those having high-dimensional ($\approx 500$) inhomogeneous margins, our MH estimator has smaller bias and mean squared error compared with existing estimators.

Keywords: Value-at-risk; Risk allocation; Risk contributions; VaR contributions; Copulas; Markov chain Monte Carlo; Metropolis-Hastings algorithm

JEL Classification: C58, C63

1 Introduction

In most financial institutions, portfolio risk is measured by economic capital. Capital allocation is an important risk analysis in which the economic capital is decomposed into a sum of risk contributions of unit exposures; see, for example, Dev (2004). The Euler principle, proposed in Tasche (1995), is one of the most well-known rules of risk allocation. It is economically justified, for example, in Denault (2001) and Tasche (1995, 2008).

On the other hand, calculating risk contributions poses theoretical and numerical difficulties, especially when the portfolio-wide risk is measured by value-at-risk (VaR). Although a simple formula for VaR contributions is derived by Tasche (2001), it can rarely be calculated analytically, with a few exceptions such as those in Tasche (2004). As is seen in Fan et al. (2012) and Yamai and Yoshiba (2002), the crude Monte Carlo (MC) method is the simplest method of computing risk contributions. However, the MC estimator suffers from non-negligible bias caused by sample inefficiency and by inevitable numerical modification; see, for instance, Yamai and Yoshiba (2002). To overcome such difficulties, several methods have been proposed in the literature. For instance, Hallerbach (2003) and Tasche and Tibiletti (2004) derived approximation formulas by regarding VaR contributions as the best predictor of individual losses given the total loss. In this paper, we call this estimator the generalized regression (GR) estimator. Glasserman (2005) developed importance sampling (IS) estimators with their main focus on credit portfolios.

Finally, Tasche (2009) proposed the Nadaraya-Watson (NW) estimator, which is based on the kernel estimation method. Despite its ease of calculation, it still requires importance sampling to achieve an efficient estimation.

In this paper, we propose a new method of estimating VaR contributions that utilizes Markov chain Monte Carlo (MCMC), especially the Metropolis-Hastings (MH) algorithm (Metropolis et al., 1953; Hastings, 1970). Our MH method requires a joint loss density that can be evaluated at each point. This is often the case when losses are modelled separately by marginal distributions and a copula; see Yoshiba (2013) for various examples. To the best of our knowledge, no stable estimator of VaR contributions is known for general risk models. We study the consistency and asymptotic normality of our MH estimator, and provide practical guidelines for the efficient application of the MCMC method to the problem of computing VaR contributions. The proposed method is then carried out for various risk models based on simulations and real-world data. In numerical experiments, we compare the performance of the MH estimator with other existing estimators.

The foremost difference between our MH method and the crude MC is that in the former, samples are generated directly from the joint loss distribution given a rare event of interest. In contrast, the MC method generates samples from the unconditional loss distribution, which inevitably wastes a large portion of the samples.

This paper is organized as follows. Section 2 introduces the mathematical setting of the capital allocation problem and explains the challenges in estimating VaR contributions with the existing estimators. Section 3 provides a brief introduction to the MCMC method and various MH algorithms. In section 4, we propose the MH estimator that combines the MH method with the estimation of VaR contributions. Next, in section 5, numerical studies are conducted based on simulation and real-world data. We demonstrate that for various risk models with marginal- and dependence-inhomogeneity and/or high-dimensionality, the MH estimator has smaller bias and mean squared error (MSE) than those of existing estimators. For applying our method to other risk models not presented in this paper, practical guidelines on the usage of the MH method are also provided. Concluding remarks and discussions are given in section 6. Based on the theory of MCMC, the consistency and asymptotic normality of our estimator are derived in appendix A.

2 Capital allocation problem

Throughout this paper, the aggregate loss

$$S=\sum_{j=1}^{d}X_{j},$$

is considered, where $d\geq 3$ is the size of the portfolio, and $X_{1},X_{2},\dots,X_{d}$ are random variables on an atomless probability space $(\Omega,\mathcal{F},\mathbb{P})$ that represent the losses incurred by exposures $j=1,2,\dots,d$ within a fixed time period. In this study, a positive value of a loss random variable represents a financial loss, and a negative loss is interpreted as a profit. Let $F_{\bm{X}}$ be the joint distribution function (df) of $\bm{X}=(X_{1},X_{2},\dots,X_{d})$ with margins $F_{1},F_{2},\dots,F_{d}$ , and let $F_{S}$ be the df of the total loss $S$ . Assume that $F_{\bm{X}}$ and $F_{S}$ have densities $f_{\bm{X}}$ with marginal densities $f_{1},f_{2},\dots,f_{d}$ and $f_{S}$ , respectively. According to Sklar’s theorem (see, for example, Nelsen, 2006), it holds that

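$$F_{\bm{X}}(\bm{x})=C(F_{1}(x_{1}),\dots,F_{d}(x_{d})),\quad\bm{x}=(x_{1},x_{2},\dots,x_{d})\in\mathbb{R}^{d},$$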

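where $C$ is called a copula of $\bm{X}$. The density $f_{\bm{X}}$ can be written as

$$f_{\bm{X}}(\bm{x})=c(F_{1}(x_{1}),\dots,F_{d}(x_{d}))f_{1}(x_{1})\cdots f_{d}(x_{d}),\quad\bm{x}\in\mathbb{R}^{d},\tag{1}$$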

where $c$ denotes the density of $C$ .
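As a minimal sketch of how the density form above can be evaluated pointwise, the following code multiplies a copula density by marginal densities in dimension two. The bivariate Clayton copula, the exponential margins, and all function names are illustrative assumptions, not the models used in the paper.

```python
import math

def clayton_copula_density(u, v, theta):
    """Bivariate Clayton copula density c(u, v), theta > 0 (illustrative choice)."""
    s = u ** (-theta) + v ** (-theta) - 1.0
    return (1.0 + theta) * (u * v) ** (-(1.0 + theta)) * s ** (-(2.0 + 1.0 / theta))

def exp_cdf(x, rate):
    """Exponential marginal distribution function F_j (assumed margin)."""
    return 1.0 - math.exp(-rate * x) if x > 0 else 0.0

def exp_pdf(x, rate):
    """Exponential marginal density f_j (assumed margin)."""
    return rate * math.exp(-rate * x) if x > 0 else 0.0

def joint_density(x1, x2, theta, rate1=1.0, rate2=2.0):
    """f_X(x) = c(F_1(x_1), F_2(x_2)) * f_1(x_1) * f_2(x_2)."""
    if x1 <= 0 or x2 <= 0:
        return 0.0  # outside the support of the exponential margins
    u, v = exp_cdf(x1, rate1), exp_cdf(x2, rate2)
    return clayton_copula_density(u, v, theta) * exp_pdf(x1, rate1) * exp_pdf(x2, rate2)

print(joint_density(0.5, 0.3, theta=2.0))
```

Only pointwise evaluation of $f_{\bm{X}}$ is needed here, which is the property the MH method relies on.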

As mentioned in section 1, computing risk contributions is an important task in risk management. A standard procedure of determining risk contributions involves two steps. The first step is to compute the economic capital $\varrho(S)$ for a risk measure $\varrho$ . Risk measures map a loss random variable to a capital buffer that is required to cover the loss over a predetermined period such as one year or two weeks. One of the most popular risk measures is the VaR defined by $\text{VaR}_{p}(X)=\inf\{x\in\mathbb{R}:\mathbb{P}(X\leq x)\geq p\}$ where $p\in(0,1)$ is called the confidence level. Another popular measure is the expected shortfall (ES) defined by $\text{ES}_{p}(X)=\frac{1}{1-p}\int_{p}^{1}\text{VaR}_{q}(X)\text{d}q$ for $\mathbb{E}[|X|]<\infty$ . The second step is to allocate the capital $\varrho(S)$ to $d$ -exposures. Mathematically, capital allocation addresses the problem of determining the vector of allocated capitals $(\text{AC}_{1},\text{AC}_{2},\dots,\text{AC}_{d})$ that satisfies the full allocation property

$$\varrho(S)=\sum_{j=1}^{d}\text{AC}_{j}.\tag{2}$$

The Euler principle derives such AC’s by utilizing the well-known Euler rule for a function $\bm{u}\mapsto\varrho(\bm{u}^{\text{\scriptsize T}}\bm{X})$ :

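$$\varrho(\bm{u}^{\text{\scriptsize T}}\bm{X})=\sum_{j=1}^{d}u_{j}\frac{\partial\varrho(\bm{u}^{\text{\scriptsize T}}\bm{X})}{\partial u_{j}},\quad\bm{u}\in\Lambda,\tag{3}$$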

where $\Lambda\subset\mathbb{R}^{d}\backslash\{\bm{0}\}$ is an open set such that $\bm{1}_{d}\in\Lambda$ , and $\varrho$ is positive homogeneous, that is, $\varrho(\lambda X)=\lambda\varrho(X)$ for $\lambda>0$ . For

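$$\text{AC}_{j}^{\varrho}:=\left.\frac{\partial\varrho(\bm{u}^{\text{\scriptsize T}}\bm{X})}{\partial u_{j}}\right|_{\bm{u}=\bm{1}_{d}},\quad j=1,2,\dots,d,\tag{4}$$

the full allocation property (2) holds for the vector $(\text{AC}_{1}^{\varrho},\dots,\text{AC}_{d}^{\varrho})$ by taking $\bm{u}=\bm{1}_{d}$ in equation (3).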

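Since $\text{VaR}_{p}$ is positive homogeneous, the Euler principle can be applied, and the corresponding risk contributions are given by

$$\text{AC}_{j}^{\text{\scriptsize VaR}_{p}}:=\left.\frac{\partial\text{VaR}_{p}(\bm{u}^{\text{\scriptsize T}}\bm{X})}{\partial u_{j}}\right|_{\bm{u}=\bm{1}_{d}}=\mathbb{E}[X_{j}\mid X_{1}+\cdots+X_{d}=\text{VaR}_{p}(S)].\tag{5}$$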

We call the vector $\text{AC}^{\text{\scriptsize VaR}_{p}}:=(\text{AC}_{1}^{\text{\scriptsize VaR}_{p}},\dots,\text{AC}_{d}^{\text{\scriptsize VaR}_{p}})$ the VaR contributions. Since we mainly focus on this form of allocated capital in this study, we drop the superscript $\text{VaR}_{p}$ and write (5) as AC $=(\text{AC}_{1},\dots,\text{AC}_{d})$. Note that other forms of allocated capital are also possible; for example, when the risk measure is ES, the ES contribution is derived as

$$\text{AC}_{j}^{\text{\scriptsize ES}_{p}}:=\left.\frac{\partial\text{ES}_{p}(\bm{u}^{\text{\scriptsize T}}\bm{X})}{\partial u_{j}}\right|_{\bm{u}=\bm{1}_{d}}=\mathbb{E}[X_{j}\mid X_{1}+\cdots+X_{d}\geq\text{VaR}_{p}(S)]\tag{6}$$

by positive homogeneity of ES; see Tasche (2001) for derivations of the last equalities in (5) and (6).
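To make the Euler allocation concrete, the sketch below works through a case where the derivative in (5) has a closed form. It assumes, purely for illustration, that $\bm{X}$ is multivariate normal, so that $\text{VaR}_{p}(\bm{u}^{\text{\scriptsize T}}\bm{X})=\bm{u}^{\text{\scriptsize T}}\bm{\mu}+z_{p}\sqrt{\bm{u}^{\text{\scriptsize T}}{\bf\Sigma}\bm{u}}$ with $z_{p}$ the standard normal $p$-quantile; the function name and the numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def gaussian_var_contributions(mu, cov, p):
    """Euler VaR contributions when X ~ N_d(mu, cov) (illustrative assumption).

    Differentiating u -> u'mu + z_p * sqrt(u' cov u) at u = (1, ..., 1) gives
    AC_j = mu_j + z_p * Cov(X_j, S) / sd(S).
    """
    d = len(mu)
    z = NormalDist().inv_cdf(p)                    # standard normal p-quantile
    cov_with_s = [sum(cov[j]) for j in range(d)]   # Cov(X_j, S) is the j-th row sum
    sigma_s = sqrt(sum(cov_with_s))                # standard deviation of S
    var_s = sum(mu) + z * sigma_s                  # VaR_p(S)
    ac = [mu[j] + z * cov_with_s[j] / sigma_s for j in range(d)]
    return var_s, ac

mu = [0.0, 1.0, 2.0]
cov = [[1.0, 0.3, 0.1],
       [0.3, 2.0, 0.4],
       [0.1, 0.4, 3.0]]
var_s, ac = gaussian_var_contributions(mu, cov, 0.99)
print(var_s, ac, sum(ac))  # the contributions add up to VaR_p(S)
```

The final print illustrates the full allocation property (2): the contributions sum to the total capital up to floating-point error.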

Even when the joint density of the portfolio loss vector $f_{\bm{X}}$ is given explicitly, the analytical computation of AC is not straightforward since it often requires the joint distribution of $(X_{j},S)$ , which is in general difficult to derive.

A possible numerical method to calculate VaR contributions is the crude MC method, in which the pseudo VaR contribution

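$$\text{AC}_{\delta}=\mathbb{E}[\bm{X}\mid S\in[\text{VaR}_{p}(S)-\delta,\text{VaR}_{p}(S)+\delta]],\tag{7}$$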

is computed for a sufficiently small bandwidth $\delta>0$ . Since the probability $\mathbb{P}(S\in[\text{VaR}_{p}(S)-\delta,\text{VaR}_{p}(S)+\delta])$ is positive, the right hand side of (7) can be written as

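$$\text{AC}_{\delta}=\frac{\mathbb{E}[\bm{X}1_{[S\in A_{\delta}]}]}{\mathbb{P}(S\in A_{\delta})},\quad\text{where }A_{\delta}=[\text{VaR}_{p}(S)-\delta,\text{VaR}_{p}(S)+\delta].$$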

This expression allows one to construct the estimator of the pseudo VaR contributions given by

$$\widehat{\text{AC}}^{\text{\scriptsize MC}}_{\delta,N}=\frac{\sum_{n=1}^{N}\bm{X}^{(n)}1_{[S^{(n)}\in A_{\delta}]}}{\sum_{n=1}^{N}1_{[S^{(n)}\in A_{\delta}]}}=\frac{1}{M_{\delta,N}}\sum_{n=1}^{N}\bm{X}^{(n)}1_{[S^{(n)}\in A_{\delta}]},\tag{8}$$

where $N>0$ is the sample size; $\bm{X}^{(1)},\dots,\bm{X}^{(N)}$ are independent and identically distributed (i.i.d.) samples from $F_{\bm{X}}$; $S^{(n)}:=X_{1}^{(n)}+\cdots+X_{d}^{(n)}$ are i.i.d. samples from $F_{S}$ for $n=1,\dots,N$; and $M_{\delta,N}:=\sum_{n=1}^{N}1_{[S^{(n)}\in A_{\delta}]}$ is the number of samples contained in $A_{\delta}$. We call (8) the MC estimator. By setting $\delta$ sufficiently small and $N$ sufficiently large, one can expect that the MC estimator approximates the true VaR contributions. Note that this method is available only when $\delta$ is positive, since $\mathbb{P}(S\in A_{0})=\mathbb{P}(S=\text{VaR}_{p}(S))=0$ by continuity of $F_{S}$.

As long as i.i.d. samples from $F_{\bm{X}}$ can be generated, one can estimate $\text{AC}_{\delta}$ by constructing the estimator (8). However, this estimator suffers from an inevitable bias. The bias of the MC estimator can be decomposed as

$$\widehat{\text{AC}}^{\text{\scriptsize MC}}_{\delta,N}-\text{AC}=b_{\delta}(N)+b(\delta),$$

where $b_{\delta}(N)=\widehat{\text{AC}}^{\text{\scriptsize MC}}_{\delta,N}-\text{AC}_{\delta}$ and $b(\delta)=\text{AC}_{\delta}-\text{AC}$ . $\delta$ should be taken as small as possible to reduce $b(\delta)$ . However, when $\delta$ is quite small, it is difficult to ensure a large enough sample size $M_{\delta,N}$ to keep the first term $b_{\delta}(N)$ small since $\mathbb{E}[M_{\delta,N}]=N\mathbb{P}(S\in A_{\delta})$ , and $\mathbb{P}(S\in A_{\delta})$ is typically much less than $1-p$ .
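The sample inefficiency described above can be seen in a small sketch of the MC estimator (8). The toy model (three independent standard normal losses, so that $\text{VaR}_{p}(S)$ is known exactly), the function name, and the values of $N$ and $\delta$ are assumptions chosen only for illustration.

```python
import random
from statistics import NormalDist

def mc_var_contributions(samples, var_p, delta):
    """Crude MC estimator: average X over samples with S in [var_p - delta, var_p + delta]."""
    d = len(samples[0])
    sums = [0.0] * d
    hits = 0  # M_{delta,N}: number of samples falling in the band A_delta
    for x in samples:
        if abs(sum(x) - var_p) <= delta:
            hits += 1
            for j in range(d):
                sums[j] += x[j]
    if hits == 0:
        raise ValueError("no sample fell in the band; increase N or delta")
    return [s / hits for s in sums], hits

rng = random.Random(0)
N, d, p, delta = 100_000, 3, 0.99, 0.1
samples = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(N)]
var_p = NormalDist(0.0, d ** 0.5).inv_cdf(p)  # exact VaR_p(S) because S ~ N(0, d) here
ac_hat, hits = mc_var_contributions(samples, var_p, delta)
print(ac_hat, hits, hits / N)  # only a small fraction of the N samples is used
```

In this toy setting the fraction `hits / N` is far below $1-p$, which is the waste that motivates sampling directly from the conditional distribution.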

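To overcome this problem, several estimators have been proposed in the literature. Among them, the NW kernel estimator proposed in Tasche (2009) is defined by

$$\widehat{\text{AC}}^{\text{\scriptsize NW}}_{\phi,h,N}=\frac{\sum_{n=1}^{N}\bm{X}^{(n)}\phi\left(\frac{S^{(n)}-\text{VaR}_{p}(S)}{\Delta}\right)}{\sum_{n=1}^{N}\phi\left(\frac{S^{(n)}-\text{VaR}_{p}(S)}{\Delta}\right)},$$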

where $\phi$ is the kernel density and $\Delta>0$ is the bandwidth. Since this estimator can be interpreted as a smoothing modification of the MC estimator (8) by kernel $\phi$ , it shares the same bias trade-off explained above. Furthermore, the bias and asymptotic standard deviation of the NW estimator (see, for example, Hansen, 2009) cannot be computed easily because they require an evaluation of the total loss density $f_{S}(s)$ at $s=\text{VaR}_{p}(S)$ . Finally, Hallerbach (2003) and Tasche and Tibiletti (2004) constructed estimators by assuming a regression model among the losses of the form:
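A minimal sketch of the NW estimator follows, assuming a Gaussian kernel for $\phi$ (its normalizing constant cancels in the ratio). The toy Gaussian loss model, the function name, and the bandwidth value are illustrative assumptions.

```python
import math
import random
from statistics import NormalDist

def nw_var_contributions(samples, var_p, bandwidth):
    """Nadaraya-Watson estimator with a Gaussian kernel (assumed kernel choice)."""
    d = len(samples[0])
    num = [0.0] * d
    den = 0.0
    for x in samples:
        z = (sum(x) - var_p) / bandwidth
        w = math.exp(-0.5 * z * z)  # kernel weight; the constant factor cancels
        den += w
        for j in range(d):
            num[j] += w * x[j]
    if den == 0.0:
        raise ValueError("all kernel weights underflowed; increase the bandwidth")
    return [v / den for v in num]

rng = random.Random(1)
N, d, p = 100_000, 3, 0.99
samples = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(N)]
var_p = NormalDist(0.0, d ** 0.5).inv_cdf(p)  # exact VaR_p(S) in this toy model
ac_nw = nw_var_contributions(samples, var_p, bandwidth=0.2)
print(ac_nw, sum(ac_nw), var_p)
```

Because the unconditional samples are denser below $\text{VaR}_{p}(S)$ than above it, the weighted total `sum(ac_nw)` tends to fall slightly short of `var_p`, which illustrates the smoothing bias shared with the MC estimator.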

$$\bm{X}=g_{\bm{\beta}}(S)+\bm{\varepsilon},$$

where $g_{\bm{\beta}}(s):\mathbb{R}\rightarrow\mathbb{R}^{d}$ is a function parameterized by $\bm{\beta}$ , and $\bm{\varepsilon}$ is an error random vector such that $\mathbb{E}[\bm{\varepsilon}|S=\text{VaR}_{p}(S)]=\bm{0}$ . For an estimator $\hat{\bm{\beta}}_{N}$ of $\bm{\beta}$ , we call the following estimator the GR estimator:

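$$\widehat{\text{AC}}^{\text{\scriptsize GR}}_{g_{\bm{\beta}},N}:=g_{\hat{\bm{\beta}}_{N}}(\text{VaR}_{p}(S)).$$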

Although this estimator is intuitive and can easily be computed, it is in general difficult to construct an appropriate model $g_{\bm{\beta}}$ and estimator $\hat{\bm{\beta}}_{N}$ of $\bm{\beta}$ , unless samples from $F_{\bm{X}|S=\text{VaR}_{p}(S)}$ are available. A notable exception is the case wherein $\bm{X}$ follows an elliptical distribution. In this case, the following result holds:

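$$\mathbb{E}[\bm{X}\mid S=\text{VaR}_{p}(S)]=\mathbb{E}[\bm{X}]+\frac{\text{Cov}(\bm{X},S)}{\text{Var}(S)}(\text{VaR}_{p}(S)-\mathbb{E}[S]);$$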

see, for example, McNeil et al. (2015). The true VaR contributions are then provided by setting $g_{\beta}(s)=\beta_{0}+\beta_{1}s$ , where

$$\beta_{0}=\mathbb{E}[\bm{X}]-\frac{\text{Cov}(\bm{X},S)}{\text{Var}(S)}\mathbb{E}[S]\quad\text{and}\quad\beta_{1}=\frac{\text{Cov}(\bm{X},S)}{\text{Var}(S)}.\tag{12}$$

Since these coefficients are the minimizers of $\mathbb{E}[\bm{\varepsilon}^{2}]=\mathbb{E}[(\bm{X}-\beta_{0}-\beta_{1}S)^{2}]$, the OLS estimators of $(\beta_{0},\beta_{1})$ calculated from the unconditional samples of $\bm{X}$ and $S$ converge to the true parameters (12) as $N\rightarrow\infty$.
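The linear GR estimator in the elliptical case can be sketched as a componentwise OLS fit of $X_{j}$ on $S$, evaluated at $\text{VaR}_{p}(S)$. The Gaussian toy data and the function name are assumptions for illustration only.

```python
import random
from statistics import NormalDist

def gr_linear_var_contributions(samples, var_p):
    """Linear GR estimator: OLS fit of each X_j on S, evaluated at S = var_p."""
    n, d = len(samples), len(samples[0])
    totals = [sum(x) for x in samples]
    s_bar = sum(totals) / n
    var_s = sum((s - s_bar) ** 2 for s in totals) / n       # sample Var(S)
    ac = []
    for j in range(d):
        xj_bar = sum(x[j] for x in samples) / n
        cov_js = sum((x[j] - xj_bar) * (s - s_bar)
                     for x, s in zip(samples, totals)) / n  # sample Cov(X_j, S)
        beta1 = cov_js / var_s
        beta0 = xj_bar - beta1 * s_bar
        ac.append(beta0 + beta1 * var_p)
    return ac

rng = random.Random(2)
N, d, p = 50_000, 3, 0.99
samples = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(N)]
var_p = NormalDist(0.0, d ** 0.5).inv_cdf(p)  # exact VaR_p(S) in this toy model
ac_gr = gr_linear_var_contributions(samples, var_p)
print(ac_gr, sum(ac_gr), var_p)
```

Since the fitted slopes sum to one and the fitted intercepts sum to zero, this estimator satisfies the full allocation property exactly; its accuracy outside elliptical models depends on how well the linear form matches the true conditional mean.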

3 MCMC estimators

As seen in section 2, the essential problem in estimating VaR contributions is that the conditional samples from $F_{\bm{X}|S=\text{VaR}_{p}(S)}$ are unavailable. To solve this problem, we introduce the MCMC method wherein a given distribution is simulated by constructing a Markov chain whose stationary distribution is the desired one. By allowing Markovian-type dependence within the samples, the MCMC allows us to simulate a wide variety of distributions. In this section, we briefly review MCMC, especially the Metropolis-Hastings algorithm as a major subclass of MCMC methods.

3.1 A brief introduction to MCMC

Let $E\subseteq\mathbb{R}^{d}$ be a set and $\mathcal{E}$ be a $\sigma$-algebra on $E$. A Markov chain is a sequence of $E$-valued random variables $(\bm{X}^{(1)},\bm{X}^{(2)},\dots)$ satisfying the Markov property:

$$\mathbb{P}(\bm{X}^{(n+1)}\in A\mid\bm{X}^{(k)}=\bm{x}^{(k)},\ k\leq n)=\mathbb{P}(\bm{X}^{(n+1)}\in A\mid\bm{X}^{(n)}=\bm{x}^{(n)}),$$

for all $n\geq 1$, $A\in\mathcal{E}$, and $\bm{x}^{(1)},\dots,\bm{x}^{(n)}\in E$. A Markov chain is characterized by its stochastic kernel $K:E\times\mathcal{E}\rightarrow[0,1]$, given by $(\bm{x},A)\mapsto K(\bm{x},A):=\mathbb{P}(\bm{X}^{(n+1)}\in A|\bm{X}^{(n)}=\bm{x})$. If there exists a probability distribution $\pi$ such that $\pi(A)=\int_{E}\pi(\text{d}\bm{x})K(\bm{x},A)$ for any $A\in\mathcal{E}$, then $\pi$ is called the stationary distribution. See, for example, Nummelin (2004) for the general theory of Markov chains.

The MCMC method is widely used for simulating a distribution by generating a Markov chain with the given distribution as a stationary distribution $\pi$ . For some distribution $\pi$ and $\pi$ -measurable vector-valued function $\bm h$ on $E$ , our estimand is denoted as

$$\bm{\pi}(\bm{h}):=\int_{E}\bm{h}(\bm{x})\pi(\text{d}\bm{x}).\tag{13}$$

The MCMC estimator of (13) is given by

$$\hat{\bm{\pi}}_{N}(\bm{h}):=\frac{1}{N}\sum_{n=1}^{N}\bm{h}(\bm{X}^{(n)}),\tag{14}$$

where $(\bm{X}^{(1)},\dots,\bm{X}^{(N)})$ is a sample path from time $1$ to $N$ (we call it an $N$ -path) of a Markov chain whose stationary distribution is $\pi$ . The distribution $\pi$ is called the target distribution. Since it is determined by the problem at hand, the problem is to find a stochastic kernel $K$ such that it has the stationary distribution $\pi$ , and sample paths of its Markov chain can easily be generated.

One of the most popular stochastic kernels is the MH kernel defined by

$$K(\bm{x},\text{d}\bm{y})=k(\bm{x},\bm{y})\text{d}\bm{y}+r(\bm{x})\delta_{\bm{x}}(\bm{y}),$$

where

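$$k(\bm{x},\bm{y})=q(\bm{x},\bm{y})\alpha(\bm{x},\bm{y}),\qquad\alpha(\bm{x},\bm{y})=\min\left[\frac{\pi(\bm{y})q(\bm{y},\bm{x})}{\pi(\bm{x})q(\bm{x},\bm{y})},1\right],\qquad r(\bm{x})=1-\int_{E}k(\bm{x},\bm{y})\text{d}\bm{y},$$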

$\delta_{\bm{x}}$ is the Dirac delta function; $q:E\times E\rightarrow\mathbb{R}_{+}$ is a function such that $\bm{x}\mapsto q(\bm{x},\bm{y})$ is measurable for any $\bm{y}\in E$ ; and $\bm{y}\mapsto q(\bm{x},\bm{y})$ is a probability density for any $\bm{x}\in E$ . This function $q$ is called a proposal density. It can be shown that the MH kernel has stationary distribution $\pi$ ; see Tierney (1994). Under the three conditions (i)–(iii) where (i) at least one vector $\bm{x}^{(0)}\in\text{supp}(\pi)$ is known, where $\text{supp}(\pi):=\{\bm{x}\in E:\pi(\bm{x})>0\}$ ; (ii) samples from $q(\bm{x},\cdot)$ can be generated for any $\bm{x}\in E$ ; and (iii) the ratio $\pi(\bm{y})/\pi(\bm{x})$ can be calculated for any $\bm{x},\bm{y}\in E$ , we can generate an $N$ -path of the desired Markov chain by the following MH algorithm:

Algorithm 1: (MH algorithm)

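1. Fix a sample size $N>0$, proposal density $q$, and initial value $\bm X^{(0)}=\bm{x}^{(0)}\in\text{supp}(\pi)$.

2. For $n=0,1,\dots,N-1$, do steps 3 to 5:

3. Generate $\bm{X}_{\ast}^{(n)}\sim q(\bm{X}^{(n)},\cdot)$ and $U\sim\mathcal{U}(0,1)$.

4. Set

$$\alpha_{n}:=\alpha(\bm{X}^{(n)},\bm{X}_{\ast}^{(n)})=\min\left[\frac{\pi(\bm{X}_{\ast}^{(n)})q(\bm{X}_{\ast}^{(n)},\bm{X}^{(n)})}{\pi(\bm{X}^{(n)})q(\bm{X}^{(n)},\bm{X}_{\ast}^{(n)})},1\right].\tag{15}$$

5. Set

$$\bm{X}^{(n+1)}:=1_{[U\leq\alpha_{n}]}\bm{X}_{\ast}^{(n)}+1_{[U>\alpha_{n}]}\bm{X}^{(n)}.$$

6. Return $(\bm{X}^{(1)},\dots,\bm{X}^{(N)})$.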

We call $\alpha_{n}:=\alpha(\bm{X}^{(n)},\bm{X}_{\ast}^{(n)})$ in (15) the acceptance probability at the $n$th iteration. Based on the $N$-path $(\bm{X}^{(1)},\dots,\bm{X}^{(N)})$ generated in Algorithm 1, the MCMC (MH) estimator (14) is constructed.
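The steps of Algorithm 1 can be sketched as follows for the special case of a symmetric Gaussian random-walk proposal, in which the proposal densities cancel in (15). The function name, the log-density interface, the step size, and the standard normal toy target are all assumptions made for illustration; this is not the target distribution used later in the paper.

```python
import math
import random

def metropolis_hastings(log_target, x0, n_samples, step, rng):
    """Random-walk MH sketch; returns the N-path and the observed acceptance rate.

    log_target returns log pi(x) up to an additive constant, which is enough
    because only the ratio pi(y) / pi(x) enters the acceptance probability.
    """
    x = list(x0)
    log_px = log_target(x)
    path, n_accepted = [], 0
    for _ in range(n_samples):
        cand = [xi + step * rng.gauss(0.0, 1.0) for xi in x]  # candidate X_*
        log_pc = log_target(cand)
        u = 1.0 - rng.random()          # uniform on (0, 1], so log(u) is finite
        # Accept when U <= alpha_n = min(pi(X_*) / pi(X), 1), tested on the log scale.
        if math.log(u) <= log_pc - log_px:
            x, log_px = cand, log_pc
            n_accepted += 1
        path.append(list(x))
    return path, n_accepted / n_samples

def log_std_normal(x):
    """Toy target: standard normal density on R^2, up to a constant."""
    return -0.5 * sum(xi * xi for xi in x)

rng = random.Random(3)
path, acc_rate = metropolis_hastings(log_std_normal, [0.0, 0.0], 20_000, 1.0, rng)
mean_first = sum(x[0] for x in path) / len(path)  # MCMC estimate of E[X_1] = 0
print(mean_first, acc_rate)
```

Working with log-densities avoids underflow when $\pi$ is very small, which is typical in the tail regions that matter for rare-event problems.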

Under regularity conditions, the MCMC estimator $\hat{{\bm\pi}}_{N}({\bm h})$ satisfies consistency and the central limit theorem (CLT). First, the MCMC estimator is consistent if

$$\lim_{N\to\infty}\hat{\bm{\pi}}_{N}(\bm{h})=\bm{\pi}(\bm{h})\quad\text{a.s.},$$

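for any $\pi$-integrable function ${\bm h}$ and any initial state $\bm{X}^{(0)}=\bm{x}^{(0)}\in\text{supp}(\pi)$. Next, the CLT holds if

$$\sqrt{N}\{\hat{\bm{\pi}}_{N}(\bm{h})-\bm{\pi}(\bm{h})\}\stackrel{d}{\longrightarrow}\mathcal{N}_{d}(\bm{0},{\bf\Sigma}_{\bm{h}})\quad\text{as }N\to\infty,$$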

where the asymptotic variance matrix is given by

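$${\bf\Sigma}_{\bm{h}}:=\text{Var}_{\pi}[\bm{h}(\bm{X}^{(1)})]+2\sum_{k=1}^{\infty}\text{Cov}_{\pi}[\bm{h}(\bm{X}^{(1)}),\bm{h}(\bm{X}^{(k+1)})].\tag{18}$$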

Since the asymptotic variance (18) can rarely be computed in a real situation, it is estimated from the sample path $(\bm{X}^{(1)},\dots,\bm{X}^{(N)})$ generated in Algorithm 1. One popular estimator of ${\bf\Sigma}_{{\bm h}}$ is the so-called batch means estimator; see Geyer (2011). For an $N$-path $(\bm{X}^{(1)},\dots,\bm{X}^{(N)})$, the batch means estimator $\hat{{\bf\Sigma}}_{{\bm h},N}$ is defined by

$$\hat{{\bf\Sigma}}_{\bm{h},N}=\frac{L_{N}}{B_{N}-1}\sum_{b=1}^{B_{N}}\{\hat{\bm{\pi}}_{N,b}(\bm{h})-\hat{\bm{\pi}}_{N}(\bm{h})\}\{\hat{\bm{\pi}}_{N,b}(\bm{h})-\hat{\bm{\pi}}_{N}(\bm{h})\}^{\text{\scriptsize T}},$$

where $L_{N}$ and $B_{N}$ are positive integers satisfying $N=L_{N}B_{N}$ , and

$$\hat{\bm{\pi}}_{N,b}(\bm{h})=\frac{1}{L_{N}}\sum_{l=(b-1)L_{N}}^{bL_{N}-1}\bm{h}(\bm{X}^{(l)})\quad\text{for }b=1,2,\dots,B_{N}.$$

$L_{N}$ is called the batch length, and $B_{N}$ is the number of batches. Under regularity conditions, the batch means estimator $\hat{{\bf\Sigma}}_{{\bm h},N}$ converges to ${\bf\Sigma}_{{\bm h}}$ as $N\rightarrow\infty$; see Jones et al. (2006) and Vats et al. (2015). By using the CLT for $\hat{{\bm\pi}}_{N}({\bm h})$ and the consistency of $\hat{{\bf\Sigma}}_{{\bm h},N}$, one can construct an approximate confidence interval for the true quantity ${\bm\pi}({\bm h})$ based on an $N$-path of the Markov chain.
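For a scalar-valued $h$, the batch means estimator and the resulting confidence interval can be sketched as below. The AR(1) chain stands in for an MCMC path because its asymptotic variance is known in closed form; the function name, the chain, and the choice of 100 batches are illustrative assumptions.

```python
import math
import random

def batch_means_variance(values, n_batches):
    """Batch means estimate of the asymptotic variance for scalar h(X^(n))."""
    length = len(values) // n_batches            # batch length L_N
    used = values[: length * n_batches]          # drop a remainder so that N = L_N * B_N
    overall = sum(used) / len(used)
    batch_avgs = [sum(used[b * length:(b + 1) * length]) / length
                  for b in range(n_batches)]
    return length / (n_batches - 1) * sum((m - overall) ** 2 for m in batch_avgs)

# Toy chain: AR(1) with coefficient 0.5 and unit innovations,
# whose asymptotic variance is 1 / (1 - 0.5)^2 = 4.
rng = random.Random(4)
x, chain = 0.0, []
for _ in range(100_000):
    x = 0.5 * x + rng.gauss(0.0, 1.0)
    chain.append(x)

sigma2_hat = batch_means_variance(chain, n_batches=100)
mean_hat = sum(chain) / len(chain)
half_width = 1.96 * math.sqrt(sigma2_hat / len(chain))  # approximate 95% interval
print(sigma2_hat, (mean_hat - half_width, mean_hat + half_width))
```

The naive sample variance of the chain would understate the uncertainty here, since it ignores the positive autocovariances that enter (18).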

3.2 Choice of the proposal distribution

When implementing the MH, an appropriate choice of the proposal function $q$ is necessary since it affects the asymptotic variance (18). Since ${\bf\Sigma}_{h}$ can rarely be calculated explicitly in a real situation, a post-implementation review is usually conducted; that is, the goodness of the selected proposal distribution is evaluated after performing the MH. In this section, we introduce two methods for evaluating the selected proposal distribution. We also provide some families of proposal distributions for later use.

In practice, there are two prevalent methods to determine the performance of the proposal distribution. One is to inspect the autocorrelation plots of the marginal sample paths. For an $N$ -path $(\bm{X}^{(1)},\dots,\bm{X}^{(N)})$ , vector-valued measurable function ${\bm h}(\bm{X})=(h_{1}(\bm{X}),\dots,h_{d}(\bm{X}))^{\text{\scriptsize T}}$ , and the MH estimator $\hat{{\bm\pi}}_{N}({\bm h})=(\hat{\pi}_{N,1},\dots,\hat{\pi}_{N,d})^{\text{\scriptsize T}}$ , the sample autocorrelations $\hat{r}_{j}(k):=\hat{R}_{j}(k)/\hat{R}_{j}(0)$ are drawn against the lag $k=0,1,2,\dots$ , where

[TABLE]

for $j=1,2,\dots,d$. From the form of the asymptotic variance (18), one can expect that the asymptotic variance ${\bf\Sigma}_{{\bm h}}$ is small if the autocorrelation plots steadily decline to zero as the lags increase. Another informative quantity is the acceptance rate (ACR), which is the percentage of times a candidate $\bm{X}_{\ast}$ is accepted over the whole run.

Changing the proposal distribution is generally suggested when an extremely low or high ACR is observed.
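Both diagnostics can be computed directly from a sample path, as in the following sketch. The $1/N$ normalization of the sample autocovariance and the way the ACR is recovered from a stored path (counting state changes) are assumptions of this sketch; the function names and the AR(1) toy chain are hypothetical.

```python
import random

def sample_autocorrelation(values, max_lag):
    """Sample autocorrelations r(k) = R(k) / R(0) for k = 0, ..., max_lag.

    R(k) uses a 1/N normalization, which is one common convention (assumed here).
    """
    n = len(values)
    mean = sum(values) / n

    def autocov(k):
        return sum((values[t] - mean) * (values[t + k] - mean)
                   for t in range(n - k)) / n

    r0 = autocov(0)
    return [autocov(k) / r0 for k in range(max_lag + 1)]

def acceptance_rate(path):
    """Fraction of iterations in which the chain moved to a new state."""
    moves = sum(1 for a, b in zip(path, path[1:]) if a != b)
    return moves / (len(path) - 1)

# Toy chain: AR(1) with coefficient 0.8, whose lag-k autocorrelation is 0.8 ** k.
rng = random.Random(5)
x, chain = 0.0, []
for _ in range(20_000):
    x = 0.8 * x + rng.gauss(0.0, 1.0)
    chain.append(x)

acf = sample_autocorrelation(chain, max_lag=5)
acr = acceptance_rate([[0.0], [0.0], [1.0], [1.0], [2.0]])  # two moves in four steps
print(acf, acr)
```

A slowly decaying `acf` or an `acr` close to 0 or 1 would both suggest revisiting the proposal distribution.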

Typically, proposal distribution $q$ is selected among certain classes of distributions. To find an appropriate $q$ depending on the target distribution, several classes of proposal distributions are presented in order. First, if the proposal function is of the form $q(\bm{x},\bm{y})=f(\bm{y}-\bm{x})$ for some density $f$ , the candidate $\bm{X}_{\ast}$ is drawn according to the following process:

$$\bm{X}_{\ast}\sim f(\,\cdot\,-\bm{X}),$$

where $\bm{X}$ is the current state. This type of $q$ is called the random walk proposal distribution. In the case wherein $f$ is symmetric around the origin, the acceptance probability (15) is written simply as $\alpha(\bm{x},\bm{y})=\min\left[\frac{\pi(\bm{y})}{\pi(\bm{x})},1\right]$. Second, when $q(\bm{x},\bm{y})=f(\bm{y})$ for some density $f$, the candidate $\bm{X}_{\ast}$ is updated by

$$\bm{X}_{\ast}\sim f.$$

This $q$ is called the independent proposal distribution. The two proposal distributions, random walk and independent, are widely used due to their simplicity. However, these proposal distributions often fail to perform well when the target distribution $\pi$ is heavy-tailed. To overcome this problem, the mixed preconditioned Crank-Nicolson (MpCN) proposal distribution was proposed by Kamatani (2014). This proposal distribution updates the candidate according to the following process:

[TABLE]

where $\rho\in(0,1)$ , $Z$ follows the gamma distribution with shape parameter $d/2$ and scale parameter $||{\bf\Sigma}^{-\frac{1}{2}}(\bm{X}-\bm{\mu})||^{2}/2$ , and $\bm{W}\sim{\mathcal{N}_{d}}(\bm{0},{\bf\Sigma})$ for some $d$ -vector $\bm{\mu}\in\mathbb{R}^{d}$ and $d\times d$ matrix ${\bf\Sigma}\in{\mathcal{M}}^{d\times d}_{+}$ . Throughout this paper, $\rho$ is set to be 0.8 as a default choice in Kamatani (2014). Ideally, $\bm{\mu}$ and ${\bf\Sigma}$ are set to be $\bm{\mu}=\mathbb{E}[\bm{X}]$ and ${\bf\Sigma}=\text{Var}[\bm{X}]$ , while in practice, they can be replaced by their rough estimates since moments of $\bm{X}$ are typically unknown. Note that the original MpCN proposed in Kamatani (2014) is the standardized version (that is, $\bm{\mu}=\bm{0}$ and ${\bf\Sigma}=\bm{I}_{d}$ , where $\bm{I}_{d}$ is an identity matrix). The acceptance probability (15) of the MpCN proposal distribution can be written as

[TABLE]

One of the key differences between this proposal distribution and the first two simple ones is that in the MpCN, not only the mean but also the variance of the candidate changes with the current state $\bm{X}$ . Since the MpCN proposal distribution admits larger jumps in the tail, a better acceptance rate can be expected even when $\pi$ is heavy-tailed.

4 The proposed method

In this section, we propose a new estimator of VaR contributions that utilizes the MCMC method, especially the MH algorithm, to achieve an efficient estimation. Theoretical study on the consistency and asymptotic normality of our MH-based estimator is provided in the Appendix for certain classes of risk models.

4.1 Assumptions and setup

We start by declaring assumptions under which our MH estimator is applicable.

Assumption 1.

On applying the MH estimator, we suppose the following:

(i) an explicit form of the joint loss density $f_{\bm{X}}$ is given, and thus one can compute the quantity $f_{\bm{X}}(\bm{x})$ for any $\bm{x}\in\mathbb{R}^{d}$;

(ii) a generator of i.i.d. samples from the loss distribution $F_{\bm{X}}$ is available; and

(iii) neither the explicit form of the total loss density $f_{S}$ nor a way to compute the quantity $f_{S}(\text{VaR}_{p}(S))$ is available.

Note that assumption 1 (ii) enables us to generate samples from $F_{S}$ by setting $S^{(n)}=X_{1}^{(n)}+\cdots+X_{d}^{(n)}$ where ( $X_{1}^{(n)},\dots,X_{d}^{(n)}$ ) is an $n$ th sample from $F_{\bm{X}}$ .

Such a situation typically occurs when the joint loss density $f_{\bm{X}}$ is specified through a copula density $c$ and marginal loss densities $f_{1},f_{2},\dots,f_{d}$ . The resulting joint loss density $f_{\bm{X}}$ is specified as in formula $(\ref{sklar theorem density form})$ .

As mentioned in section 2, computing VaR contributions involves two steps: the first is to estimate $\text{VaR}_{p}(S)$, and the second is to estimate the VaR contributions $\text{AC}=\mathbb{E}[\bm{X}|S=\text{VaR}_{p}(S)]$ with $\text{VaR}_{p}(S)$ replaced by its estimate. The estimation of $\text{VaR}_{p}(S)$ in the first step is often conducted with an MC simulation. Based on i.i.d. samples ($S^{(1)},\dots,S^{(N)}$) from $F_{S}$, $\text{VaR}_{p}(S)$ can be estimated, for example, by $\widehat{\text{VaR}}_{p}(S)=S^{\lceil Np\rceil}$, where $\lceil Np\rceil$ is the smallest integer greater than or equal to $Np$, and $S^{\lceil Np\rceil}$ is the $\lceil Np\rceil$th smallest of the $N$ samples (the $\lceil Np\rceil$th order statistic). Once computed, $\widehat{\text{VaR}}_{p}(S)$ is treated as a fixed constant $v=\widehat{\text{VaR}}_{p}(S)$ in the second step.
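The first step can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: the function names `var_estimate` and `total_loss_samples` are hypothetical, and the order statistic is counted from the smallest sample so that a level such as $p=0.999$ picks a value in the upper tail.

```python
import math
import random


def var_estimate(total_losses, p):
    """Empirical VaR_p: the ceil(N * p)-th smallest of N total-loss samples."""
    n = len(total_losses)
    # The small tolerance guards against floating-point noise in n * p.
    k = max(1, math.ceil(n * p - 1e-9))
    return sorted(total_losses)[k - 1]


def total_loss_samples(sample_loss_vector, n, rng):
    """S^(n) = X_1^(n) + ... + X_d^(n) for n i.i.d. draws of the loss vector."""
    return [sum(sample_loss_vector(rng)) for _ in range(n)]


# Hypothetical toy model (an assumption for illustration only):
# three independent unit-exponential losses.
rng = random.Random(0)
toy_sums = total_loss_samples(
    lambda r: [r.expovariate(1.0) for _ in range(3)], 10_000, rng
)
v = var_estimate(toy_sums, 0.999)
```

The value `v` is then held fixed and passed to whichever estimator of $\mathbb{E}[\bm{X}|S=v]$ is used in the second step.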

In the second step, AC $=\mathbb{E}[\bm{X}|S=v]$ is estimated. According to the crude MC method, VaR contributions are estimated by (8). As explained in section 2, the problem of this two-step procedure is that the estimator of VaR contributions in the second step is typically biased. To address this issue, we develop an MCMC (MH)-based estimator that achieves consistency and high sample efficiency.

4.2 The MH estimator of VaR contributions

We propose to estimate VaR contributions by sequentially updating samples so that all samples lie in the set $\mathcal{S}_{v}=\{\bm{x}\in\mathbb{R}^{d}:x_{1}+\cdots+x_{d}=v\}$. The updating rule is constructed so that the componentwise sum of each sample is preserved and the samples are drawn from the distribution $F_{\bm{X}|S=v}$. We begin by reformulating the problem of computing VaR contributions.

By the full allocation property $(\ref{full allocation property})$ , it holds that

$$\mathbb{E}[X_{d}|S=v]=v-\sum_{j=1}^{d^{\prime}}\mathbb{E}[X_{j}|S=v],$$

where $S=X_{1}+\cdots+X_{d}$. Therefore, computing the VaR contributions $\text{AC}=\mathbb{E}[\bm{X}|S=v]$ reduces to estimating the VaR contributions of the $d^{\prime}$-subportfolio, denoted by $\text{AC}^{\prime}=\mathbb{E}[\bm{X}^{\prime}|S=v]$. In our method, this quantity $\text{AC}^{\prime}$ is estimated by generating samples directly from $F_{\bm{X}^{\prime}|S=v}$. The conditional joint density of $\bm{X}^{\prime}$ given $\{S=v\}$ can be written as

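$$f_{\bm{X}^{\prime}|S=v}(\bm{x}^{\prime})=\frac{f_{(\bm{X}^{\prime},S)}(\bm{x}^{\prime},v)}{f_{S}(v)}=\frac{f_{\bm{X}}(\bm{x}^{\prime},v-\bm{1}^{\text{\scriptsize T}}_{d^{\prime}}\bm{x}^{\prime})}{f_{S}(v)},$$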

where the last equation follows from a linear transformation $(\bm{X}^{\prime},S)^{\text{\scriptsize T}}\mapsto\bm{X}$ . At this point, sampling directly from $f_{\bm{X}^{\prime}|S=v}$ is difficult since the total loss density $f_{S}(v)$ is not easy to evaluate.

By taking $E=\mathbb{R}^{d^{\prime}}$, $h(\bm{x})=\bm{x}$, and $\pi(\bm{x})=f_{\bm{X}^{\prime}|S=v}(\bm{x})$ in the notation of subsection 3.1, our problem of estimating VaR contributions reduces to estimating ${\bm\pi}({\bm h})=\mathbb{E}[\bm{X}^{\prime}|S=v]$ in (13) by MCMC. Even though $f_{\bm{X}^{\prime}|S=v}$ itself is difficult to compute, we can compute the acceptance probability (15) given by

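$$\alpha(\bm{x},\bm{y})=\min\left[\frac{\pi(\bm{y})q(\bm{y},\bm{x})}{\pi(\bm{x})q(\bm{x},\bm{y})},1\right]=\min\left[\frac{f_{\bm{X}}(\bm{y},v-\bm{1}^{\text{\scriptsize T}}_{d^{\prime}}\bm{y})\,q(\bm{y},\bm{x})}{f_{\bm{X}}(\bm{x},v-\bm{1}^{\text{\scriptsize T}}_{d^{\prime}}\bm{x})\,q(\bm{x},\bm{y})},1\right]$$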

for any $\bm{x},\bm{y}$ by assumption 1 (i). Note that the term $f_{S}(v)$ cancels when taking the ratio of $f_{\bm{X}^{\prime}|S=v}(\bm{y})$ to $f_{\bm{X}^{\prime}|S=v}(\bm{x})$. Therefore, under an appropriate choice of the proposal density $q$, the MH algorithm (algorithm 1) allows one to generate an $N$-path of the Markov chain whose stationary distribution is $\pi(\bm{x})=f_{\bm{X}^{\prime}|S=v}(\bm{x})$. Based on a sample path, we can construct the MH estimator $\hat{{\bm\pi}}_{N}({\bm h})$ defined by $(\ref{mcmc estimator in general})$. The procedure for computing the MH estimator of VaR contributions is summarized in the following algorithm.

Algorithm 2: (MH estimator of VaR contributions $\mathbb{E}[\bm{X}|S=\text{VaR}_{p}(S)]$ )

1. Estimate VaR as $v=\widehat{\text{VaR}}_{p}(S)$ by MC samples.

2. Fix the sample size $N>0$, proposal distribution $q$, and initial value $\bm{X}^{(0)}=\bm{x}^{(0)}\in\text{supp}(f_{\bm{X}^{\prime}|S=v})$.

3. Perform Algorithm 1 (MH) for the given $N$, $q$, and $\bm{x}^{(0)}$ to generate an $N$-path $(\bm{X}^{\prime(1)},\dots,\bm{X}^{\prime(N)})$.

4. Set

[TABLE]

to estimate VaR contributions AC $=\mathbb{E}[\bm{X}|S=v]$ .

Moreover, under regularity conditions, consistency and asymptotic normality of the MH estimator (23) hold; see Appendix 7 for more detail.
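Steps 2 to 4 of Algorithm 2 can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' code: it uses a Gaussian random walk proposal (so the proposal terms cancel in the acceptance ratio), the function name `mh_var_contributions` and the step size `step_sd` are hypothetical, and the toy model of independent standard normal losses is chosen only because its true contributions are $v/d$ by symmetry.

```python
import math
import random


def mh_var_contributions(log_f_x, v, d, n_steps, step_sd, x0=None, seed=0):
    """Random-walk MH on X' = (X_1, ..., X_{d-1}) given S = v.

    log_f_x: log of the joint loss density f_X, up to an additive constant.
    Returns (estimated contributions of all d units, acceptance rate).
    """
    rng = random.Random(seed)
    d_sub = d - 1
    x = list(x0) if x0 is not None else [v / d] * d_sub

    def log_target(x_sub):
        # log f_X(x', v - 1^T x'); the unknown constant f_S(v) is dropped
        # because it cancels in the acceptance ratio.
        return log_f_x(x_sub + [v - sum(x_sub)])

    log_pi_x = log_target(x)
    sums = [0.0] * d_sub
    accepted = 0
    for _ in range(n_steps):
        y = [xi + rng.gauss(0.0, step_sd) for xi in x]
        log_pi_y = log_target(y)
        # Symmetric proposal: alpha = min(pi(y) / pi(x), 1).
        log_alpha = min(0.0, log_pi_y - log_pi_x)
        if math.log(1.0 - rng.random()) <= log_alpha:
            x, log_pi_x = y, log_pi_y
            accepted += 1
        for j in range(d_sub):
            sums[j] += x[j]
    ac_sub = [s / n_steps for s in sums]
    # Full allocation: the last contribution is v minus the others.
    return ac_sub + [v - sum(ac_sub)], accepted / n_steps


# Toy model (assumption): independent standard normal losses, d = 3.
toy_log_density = lambda x: -0.5 * sum(xi * xi for xi in x)
ac_hat, acc_rate = mh_var_contributions(toy_log_density, 6.0, 3, 20_000, 0.8, seed=1)
```

Working with log densities avoids underflow when $f_{\bm{X}}$ is evaluated far in the tail, which is the typical situation for $p$ close to one.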

Remark 1 (MCMC methods for ES contributions).

5 Numerical experiments

In this section, we apply the MH estimator proposed in section 4 to various risk models, and compare its performance with other existing estimators of VaR contributions. Our simulation and empirical study based on real-world data show that the MH estimator has smaller bias and lower MSE compared with other estimators for many situations, including high-dimensional ( $d\approx 500$ ) cases. Based on the numerical experiments, we provide practical guidelines on how to choose an appropriate proposal distribution of the MH estimator given a risk model. In the experiments, we used a MacBook Air, 1.4 GHz Intel Core i5, 4 GB 1600 MHz DDR3.

5.1 Simulation study

5.1.1 Description of the numerical comparison

In the simulation study, we consider four risk models that are modelled separately by marginal densities and copula density. We adopt heavy-tailed marginal distributions and copulas with upper-tail dependences as they are often of concern in risk management. In all risk models, we set the size of the portfolio $d=3$ . The models are set as follows.

(1) The loss random variables $X_{1},X_{2}$, and $X_{3}$ follow homogeneous Pareto distributions (29) with $\kappa=4$ and $\gamma=3$. The loss random vector $(X_{1},X_{2},X_{3})$ has a $d$-dimensional survival Clayton copula (30) with $\theta=0.5$.

(2) $X_{1},X_{2}$, and $X_{3}$ have the same marginal distributions as in case (1). Their copula is a Student's $t$-copula (33) with degrees of freedom $\nu=4$ and dispersion matrix ${\bf P}$ given by

[TABLE]

(3) $X_{1},X_{2}$, and $X_{3}$ follow homogeneous Student's $t$-distributions with degrees of freedom $\nu=4$, location parameter $\mu=0$, and scale parameter $\sigma=1$. $(X_{1},X_{2},X_{3})$ has a survival Clayton copula (30) with $\theta=0.5$.

(4) $(X_{1},X_{2},X_{3})$ follows a multivariate Student's $t$-distribution (35) with $\nu=4$, $\bm{\mu}=\bm{0}$, and ${\bf\Sigma}={\bf P}$, where ${\bf P}$ is defined in $(\ref{correlation matrix})$.

The first two models, (1) and (2), consider pure losses, and the last two, (3) and (4), consider profit and loss (P&L). In all models, marginal distributions have variances of 2.0 and heavy tails with tail indices $5.0$. Models (1) and (3) possess homogeneous upper-tail dependences with tail coefficients $\lambda^{U}=0.025$; see Joe (2014) for formulas on the tail coefficients. Models (2) and (4) have upper, lower, and upper-lower tail dependences with tail coefficients $\lambda_{1,2}^{U}=\lambda_{1,2}^{L}=0.012$, $\lambda_{1,3}^{U}=\lambda_{1,3}^{L}=0.162$, $\lambda_{2,3}^{U}=\lambda_{2,3}^{L}=0.253$, $\lambda_{1,2}^{UL}=\lambda_{1,2}^{LU}=0.253$, $\lambda_{1,3}^{UL}=\lambda_{1,3}^{LU}=0.029$, and $\lambda_{2,3}^{UL}=\lambda_{2,3}^{LU}=0.012$. As the dispersion matrix $(\ref{correlation matrix})$ indicates, the first and second losses are negatively dependent, while the other pairs of losses are positively dependent.

For each risk model, we compute several estimators of the VaR contributions $\text{AC}=\mathbb{E}[\bm{X}|S=\text{VaR}_{p}(S)]$ at confidence level $p=0.999$, with $\text{VaR}_{p}(S)$ replaced by its Monte Carlo estimate $v=S^{\lceil Np\rceil}$. The estimators we compare are the MC estimator $(\ref{MC estimator})$, NW estimator $(\ref{nw estimator})$ (Tasche, 2009), GR estimator $(\ref{gr estimator})$ (Hallerbach, 2003; Tasche and Tibiletti, 2004), and the MH estimator $(\ref{MCMC estimator})$:

[TABLE]

where $\bm{X}^{(1)},\dots,\bm{X}^{(N)}\stackrel{{\scriptstyle i.i.d.}}{{\sim}}F_{\bm{X}}$ , $S^{(n)}:=X_{1}^{(n)}+\cdots+X_{d}^{(n)}$ and $(\bm{X}^{(1)}_{|S=v},\dots,\bm{X}^{(N)}_{|S=v})$ is an $N$ -path of a Markov chain whose stationary distribution is $F_{\bm{X}|S=v}$ .
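For orientation, the two regression-type baselines can be sketched as below. This is an illustrative sketch only: it assumes the standard Nadaraya-Watson form with a Gaussian kernel and a linear regression function $g_{\bm{\beta}}(s)=\beta_{0}+\beta_{1}s$ fitted by ordinary least squares, and the function names are hypothetical.

```python
import math
import statistics


def silverman_bandwidth(total_losses):
    """Rule-of-thumb bandwidth 1.06 * sd(S) * N^(-1/5)."""
    n = len(total_losses)
    return 1.06 * statistics.stdev(total_losses) * n ** (-0.2)


def nw_estimator(samples, v, bandwidth):
    """Nadaraya-Watson estimate of E[X | S = v] with a Gaussian kernel."""
    d = len(samples[0])
    num = [0.0] * d
    den = 0.0
    for x in samples:
        z = (sum(x) - v) / bandwidth
        w = math.exp(-0.5 * z * z)  # kernel weight; constants cancel
        den += w
        for j in range(d):
            num[j] += w * x[j]
    return [a / den for a in num]


def gr_estimator(samples, v):
    """Fit X_j = b0 + b1 * S by least squares and evaluate the fit at S = v."""
    n, d = len(samples), len(samples[0])
    s = [sum(x) for x in samples]
    s_bar = sum(s) / n
    sxx = sum((si - s_bar) ** 2 for si in s)
    out = []
    for j in range(d):
        x_bar = sum(x[j] for x in samples) / n
        sxy = sum((si - s_bar) * (x[j] - x_bar) for si, x in zip(s, samples))
        out.append(x_bar + (sxy / sxx) * (v - s_bar))
    return out
```

Because least squares is linear in the response, the fitted slopes sum to one across components and the GR estimates sum to $v$. The NW weights can underflow to zero when no sample sum lies within a few bandwidths of $v$, which is one way the rare-event difficulty shows up in practice.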

For all estimators, we fix the sample size $N=10^{6}$ . Other parameters of the estimators are determined as follows. First, for the MC estimator, we set $\delta>0$ such that the MC sample size $M_{\delta,N}$ is around $10^{3}$ . For a fixed $\delta$ , asymptotic normality

[TABLE]

holds; see, for example, Glasserman (2013). We report the estimate of $\widehat{\text{AC}}^{\text{\scriptsize MC}}_{\delta,N}$ and its approximated standard error ${\bm S}_{MC}^{(j,j)}/\sqrt{M_{\delta,N}}$ for $j=1,2,3$ , where ${\bm S}_{MC}^{(i,j)}$ is the $(i,j)$ -component of the sample standard deviation ${\bm S}_{MC}$ defined by

[TABLE]
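The crude MC estimator and its approximate standard error can be sketched as follows. This is a hedged illustration: the function name `crude_mc_estimator` is hypothetical, and the standard error is taken componentwise as the sample standard deviation of the retained samples divided by $\sqrt{M_{\delta,N}}$.

```python
import math


def crude_mc_estimator(samples, v, delta):
    """Average the samples whose componentwise sum lies in [v - delta, v + delta].

    Returns (estimate, standard errors, number of retained samples M).
    """
    kept = [x for x in samples if abs(sum(x) - v) <= delta]
    m = len(kept)
    if m < 2:
        raise ValueError("too few samples in the band; increase delta or N")
    d = len(kept[0])
    means = [sum(x[j] for x in kept) / m for j in range(d)]
    ses = []
    for j in range(d):
        var_j = sum((x[j] - means[j]) ** 2 for x in kept) / (m - 1)
        ses.append(math.sqrt(var_j / m))
    return means, ses, m
```

With $N=10^{6}$ and $M_{\delta,N}\approx 10^{3}$, only about one sample in a thousand contributes to the estimate, which is the sample inefficiency the MH estimator is designed to avoid.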

Second, for the NW estimator, we choose the kernel density $\phi$ as the standard normal density. We decide the bandwidth $\Delta=1.06\hat{\sigma}_{S}N^{-1/5}$ according to Silverman’s rule of thumb (Pagan and Ullah, 1999). Although asymptotic normality holds for the NW estimator, its asymptotic variance can hardly be computed because it requires the evaluation of $f_{S}(v)$ . Therefore, we report only the estimate of $\widehat{\text{AC}}^{\text{\scriptsize NW}}_{\phi,h,N}$ . Third, for the GR estimator, we choose $g_{\bm{\beta}}(s)=\beta_{0}+\beta_{1}s$ , and its coefficients are estimated by

[TABLE]

where $\bar{\bm{X}}_{N}=\frac{1}{N}\sum_{n=1}^{N}\bm{X}^{(n)}$ and $\bar{S}_{N}=\frac{1}{N}\sum_{n=1}^{N}S^{(n)}$ . Under regularity conditions, the asymptotic normality

[TABLE]

holds for $j=1,2,3$ , where $\hat{\beta}_{N,k}^{(j)}$ and $\beta_{k}^{(j)}$ are the $j$ th components of $\hat{\bm{\beta}}_{N,k}$ and $\bm{\beta}_{k}$ , respectively, for $k=1$ and $2$ ; $\varepsilon_{j}$ is the $j$ th component of the error term $\bm{\varepsilon}$ ; $\sigma_{\varepsilon_{j}}^{2}$ is the conditional variance of $\varepsilon_{j}$ given $S_{(1)},\dots,S_{(N)}$ ; and

[TABLE]

Based on these results, we report the estimate of $\widehat{\text{AC}}^{\text{\scriptsize GR}}_{g_{\bm{\beta}},N}$ and its approximated standard error ${\bf S}_{GR}^{(j)}/\sqrt{N}$ for $j=1,2,3$ , where

[TABLE]

$\hat{{\bf\Sigma}}_{GR,j}=\hat{\sigma}_{\varepsilon_{j}}^{2}\cdot({\bf Y}^{\text{\scriptsize T}}{{\bf\bm{Y}}}/N)^{-1}$ , and $\hat{\sigma}_{\varepsilon_{j}}$ is the sample standard deviation of the $j$ th residuals. Finally, for the MH estimator, we choose different proposal distributions depending on risk models (1)–(4). For each risk model, we choose (1) a random walk proposal $q(\bm{x},\bm{y})=f(\bm{y}-\bm{x})$ with $f\sim\mathcal{N}_{d}(\bm{0},\hat{{\bf\Sigma}}_{v})$ , where $\hat{{\bf\Sigma}}_{v}:={\bf S}^{2}_{MC}$ ; (2) an independent proposal $q(\bm{x},\bm{y})=f(\bm{y})$ , where $f$ is the density of the Dirichlet distribution with parameters , (3) and (4) MpCN proposal with $\rho=0.8$ , $\bm{\mu}=(\widehat{\text{AC}}^{\text{\scriptsize GR}}_{g_{\bm{\beta}},N}\text{}^{\prime},v-\bm{1}^{\text{\scriptsize T}}_{d^{\prime}}\widehat{\text{AC}}^{\text{\scriptsize GR}}_{g_{\bm{\beta}},N}\text{}^{\prime})^{\text{\scriptsize T}}$ , and ${\bf\Sigma}:={\bf S}^{2}_{MC}$ . In Algorithm 2, we set the initial values as $\bm{x}^{(0)}=(v/3,v/3,v/3)^{\text{\scriptsize T}}$ . We estimate the asymptotic variances of MH estimators by the batch means estimators $\hat{{\bf\Sigma}}_{N}$ defined by $(\ref{batch mean estimator})$ . Following Jones et al. (2006), we choose $L_{N}:=\lfloor N^{\frac{1}{2}}\rfloor=10^{3}$ and $B_{N}:=\lfloor N/L_{N}\rfloor=\lfloor N^{\frac{1}{2}}\rfloor=10^{3}$ . We report the estimate of $\widehat{\text{AC}}^{\text{\scriptsize MCMC}}_{q,N}$ and its approximated standard error $\hat{{\bf\Sigma}}_{N}^{(j,j)}/\sqrt{N}$ for $j=1,2,3$ .
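The batch means calculation for one component of the chain can be sketched as below. This is a minimal sketch assuming the usual non-overlapping batch means form with batch length $L_{N}=\lfloor N^{1/2}\rfloor$ and $B_{N}=\lfloor N/L_{N}\rfloor$ batches; the function name is hypothetical, and the returned value is the square root of the estimated asymptotic variance divided by the number of samples used.

```python
import math


def batch_means_se(chain):
    """Batch-means standard error of the sample mean of a scalar chain."""
    n = len(chain)
    batch_len = math.isqrt(n)       # L_N = floor(sqrt(N))
    n_batches = n // batch_len      # B_N = floor(N / L_N)
    used = chain[: batch_len * n_batches]
    overall = sum(used) / len(used)
    batch_means = [
        sum(used[b * batch_len:(b + 1) * batch_len]) / batch_len
        for b in range(n_batches)
    ]
    # Estimate of the asymptotic variance appearing in the Markov chain CLT.
    sigma2 = batch_len / (n_batches - 1) * sum(
        (m - overall) ** 2 for m in batch_means
    )
    return math.sqrt(sigma2 / len(used))
```

The function needs at least two batches, so it should be applied to chains of length four or more. For a $d^{\prime}$-dimensional path it can be applied to each coordinate separately.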

5.1.2 Results and discussions

Due to the simplicity of MC, NW, and GR estimators, they were calculated instantly for all risk models.

As mentioned in section 3.2, the validity of the proposal selection can be inspected by autocorrelation plots and ACR. Figure 3 (v)–(viii) shows the autocorrelation plots of the Markov chains generated by Algorithm 2. The acceptance rate of the MH algorithm in each risk model was (1) 0.566, (2) 0.222, (3) 0.604, and (4) 0.767. In Figure 3 (v)–(viii), we can observe that the autocorrelation plots steadily decline below 0.1 by lag $h$ around 100 for all risk models. Together with the observations that the ACRs are moderate, we could state that the choices of the proposal distributions above are appropriate for all risk models.
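Both diagnostics are simple to compute from a stored path. The sketch below is illustrative and uses hypothetical function names; the acceptance rate is recovered as the fraction of transitions in which the state changed, which matches the true rate when the proposal is continuous.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of a scalar series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    c0 = sum((x - mean) ** 2 for x in series) / n
    ch = sum(
        (series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)
    ) / n
    return ch / c0


def acceptance_rate(chain):
    """Fraction of transitions in which the chain moved to a new state."""
    moves = sum(1 for a, b in zip(chain, chain[1:]) if a != b)
    return moves / (len(chain) - 1)
```

Evaluating `autocorrelation` over lags up to a few hundred for each coordinate reproduces the kind of plot discussed here, and the lag at which it first drops below 0.1 gives a rough sense of how many steps separate nearly independent draws.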

Before showing the results of the estimation, let us check the shapes of the conditional distributions $F_{\bm{X}^{\prime}|S=v}$ by plotting the $N$ -path generated by Algorithm 2. Figure 3 (i)–(iv) shows the contour plots of the generated Markov chains. According to these plots, the features of the conditional distribution $F_{\bm{X}^{\prime}|S=v}$ in each risk model are summarized as follows:

(1) Pareto + survival Clayton: The contour plot in Figure 3 (i) shows that $F_{\bm{X}^{\prime}|S=v}$ has a unique mode. The density steadily decays as one moves away from the mode. In addition, the contour plot seems symmetric about the diagonal line.

(2) Pareto + $t$-copula: Unlike case (1), $F_{\bm{X}^{\prime}|S=v}$ seems to possess two distinct modes close to the axes. High probabilities are concentrated around the edges of the simplex. Along the axes, the gradients of the density seem to be sharp. Moreover, the contour plot in Figure 3 (ii) is asymmetric about the diagonal line $y=x$.

(3) Student's $t$ + survival Clayton: Although the conditional loss random vector $\bm{X}^{\prime}|S=v$ can take negative values, it is supported mostly on the bounded simplex as in case (1). The contour plot in Figure 3 (iii) seems unimodal and symmetric about the diagonal. The tails of $F_{\bm{X}^{\prime}|S=v}$ are obviously light.

(4) Student's $t$ + $t$-copula: In this case, the conditional distribution $F_{\bm{X}^{\prime}|S=v}$ can be shown to be a Pearson type VII distribution $(\ref{pearson type vii})$ in Appendix 7. From the contour plot in Figure 3 (iv), we can observe elliptical symmetry and tail-heaviness. Unlike case (3), the loss vector $\bm{X}^{\prime}|S=v$ can take large negative values beyond the bounded simplex.

The results of estimation are summarized in Table 1. In the four different risk models (1)–(4), we report the estimates, their approximated standard errors, biases, and rooted MSEs (RMSEs) of the four different estimators: MC, NW, GR, and MH.

In the first risk model, true VaR contributions are obtained by equally allocating the total VaR since the marginal distributions are homogeneous and the copula is exchangeable. We observed that the MC and NW estimators have relatively larger biases than those of others. Compared with the MH estimator, the GR estimator still suffers from some inevitable bias although its standard error is quite small. The MC estimator has a relatively large standard error due to sample inefficiency. Overall, the MH estimator outperforms all other estimators in terms of RMSE.

The second risk model does not allow us to analytically calculate the true VaR contributions. Therefore, the true VaR contributions are computed by Monte Carlo numerical integration, which still works with enough accuracy for dimension three. We can observe that existing estimators suffer from biases possibly caused by asymmetry and multi-modality of the conditional distribution $F_{\bm{X}^{\prime}|S=v}$ . In particular, the GR estimator has relatively large bias and RMSE in contrast to the good performance in the first risk model. On the other hand, the MH estimator maintains lower bias and RMSE compared with the other estimators.

In the third risk model, the true VaR contributions are given by the equal allocation based on the same discussion as in case (1). Thanks to the symmetry and unimodality of the conditional distribution $F_{\bm{X}^{\prime}|S=v}$ , all estimators retain small biases and RMSEs. Together with the results in cases (1) and (2), one can state that the GR estimator performs well so long as $F_{\bm{X}^{\prime}|S=v}$ is symmetric and unimodal. Additionally, the MH estimator reduces bias and RMSE compared with those of MC and NW estimators.

The final risk model provides the true VaR contributions via the formula $(\ref{VaR contributions elliptical case})$ . In such an elliptical case, the GR estimator provides quite an accurate estimate. Although the conditional distribution $F_{\bm{X}^{\prime}|S=v}$ is heavy-tailed as seen in Figure 3 (iv), the MH estimator retains high performance compared with the MC and NW estimators. The bias of the MH estimator is significantly improved compared with the MC and NW estimators. Moreover, its standard error and RMSE are lower than those of the MC estimator.

Throughout the numerical study, the MH estimator provided small bias and RMSE regardless of the shape of the conditional distribution $F_{\bm{X}^{\prime}|S=v}$ . In the case when $F_{\bm{X}^{\prime}|S=v}$ is unimodal and symmetric, the GR estimator also guarantees a good performance. On the other hand, at least in our numerical experiment, the MC and NW estimators had relatively larger biases and RMSEs compared with other estimators.

5.2 Empirical study

The numerical study is now extended to a high-dimensional case with real-world data. We used the dataset stockdata in R-package huge, which consists of stock market data of closing prices from all stocks in the S&P 500 for all the days the market was open in the period of January 1, 2003 to January 1, 2008 (five years). During the time period, there remained $d=452$ stocks in the S&P 500. The sample size is $T=1258$ . We transformed the data into the log-ratio of the price at time $t$ to the price at time $t-1$ .

Most stylized facts on stock returns listed in Chapter 3 of McNeil et al. (2015) are observable in the data. For example, the return series are unimodal, leptokurtic, and heavy-tailed, with little serial correlation but with volatility clustering. Moreover, the $d$ return series are mutually dependent. Taking these observations into account, we adopted a copula-GARCH model with skew-$t$ white noise (ST-GARCH; see, for example, Jondeau and Rockinger, 2006; Huang et al., 2009). In the model, the $d$ marginal time series are modelled by GARCH$(1,1)$ processes and the underlying white noise processes follow skew-$t$ distributions with asset-specific degrees of freedom $\nu_{j}>0$ and skewness parameters $\gamma_{j}>0$; that is, within a fixed time period $\{1,\dots,T\}$, the $j$-th return series $(X_{1,j},\dots,X_{T,j})$ follows

[TABLE]

for $t=2,\dots,T,\quad j=1,\dots,d$ , where $\omega_{j}>0,\alpha_{j},\beta_{j}\geq 0$ , $\alpha_{j}+\beta_{j}<1$ , and $Z_{t,j}$ follows a skew- $t$ distribution $\text{ST}(\nu_{j},\gamma_{j})$ with density given by

[TABLE]

where $t(x,\nu)$ is the probability density function of a Student's $t$-distribution with $\nu>0$ degrees of freedom, and $\gamma>0$ is a skewness parameter, with $\gamma=1$ corresponding to the symmetric case; see Fernández and Steel (1998) for more detail. The copula among $\bm{Z}_{t}=(Z_{t,1},\dots,Z_{t,d})$ is assumed to be a Student's $t$-copula with parameters $\nu$ and ${\bf P}$ independent of time $t$.
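As a concrete reference point, the Fernández and Steel construction can be sketched as below. This is an assumption-laden sketch: it implements the unstandardized form $\frac{2}{\gamma+1/\gamma}\{t(x/\gamma,\nu)\mathbb{1}_{[x\geq 0]}+t(\gamma x,\nu)\mathbb{1}_{[x<0]}\}$, which may differ from the exact density used in the fitted model by a location and scale standardization, and the function names are hypothetical.

```python
import math


def student_t_pdf(x, nu):
    """Density t(x, nu) of a Student's t-distribution with nu degrees of freedom."""
    log_c = (
        math.lgamma((nu + 1) / 2)
        - math.lgamma(nu / 2)
        - 0.5 * math.log(nu * math.pi)
    )
    return math.exp(log_c - (nu + 1) / 2 * math.log1p(x * x / nu))


def skew_t_pdf(x, nu, gamma):
    """Fernandez-Steel skewing: stretch by gamma on x >= 0, by 1/gamma on x < 0."""
    scale = 2.0 / (gamma + 1.0 / gamma)
    arg = x / gamma if x >= 0 else x * gamma
    return scale * student_t_pdf(arg, nu)
```

Under this form the mass on the positive half-line is $\gamma^{2}/(1+\gamma^{2})$, so $\gamma>1$ skews the noise to the right and $\gamma<1$ to the left.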

We estimated parameters of the ST-GARCH(1,1) model with the $t$ -copula based on the copula approach. First, we fitted the ST-GARCH(1,1) model with the maximum likelihood method to the marginal time series. Then, to obtain pseudo-samples from the copula of $\bm{Z}$ , distributional transform was applied to the $d$ -dimensional white noise process extracted from the ST-GARCH model. We finally fit the $t$ -copula to them with method-of-moments using Kendall’s tau for the dispersion matrix ${\bf P}$ and the maximum likelihood method for the degree of freedom $\nu$ ; see Demarta and McNeil (2005) for more detail. The results of the estimation are summarized in Figure 4 with each boxplot representing $d$ numbers of each parameter. From (B1) and (B5), the estimates of means and omegas are almost [math]. From (B3), most of the marginal white noise distributions are symmetrical but some are skewed. From (B4), their degrees of freedom range from two to ten, that is, the tail-heaviness of the return series is inhomogeneous over $d$ assets. Finally, (B8) shows that the pairwise correlations among the return series are typically from 0.2 to 0.4, and some have strong positive correlations.

Our goal in this study is to compute the conditional VaR contributions at time $T+1$ given the history $\mathcal{F}_{T}$. Under the model described above, the marginal distribution of the $j$-th return at time $T+1$ is $X_{T+1,j}|\mathcal{F}_{T}\sim\text{ST}(\mu_{j},\sigma_{T+1,j}^{2},\nu_{j},\gamma_{j})$, where $\text{ST}(\mu_{j},\sigma_{T+1,j}^{2},\nu_{j},\gamma_{j})$ is a skew-$t$ distribution with density $\frac{1}{\sigma_{T+1,j}}f_{j}(\frac{x_{j}-\mu_{j}}{\sigma_{T+1,j}};\nu_{j},\gamma_{j})$ with $f_{j}(\cdot;\nu_{j},\gamma_{j})$ defined in (25). Their copula is a Student's $t$-copula with parameters $\nu$ and ${\bf P}$. Based on this multivariate model, the conditional VaR contributions at time $T+1$ given the history $\mathcal{F}_{T}$ are estimated by the same procedure as in section 5.1.

We estimated the conditional VaR contributions ($\text{AC}_{1}^{T+1},\dots,\text{AC}_{d}^{T+1}$) with confidence level $p=0.999$ by using the MC, NW, GR, and MH methods. In MC, $N=10^{5}$ samples were generated and the total VaR was estimated as the $Np$-th smallest sample among them. The run time of the MC simulation was $2.690$ minutes. The MC estimates of VaR contributions were then computed as sample means of the conditional samples whose sums fall into the set $A_{\delta}=[v-\delta,v+\delta]$. The bandwidth was set to be $\delta=4.8$ so that there were $M_{\delta,N}=733$ conditional MC samples. Estimates of standard errors were also computed based on these samples. NW, GR, and MH estimators were computed analogously to the previous simulation study in section 5.1. For the MH estimator, the MpCN proposal distribution was chosen since the target distribution was expected to be heavy-tailed and elliptical to some extent. The length of the sample path was chosen to be $N=10^{4}$, and the run time of the MH algorithm was $5.487$ minutes. We inspected the autocorrelation plots and ACR to check the validity of the proposal distribution. We observed that all autocorrelations decreased below $0.1$ for lags larger than $40$. Together with the ACR of $0.983$, we concluded that the choice of $q$ was appropriate.

Figure 5 shows the MC, NW, GR, and MH estimates of the conditional VaR contributions ($\text{AC}_{1}^{T+1},\dots,\text{AC}_{d}^{T+1}$) of returns at time $T+1$ given the history $\mathcal{F}_{T}$, plotted with the homogeneously allocated capital $\text{VaR}_{p}(S|\mathcal{F}_{T})/d$ and the standardized marginal VaRs defined by $\text{VaR}_{p}(X_{T+1,j}|\mathcal{F}_{T})\Delta_{p}(\bm{X}_{T+1}|\mathcal{F}_{T})$, where $\Delta_{p}(\bm{X}_{T+1}|\mathcal{F}_{T})$ is the so-called superadditivity ratio defined by

[TABLE]

For MC, GR, and MH estimators, the 95% confidence upper and lower bounds are also plotted. On the x-axis, the 452 assets are rearranged in increasing order of MH estimators.

Compared with the dotted line representing the homogeneous allocation, all the estimated allocated capitals show inhomogeneity among assets. Overall, the estimated VaR contributions are less volatile than the standardized marginal VaRs, which implies the benefit of the diversification effect. We can also observe that the MH estimates and GR estimates almost coincide for all $d$ assets. The confidence intervals of both estimators are much tighter than that of the MC estimator. NW estimates fluctuate around the line of the MH and GR estimates. On the other hand, the MC estimates deviate from these lines, which indicates that the MC estimators contain inevitable biases. In summary, although the true ACs are unknown, the GR and MH estimators retain stable performance compared with the MC and NW estimators even if the dimension $d$ is large and marginal distributions are inhomogeneous.

5.3 Advantages and disadvantages of the MH estimator

We summarize the advantages and disadvantages of the MH estimator compared with the other estimators. The first advantage is that the MH estimator is consistent whereas this is not always true for the other estimators. As explained in section 2, the MC, NW, and GR estimators have biases which cannot be easily eliminated. In fact, we observed in Table 1 that unignorable biases of the MC, NW, and GR estimators sometimes remain even when their standard errors are sufficiently small. In contrast, the MH estimator provides more accurate estimates of VaR contributions as $N\rightarrow\infty$ due to its consistency. Since CLT also holds, the confidence interval of the true VaR contributions is also available. Secondly, the MH estimator has great sample efficiency compared with the MC estimator. While samples are generated from $F_{\bm{X}}$ and most are discarded in the MC method, no samples are wasted in the MH method since it directly simulates $F_{\bm{X}|S=v}$ . Consequently, the MH estimator can achieve low standard errors. Finally, the MH estimator can maintain high performance even when the conditional distribution $F_{\bm{X}^{\prime}|S=v}$ is multimodal or heavy-tailed. As discussed in subsection 5.1.2, the performance of the GR estimator highly depends on the shape of $F_{\bm{X}^{\prime}|S=v}$ . On the other hand, for the MH estimator, the shape of $F_{\bm{X}^{\prime}|S=v}$ can be directly captured through the proposal distribution $q$ . By choosing an appropriate proposal distribution $q$ according to the shape of $F_{\bm{X}^{\prime}|S=v}$ , the MH estimator can attain great performance. This advantage, however, can be seen as a disadvantage from the viewpoint of the simplicity of estimation. In general, estimation with MH requires two steps: first is to choose a family of proposal distribution, and the second is to determine its parameters. 
The second step of parameter estimation can be based on the MC samples falling into the set $A_{\delta}$, which are regarded as pseudo samples from $F_{\bm{X}|S=v}$. Meanwhile, the first step is not so straightforward; we discuss this issue in subsection 5.4. Another disadvantage of the MH estimator is that it typically requires a longer run time than other existing estimators. Since MH requires simulating the proposal distribution and evaluating the acceptance probability (15) $N$ times, careful programming and proposal selection are necessary to save computational time.

5.4 Guidelines for the choice of proposal distribution

A significant drawback of the MH estimator is that the choice of an appropriate proposal distribution $q$ is not as simple as the parameter selections of other existing estimators. An instruction for proposal selection is necessary since it highly affects the performance of the MH. In this subsection, we first investigate the symptoms caused by an inappropriate choice of $q$ . Then, we consider how to overcome these problems based on the numerical experiments provided above. Practical guidelines for choosing an appropriate proposal distribution are also provided.

Inappropriate choices of $q$ fall broadly into two cases. In the first, the proposal distribution $q$ often generates candidates that have very low density under $\pi$. This case occurs, for example, when $q$ does not fully capture the shape of $\pi$. The Markov chain then moves quite slowly, which yields a high asymptotic standard error of the MH estimator. This symptom appears as a very low acceptance rate and high autocorrelations. In the second case, $q$ generates candidates in only part of the support of $\pi$. This case occurs, for example, when $\pi$ has distinct local modes and the variance of $q$ is so small that the chain cannot move between modes. In such a case, an estimate can be significantly biased, although the acceptance rate and autocorrelation plots are seemingly perfect. This symptom appears as a distorted plot of MCMC samples whose shape is completely different from that of the target distribution $\pi$.

How can we detect and avoid such fallacious estimates? First, as mentioned in section 3.2, it is indispensable to inspect the autocorrelation plots and ACR to prevent the first symptom. Additionally, to avoid the second symptom, we recommend drawing the plots of the generated Markov chain and comparing them with the plots of the MC samples whose componentwise sums belong to $A_{\delta}=[v-\delta,v+\delta]$ . Since such MC samples follow the distribution $F_{\bm{X}|S\in A_{\delta}}$ , one can detect the distortion of the generated Markov chain by comparing the two scatter plots of $F_{\bm{X}|S=v}$ and $F_{\bm{X}|S\in A_{\delta}}$ . As an example from our simulation study in subsection 5.1, Figure 6 shows the scatter plots of the MC samples whose sums belong to $[v-\delta,v+\delta]$ overlaid on the scatter plots of the MH samples. In the figure, we can check that the shapes of the scatter plots of the MH samples bear striking resemblance to those of the MC samples for all risk models. If some part of the support of $\pi$ is covered by the MC samples but not by the MH samples, the choice of $q$ is questionable.

Finally, through numerical experiments we found that dependence information of the underlying risk model can be helpful for the selection of $q$ . When copula $C$ of the underlying risk model only has positive dependences for all pairs of loss variables, then the conditional distribution $F_{\bm{X}|S=v}$ is likely to be unimodal and light-tailed since positive dependence among $X_{1},X_{2},\dots,X_{d}$ prevents them from being diversified under the constraint $\{X_{1}+\cdots+X_{d}=v\}$ . In risk models (1) and (3) in subsection 5.1.1, where copula $C$ has only positive dependences, the contour plots in Figures 3 (i) and (iii) show that $F_{\bm{X}^{\prime}|S=v}$ is unimodal and light-tailed. These features facilitate the estimation with MH since simple proposal distributions such as the random walk proposal (20) and independent proposal $(\ref{independent proposal})$ can perform well. Conversely, when copula $C$ has negative dependence, $F_{\bm{X}^{\prime}|S=v}$ tends to be multimodal or heavy-tailed since negative dependence allows each component of $\bm{X}$ to take extreme values under $\{X_{1}+\cdots+X_{d}=v\}$ . In risk models (2) and (4) in subsection 5.1.1, where copula $C$ has negative dependences, Figure 3 (ii) indicates that $F_{\bm{X}^{\prime}|S=v}$ is bimodal, and the contour plot in Figure 6 (d) shows that $F_{\bm{X}^{\prime}|S=v}$ is heavy-tailed. In such cases, careful proposal selection is required for achieving an efficient MH estimator. When the losses $X_{1},X_{2},\dots,X_{d}$ are all nonnegative, then $F_{\bm{X}^{\prime}|S=v}$ is supported on the bounded simplex $\mathcal{S}_{v}$ defined in (26). Therefore, one can cover the whole support of $F_{\bm{X}^{\prime}|S=v}$ by choosing $q$ as the independent proposal with the distribution defined on the simplex. Uniform distribution on $\mathcal{S}_{v}$ can be the safest choice. 
It is also possible to choose other distributions that share the same features of $F_{\bm{X}^{\prime}|S=v}$ observed in the MC samples. For instance, since bimodality is observed in the contour plot in Figure 6 (b), we choose $q$ as the independent proposal distribution with $f$ the Dirichlet distribution on $\mathcal{S}_{v}$ , which can possess two distinct modes around the edges of the simplex. When $\bm{X}$ is $\mathbb{R}^{d}$ -valued and negatively dependent, an efficient MCMC is challenging since the target distribution $F_{\bm{X}^{\prime}|S=v}$ is likely to be multimodal or heavy-tailed. As a special case, when $F_{\bm{X}}$ is elliptical to some extent, then $F_{\bm{X}^{\prime}|S=v}$ is likely to be elliptical again. In such a case, even if it is heavy-tailed, the MpCN proposal distribution $(\ref{MpCN proposal})$ is known to perform well, which is also demonstrated by the simulation study of the risk model (4) in subsection 5.1.1 and by the empirical study in subsection 5.2.
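As a small illustration of why a Dirichlet independent proposal can place mass near the edges of the simplex, the sketch below draws proposals as $v$ times a Dirichlet vector, working with the full $d$-vector that sums to $v$. The helper name `sample_simplex_proposal` and the parameter values ($v=10$, $\alpha_{j}=3$ versus $\alpha_{j}=0.3$) are assumptions chosen for illustration only.

```python
import numpy as np

def sample_simplex_proposal(alpha, v, size, rng):
    """Draw independent proposals on {x >= 0, sum(x) = v} as v * Dirichlet(alpha)."""
    return v * rng.dirichlet(alpha, size=size)

rng = np.random.default_rng(0)
v = 10.0  # hypothetical VaR level
# alpha_j > 1 concentrates mass in the interior; alpha_j < 1 pushes it to the edges.
spread = sample_simplex_proposal([3.0, 3.0, 3.0], v, 10_000, rng)
edgy = sample_simplex_proposal([0.3, 0.3, 0.3], v, 10_000, rng)

# Share of proposals in which a single component carries more than 90% of v.
frac_spread = float(np.mean(spread.max(axis=1) > 0.9 * v))
frac_edgy = float(np.mean(edgy.max(axis=1) > 0.9 * v))
print(frac_spread, frac_edgy)
```

With parameters below one, a sizeable share of the proposals sits close to a vertex of the simplex, which is the behaviour one would want when the MC samples suggest modes near the edges.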

The discussions on choosing an appropriate proposal distribution are summarized as a flowchart in Figure 7. Together with the guidelines, the whole procedure of our MH estimator of VaR contributions presented in this paper is summarized as follows.

Algorithm 3: (Estimation of VaR contributions with MCMC)

1. Generate $\bm{X}_{1},\bm{X}_{2},\dots,\bm{X}_{M}\stackrel{{\scriptstyle i.i.d.}}{{\sim}}F_{\bm{X}}$ by MC.
2. Based on the samples generated in step 1, estimate VaR by $v=\widehat{\text{VaR}}_{p}(S)$.
3. For a bandwidth $\delta>0$, extract subsamples such that $\bm{1}^{\text{\scriptsize T}}_{d}\bm{X}_{m}\in[v-\delta,v+\delta]$ for $m=1,\dots,M$.
4. Choose a family of proposal distributions according to the guideline in Figure 7.
5. Based on the pseudo samples extracted in step 3, determine the parameters of the proposal distribution $q$.
6. For a sample size $N>0$, proposal density $q$ and the initial value $\bm X^{(0)}=\bm{x}^{(0)}$, run Algorithm 1 to generate an $N$-path $(\bm{X}^{(1)},\dots,\bm{X}^{(N)})$ of a Markov chain whose stationary distribution is $f_{\bm{X}|S=v}$.
7. To check the validity of proposal distribution $q$, compute the acceptance rate, draw the autocorrelation plots, and compare the scatter plots of the MC and MH samples.
8. If the proposal selection is verified in step 7, set the MH estimator of VaR contributions $\eqref{MCMC estimator}$ based on the sample path generated in step 6. Otherwise, go to step 4 and choose another proposal distribution.
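The steps of Algorithm 3 can be sketched end to end on a toy model. The Python code below is a minimal sketch under stated assumptions, not the authors' implementation: the risk model (i.i.d. Gamma(2, 1) losses), the moment-matched Dirichlet independent proposal, and all function names are hypothetical choices made so that the example is self-contained. The chain is stored as the full $d$-vector with the last coordinate equal to $v$ minus the others.

```python
import numpy as np

def log_joint_density(x):
    """Toy joint log-density: d i.i.d. Gamma(shape=2, scale=1) losses (assumption)."""
    if np.any(x <= 0.0):
        return -np.inf
    return float(np.sum(np.log(x) - x))

def log_dirichlet_kernel(x, v, alpha):
    """Unnormalised log-density of the proposal v * Dirichlet(alpha) at x."""
    return float(np.sum((alpha - 1.0) * np.log(x / v)))

def mh_var_contributions(M=100_000, N=20_000, d=3, p=0.99, delta=0.2, seed=0):
    rng = np.random.default_rng(seed)
    # Steps 1-2: crude MC sample and empirical VaR of the sum.
    X = rng.gamma(shape=2.0, scale=1.0, size=(M, d))
    S = X.sum(axis=1)
    v = float(np.quantile(S, p))
    # Step 3: pseudo samples whose sums fall in [v - delta, v + delta].
    band = X[np.abs(S - v) <= delta]
    # Steps 4-5: independent Dirichlet proposal on the simplex, with alpha
    # fitted to the rescaled band samples by moment matching.
    W = band / band.sum(axis=1, keepdims=True)
    m, s2 = W.mean(axis=0), W[:, 0].var()
    alpha = m * (m[0] * (1.0 - m[0]) / s2 - 1.0)
    # Step 6: independence MH chain on {x > 0, sum(x) = v}.
    x = v * W[0]
    log_w = log_joint_density(x) - log_dirichlet_kernel(x, v, alpha)
    chain, accepted = np.empty((N, d)), 0
    for n in range(N):
        y = v * rng.dirichlet(alpha)
        log_w_y = log_joint_density(y) - log_dirichlet_kernel(y, v, alpha)
        # Accept with probability min{1, [pi(y)/q(y)] / [pi(x)/q(x)]}.
        if np.log(rng.uniform()) < log_w_y - log_w:
            x, log_w, accepted = y, log_w_y, accepted + 1
        chain[n] = x
    # Steps 7-8: acceptance rate as a basic check, then the chain average.
    return v, chain.mean(axis=0), accepted / N

v, contributions, acceptance_rate = mh_var_contributions()
print(v, contributions, acceptance_rate)
```

Because the toy losses are exchangeable, each contribution should be close to $v/d$, which gives a simple sanity check. In a real application, step 7 would also include the autocorrelation and scatter-plot comparisons before the estimate is accepted.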

6 Concluding remarks

Computing VaR contributions for a risk model specified by joint density is generally a difficult task. To achieve this, the MH estimator of VaR contributions is proposed. Its sample efficiency is significantly improved since the MH method generates samples directly from the conditional density given the sum constraint. Moreover, since the MH estimator can capture the features of the risk model more directly than the existing estimators, it can maintain high performance even when the underlying loss distribution is multimodal or heavy-tailed. By the general theory of Markov chains, the MH estimator is consistent and asymptotically normally distributed. Through simulation and empirical studies based on real-world data, the performance of the MH estimator was compared with those of other existing estimators for various risk models. The numerical results demonstrated that in most risk models, the MH estimator had smaller bias and RMSE compared with other existing estimators even when the dimension of the portfolio was high, such as $d\approx 500$ .

Potential future research includes a theoretical study of the conditional joint distribution of $\bm{X}|S=v$ . Our main interest is in the influence of the underlying copula of a risk model on the tail behavior and multimodality of the density $f_{\bm{X}|S=v}$ . We believe that revealing relationships among them can provide more promising guidelines for the proposal selection of the MH estimator.

Acknowledgements

We wish to thank Paul Embrechts from ETH Zürich for his valuable comments regarding the simulation setup. We would also like to express our gratitude to Kengo Kamatani from Osaka University and Marius Hofert from the University of Waterloo for fruitful discussions on MCMC and Archimedean copulas.

Funding

This work was supported by the Japan Society for the Promotion of Science (JSPS) under the Core-to-Core program at Keio University.

7 Consistency and asymptotic normality

In this appendix, we derive conditions on a copula and marginal distributions with which the corresponding MH estimator of VaR contributions satisfies consistency (16) and CLT $(\ref{central limit theorem})$ for some choice of proposal distribution $q$ . This study reveals which proposal distribution is appropriate for a given risk model.

We classify the loss distribution $F_{\bm{X}}$ into two cases: one wherein $\text{supp}(f_{\bm{X}})=\mathbb{R}^{d}_{+}=\{\bm{x}\in\mathbb{R}^{d}:\bm{x}\geq\bm{0}\}$ and another wherein $\text{supp}(f_{\bm{X}})=\mathbb{R}^{d}$. The former corresponds to the case wherein we model pure losses, and the latter to the case of profits and losses (P&L). Our result is mainly about the former case, and we provide some limited examples for the latter case. It should be emphasized that the former case of pure losses includes a broad range of loss models. To demonstrate this, let $c_{j}=\text{ess.inf}(X_{j})$, and set $\tilde{X}_{j}=X_{j}-c_{j}$, $j=1,\dots,d$. If $-\infty<c_{j}$, then $\tilde{X}_{j}\geq 0$. For $\tilde{S}=\sum_{j=1}^{d}\tilde{X}_{j}$, the translation invariance of $\text{VaR}_{p}$ implies that

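$$\text{VaR}_{p}(\tilde{S})=\text{VaR}_{p}\Bigl(S-\sum_{j=1}^{d}c_{j}\Bigr)=\text{VaR}_{p}(S)-\sum_{j=1}^{d}c_{j}.$$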

Therefore, the allocated capital of $\tilde{X}_{j}$ is given by

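$$\tilde{\text{AC}}_{j}=\mathbb{E}[\tilde{X}_{j}\,|\,\tilde{S}=\text{VaR}_{p}(\tilde{S})]=\mathbb{E}[X_{j}\,|\,S=\text{VaR}_{p}(S)]-c_{j}=\text{AC}_{j}-c_{j},\quad j=1,\dots,d.$$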

Consequently, one can estimate $(\text{AC}_{1},\text{AC}_{2},\dots,\text{AC}_{d})$ by first estimating $(\tilde{\text{AC}}_{1},\tilde{\text{AC}}_{2},\dots,\tilde{\text{AC}}_{d})$ based on the joint distribution of $(\tilde{X}_{1},\tilde{X}_{2},\dots,\tilde{X}_{d})$ such that $\text{supp}(f_{\tilde{\bm{X}}})=\mathbb{R}^{d}_{+}$, and then adding $(c_{1},c_{2},\dots,c_{d})$ back to $\tilde{\text{AC}}$, since $X_{j}=\tilde{X}_{j}+c_{j}$. Therefore, our result for the former case includes the case of P&L in which each loss $X_{j}$ is bounded from below, that is, the profits are bounded.

7.1 Case of pure losses

When $\text{supp}(f_{\bm{X}})=\mathbb{R}^{d}_{+}$ , the conditional distribution $F_{\bm{X}^{\prime}|S=v}$ is supported on the following bounded set called the $v$ -simplex:

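$$\mathcal{S}_{v}=\{\bm{x}\in\mathbb{R}^{d^{\prime}}_{+}:\bm{1}^{\text{\scriptsize T}}_{d^{\prime}}\bm{x}\leq v\}.\tag{26}$$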

Thanks to the compactness of the support, we can state simple conditions on the marginal loss densities and copula density that lead to consistency and CLT of the MH estimator.

Theorem 7.1.

Suppose that the joint distribution $f_{\bm{X}}$ is supported on $\mathbb{R}^{d}_{+}$ and has marginal densities $f_{1},f_{2},\dots,f_{d}$ and a copula density $c$ . Then, $\sqrt{N}$ -CLT holds for the MH estimator $(\ref{MCMC estimator})$ of VaR contributions if the following conditions $(C1)-(C3)$ hold:

(C1) $\epsilon:=\inf_{\bm{x},\bm{y}\in\mathcal{S}_{v}}q(\bm{x},\bm{y})>0$,

(C2) $f_{j}(x)$ is positive and bounded above for any $x\in[0,v]$ for $j=1,2,\dots,d$, and

(C3) $c(\bm{u})$ is positive and bounded above for any $\bm{u}\in F_{1}([0,v])\times\cdots\times F_{d}([0,v])$.

Proof.

According to Theorem 23 in Roberts and Rosenthal (2004), $\sqrt{N}$-CLT holds if the Markov chain is uniformly ergodic whenever $\mathbb{E}[||\bm{X}^{\prime}||^{2}|S=v]<\infty$. Since $X_{1},X_{2},\dots,X_{d}\geq 0$, the moment condition is satisfied by the inequality

[TABLE]

for any $i,j\in\{1,2,\dots,d\}$. Thus, it suffices to show that the Markov chain is uniformly ergodic. According to Theorem 1.3 in Mengersen and Tweedie (1996), the Markov chain is uniformly ergodic if (and only if) the minorization condition (Rosenthal, 1995) holds on the whole space ${\mathcal{S}}_{v}$; that is, there exist a positive integer $n$, a positive number $\delta>0$, and a probability measure $\nu$ such that

[TABLE]

for any $\bm{x}\in{\mathcal{S}}_{v}$ and $A\in{\mathcal{B}}_{v}$ , where ${\mathcal{B}}_{v}:={\mathcal{B}}(\mathbb{R}^{d^{\prime}})\cap{\mathcal{S}}_{v}$ . Our target distribution can be written as

[TABLE]

where $\bm{x}=(x_{1},x_{2},\dots,x_{d-1})\in\mathcal{S}_{v}$ and $x_{d}=v-\bm{1}^{\text{\scriptsize T}}_{d^{\prime}}\bm{x}$. Thus, by conditions $(C2)$ and $(C3)$ and the fact that ${\mathcal{S}}_{v}\subset[0,v]^{d^{\prime}}$, we have

[TABLE]

Using $(\ref{bounded condition on pi})$ and condition $(C1)$ , the minorization condition can be checked as follows. For any $\bm{x}\in{\mathcal{S}}_{v}$ , define

[TABLE]

Then, for any $A\in{\mathcal{B}}_{v}$ , we have

[TABLE]

Therefore, the minorization condition holds for $n=1$ , $\delta=\frac{\epsilon}{u}>0$ , and $\nu=\pi$ . Consequently, the Markov chain is uniformly ergodic, and thus $\sqrt{N}$ -CLT holds. Since the minorization condition (27) holds, consistency of $\hat{\bm\pi}_{N}(\bm h)$ follows by Theorem 1 in Nummelin (2002). ∎
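The minorization step of the proof can be traced through a short worked derivation. The display below is a sketch under stated assumptions rather than a reproduction of the original equations: the MH transition kernel is written as $K(\bm{x},A)$ (a symbol assumed here), and $u$ is read as an upper bound of $\pi$ on $\mathcal{S}_{v}$, which is how $\delta=\epsilon/u$ is interpreted. Dropping the rejection part of the kernel and using $q\geq\epsilon$ from $(C1)$ together with $0<\pi\leq u$ gives, for any $\bm{x}\in\mathcal{S}_{v}$ and $A\in\mathcal{B}_{v}$,

$$
\begin{aligned}
K(\bm{x},A)&\geq\int_{A}q(\bm{x},\bm{y})\min\Bigl\{1,\frac{\pi(\bm{y})q(\bm{y},\bm{x})}{\pi(\bm{x})q(\bm{x},\bm{y})}\Bigr\}\,d\bm{y}
=\int_{A}\min\Bigl\{q(\bm{x},\bm{y}),\frac{\pi(\bm{y})}{\pi(\bm{x})}q(\bm{y},\bm{x})\Bigr\}\,d\bm{y}\\
&\geq\int_{A}\min\Bigl\{\epsilon,\frac{\pi(\bm{y})}{u}\epsilon\Bigr\}\,d\bm{y}
=\frac{\epsilon}{u}\int_{A}\pi(\bm{y})\,d\bm{y}
=\frac{\epsilon}{u}\,\pi(A),
\end{aligned}
$$

where the second-to-last equality uses $\pi(\bm{y})/u\leq 1$. This matches the statement that the minorization condition holds with $n=1$, $\delta=\epsilon/u$, and $\nu=\pi$.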

The following example gives a pair of risk model and proposal distribution that satisfies these conditions.

Example 1.

For $j=1,\dots,d$, let $X_{j}$ follow a Pareto distribution with density given by

[TABLE]

Suppose $\bm{X}=(X_{1},X_{2},\dots,X_{d})$ has a survival Clayton copula with the density given by

[TABLE]

Some simple calculations show that the marginal distribution (29) satisfies $(C2)$ and the copula (30) satisfies $(C3)$ under the very mild sufficient condition $0<\theta<\log(1-p)/\log(1-\frac{1}{d})$. Therefore, with any choice of proposal distribution $q$ satisfying $(C1)$, the corresponding MH estimator (23) satisfies consistency and asymptotic normality. A possible choice of $q$ is the random walk proposal $q(\bm{x},\bm{y})=f(\bm{y}-\bm{x})$ with $f$ the density of a multivariate normal distribution with mean zero. Since $\bm{y}-\bm{x}\in[-v,v]^{d^{\prime}}$ for $\bm{x},\bm{y}\in\mathcal{S}_{v}$, $f(\bm{y}-\bm{x})$ is bounded away from zero on this compact set, so that $(C1)$ holds.
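To get a feeling for how mild the upper bound on $\theta$ is, it can be evaluated for specific values. The numbers below are purely illustrative: $p=0.99$ and the dimensions $d=3$ and $d=500$ are assumed values, not settings taken from the example.

$$
\frac{\log(1-p)}{\log(1-\frac{1}{d})}\Bigg|_{p=0.99,\,d=3}=\frac{\log 0.01}{\log(2/3)}\approx\frac{-4.605}{-0.405}\approx 11.36,
\qquad
\frac{\log(1-p)}{\log(1-\frac{1}{d})}\Bigg|_{p=0.99,\,d=500}=\frac{\log 0.01}{\log 0.998}\approx 2.30\times 10^{3}.
$$

For a fixed confidence level, the admissible range of $\theta$ therefore widens roughly linearly in $d$, because $\log(1-\frac{1}{d})\approx-\frac{1}{d}$ for large $d$.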

It is worth noting that the condition $(C3)$ is irrelevant to the copula on the upper tail part $[F_{1}(v),1]\times\cdots\times[F_{d}(v),1]$ . Therefore, $(C3)$ holds even if a copula density explodes at the upper corner, which is often the case with copulas having upper tail dependence. In fact, a more general result holds for survival Archimedean copulas. A $d$ -dimensional Archimedean copula with an Archimedean generator $\psi$ is given by

[TABLE]

where $\psi$ is a continuous and nonincreasing function $\psi:[0,\infty]\rightarrow[0,1]$ satisfying $\psi(0)=1$ , and $\lim_{t\rightarrow\infty}\psi(t)=0$ , and is decreasing on $[0,\inf\{t:\psi(t)=0\}]$ . The inverse $\psi^{-1}(u)$ is well-defined on $u\in(0,1]$ and $\psi^{-1}(0)$ is defined by $\psi^{-1}(0)=\inf\{t:\psi(t)=0\}$ . Let $\psi^{(j)}$ be the $j$ th derivative of $\psi$ . An Archimedean generator $\psi$ defines a proper $d$ -copula via (31) for any $d\geq 1$ if and only if $\psi$ is completely monotone, that is, $(-1)^{j}\psi^{(j)}\geq 0$ on $(0,\infty)$ for all $j=0,1,\dots$ ; see McNeil et al. (2009). We denote the class of completely monotone generators as $\Psi_{\infty}$ . According to Bernstein’s Theorem (see, for example, Feller, 2008), $\psi\in\Psi_{\infty}$ admits the Laplace-Stieltjes representation $\psi(t)=\mathbb{E}_{F}[{\mathrm{e}}^{-tV}]$ for some positive random variable $V>0$ .

Theorem 7.2 (Sufficient condition of $(C3)$ for survival Archimedean copulas).

Let $\psi\in\Psi_{\infty}$ be a completely monotone Archimedean generator. If $\mathbb{E}[V^{d}]<\infty$ where $V$ is such that $\psi(t)=\mathbb{E}[{\mathrm{e}}^{-tV}]$ , then the survival Archimedean copula $\bar{C}_{\psi}$ has a density satisfying the condition $(C3)$ in Theorem 7.1; moreover, $\bar{C}_{\psi}$ has a zero lower tail dependence coefficient.

Proof.

Denote $\bar{u}_{j}=F_{j}(v)<1$ and $\underline{u}_{j}=1-\bar{u}_{j}>0$. The density of the survival Archimedean copula is given by

[TABLE]

where $t_{j}=\psi^{-1}(1-u_{j})$ and $t=\sum_{j=1}^{d}t_{j}$. When $u_{j}\in[0,F_{j}(v)]$, we have $0<\underline{u}_{j}\leq 1-u_{j}\leq 1$ and thus $t_{j}=\psi^{-1}(1-u_{j})\in[0,\bar{t}_{j}]$ where $\bar{t}_{j}=\psi^{-1}(\underline{u}_{j})<\infty$. Thus, $0\leq t=\sum_{j=1}^{d}t_{j}<\infty$.

Since $\psi\in\Psi_{\infty}$ , it is of the form $\psi(t)=\mathbb{E}[{\mathrm{e}}^{-tV}]$ for some positive random variable $V>0$ . Therefore, on $0\leq t<\infty$ , we have $0<(-1)^{j}\psi^{(j)}(t)<\infty$ for $j=1$ and $j=d$ since $(-1)^{j}\psi^{(j)}(t)=\mathbb{E}[V^{j}{\mathrm{e}}^{-tV}]>0$ and $\mathbb{E}[V^{j}{\mathrm{e}}^{-tV}]\leq\mathbb{E}[V^{j}]<\infty$ for $j=1$ and $j=d$ by assumption. Consequently, the density (7.1) is bounded from below and above.

When $\mathbb{E}[V^{d}]<\infty$ , the corresponding Archimedean copula has an upper tail dependence coefficient

[TABLE]

where the last equality comes from l’Hôpital’s rule. Since

[TABLE]

since $\mathbb{E}[V\mathrm{e}^{-2tV}]$ and $\mathbb{E}[V\mathrm{e}^{-tV}]$ go to $\mathbb{E}[V]<\infty$ as $t\rightarrow 0$ . Thus, for the survival Archimedean copula, $\lambda_{l}(\bar{C}_{\psi})=\lambda_{u}(C_{\psi})=0$ . ∎

According to Theorem 7.2, the survival Clayton copula satisfies $(C3)$ while the survival Gumbel copula does not because it has a positive lower tail dependence coefficient; see Hofert et al. (2012).
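The contrast between the Clayton and Gumbel cases can be checked numerically. The sketch below assumes the standard Laplace-transform parameterizations $\psi(t)=(1+t)^{-1/\theta}$ (Clayton) and $\psi(t)=\exp(-t^{1/\theta})$ (Gumbel), and it uses the well-known limit $\lambda_{u}=2-\lim_{t\downarrow 0}\{1-\psi(2t)\}/\{1-\psi(t)\}$ for the upper tail dependence coefficient of an Archimedean copula; the function names are hypothetical.

```python
import numpy as np

def one_minus_clayton(t, theta):
    """1 - psi(t) for the Clayton generator psi(t) = (1 + t)^(-1/theta)."""
    return -np.expm1(-np.log1p(t) / theta)

def one_minus_gumbel(t, theta):
    """1 - psi(t) for the Gumbel generator psi(t) = exp(-t^(1/theta))."""
    return -np.expm1(-t ** (1.0 / theta))

def upper_tail_dependence(one_minus_psi, theta, t=1e-8):
    """Approximate 2 - lim_{t -> 0} (1 - psi(2t)) / (1 - psi(t)) at a small t."""
    return 2.0 - one_minus_psi(2.0 * t, theta) / one_minus_psi(t, theta)

theta = 2.0  # illustrative parameter value (assumption)
lam_clayton = upper_tail_dependence(one_minus_clayton, theta)
lam_gumbel = upper_tail_dependence(one_minus_gumbel, theta)
print(lam_clayton, lam_gumbel, 2.0 - 2.0 ** (1.0 / theta))
```

The Clayton value is numerically zero, whereas the Gumbel value is close to $2-2^{1/\theta}>0$. Since the lower tail dependence of a survival copula equals the upper tail dependence of the original copula, this agrees with the statement that the survival Gumbel copula falls outside the scope of Theorem 7.2.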

Remark 2 (Consistency and CLT for copulas with lower tail dependence).

Condition $(C3)$ does not hold for elliptical copulas with lower tail dependence, such as a student’s $t$ -copula with density

[TABLE]

By carefully checking the proof of Theorem 7.1, the consistency and asymptotic normality of $\hat{{\bm\pi}}_{N}(\bm h)$ still hold under a weaker condition than $(C2)$ and $(C3)$ ;

[TABLE]

for some positive constant $L>0$, where $\tilde{\mathcal{S}}_{v}=\{\bm{x}\in\mathbb{R}^{d}:\bm{1}^{\text{\scriptsize T}}_{d}\bm{x}=1\}$. While it is not straightforward to verify, one sufficient condition for (34) under $(C1)$ is that $\pi$ is bounded above on $\tilde{\mathcal{S}}_{v}$. Another sufficient condition is that the proposal density $q$ explodes faster than $\pi$. An example of such $q$ is an independent proposal distribution $q(\bm{x},\bm{y})=f(\bm{y})$ with $f$ the density of the Dirichlet distribution $\mathcal{D}(\alpha_{1},\alpha_{2},\dots,\alpha_{d})$ for $\alpha_{1},\alpha_{2},\dots,\alpha_{d}<1$, which explodes to $\infty$ as its argument approaches an axis. Therefore, by choosing such proposal distributions, consistency and CLT can still hold even if a copula density explodes at the lower corner $\bm{u}=\bm{0}$.

7.2 Case of profits and losses

In contrast to the case of pure losses, showing the consistency and CLT of the MH estimator is challenging for the case wherein we model P&L. Since the conditional density $f_{\bm{X}^{\prime}|S=v}$ is supported on the unbounded space $\mathbb{R}^{d^{\prime}}$, a careful study of its tail behavior is necessary. When the original loss random vector $\bm{X}$ follows an elliptical distribution, the results of Kamatani (2017) can be applied to justify the CLT of our MH estimator with the MpCN proposal distribution. An example of justification of CLT for the case wherein $\bm{X}$ follows the multivariate student’s $t$-distribution is provided below.

Example 2 (Justification of CLT for multivariate student’s $t$ -Distribution).

We demonstrate that the MpCN proposal distribution $(\ref{MpCN proposal})$ achieves the CLT of VaR contributions when the underlying loss model is a multivariate student’s $t$ -distribution $t_{\nu}(\bm{\mu},{\bf\Sigma})$ with density

[TABLE]

Let $\bm{X}\sim t_{\nu}(\bm{\mu},{\bf\Sigma})$ where $\nu>2$ , $\bm{\mu}\in\mathbb{R}^{d}$ , and ${\bf\Sigma}\in{\mathcal{M}}^{d\times d}_{+}$ . Throughout the discussion, we set $\bm{\mu}=\bm{0}$ for simplicity. Write

[TABLE]

for ${\bf A}_{1}\in{\mathcal{M}}^{d^{\prime}\times d^{\prime}}(\mathbb{R})$ , ${\bm a}_{2}\in\mathbb{R}^{d^{\prime}}$ , and $a_{3}\in\mathbb{R}$ . Then, it holds that

[TABLE]

where ${\bf V}:={\bf A}_{1}-{\bm a}_{2}\bm{1}_{d^{\prime}}^{\text{\scriptsize T}}-\bm{1}_{d^{\prime}}{\bm a}_{2}^{\text{\scriptsize T}}+a_{3}\bm{1}_{d^{\prime}}\bm{1}_{d^{\prime}}^{\text{\scriptsize T}}\in{\mathcal{M}}^{d^{\prime}\times d^{\prime}}_{+}$, $\bm{w}:={\bf V}^{-1}(va_{3}\bm{1}_{d^{\prime}}-v{\bm a}_{2})\in\mathbb{R}^{d^{\prime}}$, and $\eta:=v^{2}a_{3}-\bm{w}^{\text{\scriptsize T}}{\bf V}\bm{w}\in\mathbb{R}$. Using this identity, we have that

[TABLE]

where $\bf W=\bf V^{-1}$ . Provided $\nu+\eta>0$ , $\bm{X}^{\prime}|S=v$ follows a $d^{\prime}$ -dimensional elliptical distribution with the location parameter $\bm{w}$ , scale parameter ${\bf W}$ , and the density generator $g:\mathbb{R}_{+}\rightarrow\mathbb{R}_{+}$ given by

[TABLE]

This type of distribution is called a Pearson type $VII$ distribution (Schmidt, 2002).

Consider the MH estimator $(\ref{MCMC estimator})$ where target distribution $\pi$ is $f_{\bm{X}^{\prime}|S=v}$ , and proposal distribution $q$ is MpCN $(\ref{MpCN proposal})$ . According to Theorem 25 in Roberts and Rosenthal (2004), $\sqrt{N}$ -CLT holds if the Markov chain is geometrically ergodic and $\mathbb{E}[||\bm{X}^{\prime}||^{2}|S=v]<\infty$ . According to Proposition 3.4 in Kamatani (2016), the Markov chain with the MpCN proposal distribution is geometrically ergodic if $\mathbb{E}[||\bm{X}^{\prime}||^{\delta}|S=v]<\infty$ for some $\delta>0$ , $\pi(\bm{x})$ is strictly positive and continuous, and it is symmetrically regularly varying, that is,

[TABLE]

for some function $\lambda:\mathbb{R}^{d^{\prime}}\rightarrow(0,\infty)$ such that $\lambda(\bm{x})=1$ for any $\bm{x}\in S^{d^{\prime}-1}_{{\bf W}}$ , where $S^{d^{\prime}-1}_{{\bf W}}:=\{\bm{x}\in\mathbb{R}^{d^{\prime}}:||{\bf W}^{-\frac{1}{2}}\bm{x}||=||{\bf W}^{-\frac{1}{2}}\bm{1}_{d^{\prime}}||\}$ . We will see that the moment condition holds, and the condition on tail $(\ref{symmetrically regularly varying})$ is also satisfied for $\pi=f_{\bm{X}^{\prime}|S=v}$ .

Write $R:=||\bm{X}^{\prime}||$ . It can be shown that $g$ is regularly varying (see, for example, Resnick, 2013) at $\infty$ with index $\alpha=-\frac{\nu+d}{2}$ ; that is,

[TABLE]

According to Proposition 3.7 in Schmidt (2002), $f_{R|S=v}$ is regularly varying with index $-(\nu+1)$. Then, according to Karamata’s Theorem (see Resnick, 2013), $\bar{F}_{R|S=v}=1-F_{R|S=v}$ is regularly varying with index $-\nu$. Therefore, $\mathbb{E}[R^{\delta}|S=v]<\infty$ holds for any $\delta<\nu$; see Mikosch (1999). Thus, all the moment conditions above are satisfied as long as $\nu>2$. In the elliptical case, tail condition $(\ref{symmetrically regularly varying})$ is a direct consequence of $(\ref{density generator regularly varying})$. Since $(\bm{x}-\bm{w})^{\text{\scriptsize T}}{\bf W}^{-1}(\bm{x}-\bm{w})\geq 0$ for all $\bm{x}\in\mathbb{R}^{d^{\prime}}$, it holds that

[TABLE]

Thus, by taking

[TABLE]

in $(\ref{symmetrically regularly varying})$ , $\pi=f_{\bm{X}^{\prime}|S=v}$ is shown to be symmetrically regularly varying. Putting them together, we conclude that the MH estimator with the MpCN proposal distribution satisfies $\sqrt{N}$ -CLT when the underlying loss vector follows a multivariate student’s $t$ -distribution with $\nu>2$ and $\eta>-\nu$ . Note that in the numerical experiment in section 5, we set $d=3$ and $\nu=4$ . Since $\eta+\nu=137.935>0$ , CLT holds true.
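The completing-the-square step used in this example can be verified numerically. The sketch below assumes that ${\bf A}_{1}$, ${\bm a}_{2}$, and $a_{3}$ are the blocks of ${\bf\Sigma}^{-1}$ and that $\bm{\mu}=\bm{0}$; under this assumption, completing the square gives ${\bf V}$ with the factor $a_{3}$ on the all-ones matrix. The function name `conditional_t_parameters` and the test matrix are hypothetical.

```python
import numpy as np

def conditional_t_parameters(sigma, v):
    """Return (A, V, w, eta) such that, for x = (x', v - sum(x')),
    x^T A x = (x' - w)^T V (x' - w) + eta, where A = inv(sigma)."""
    d = sigma.shape[0]
    A = np.linalg.inv(sigma)
    A1, a2, a3 = A[:-1, :-1], A[:-1, -1], A[-1, -1]
    ones = np.ones(d - 1)
    V = A1 - np.outer(a2, ones) - np.outer(ones, a2) + a3 * np.outer(ones, ones)
    w = np.linalg.solve(V, v * a3 * ones - v * a2)
    eta = v**2 * a3 - w @ V @ w
    return A, V, w, eta

rng = np.random.default_rng(0)
B = rng.normal(size=(4, 4))
sigma = B @ B.T + 4.0 * np.eye(4)   # an arbitrary positive definite matrix
v = 2.5                             # hypothetical VaR level
A, V, w, eta = conditional_t_parameters(sigma, v)

x_prime = rng.normal(size=3)
x = np.append(x_prime, v - x_prime.sum())
lhs = x @ A @ x
rhs = (x_prime - w) @ V @ (x_prime - w) + eta
print(lhs, rhs, eta)
```

Under the same assumptions, $\eta$ is the minimum of $\bm{x}^{\text{T}}{\bf\Sigma}^{-1}\bm{x}$ over the hyperplane $\{\bm{1}^{\text{T}}\bm{x}=v\}$, which equals $v^{2}/(\bm{1}^{\text{T}}{\bf\Sigma}\bm{1})$; this offers an independent check of a reported value of $\eta$.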

Bibliography

Demarta, S. and McNeil, A.J., The t copula and related copulas. International Statistical Review, 2005, 73, 111–129.
Denault, M., Coherent allocation of risk capital. Journal of Risk, 2001, 4, 1–34.
Dev, A., Economic capital: a practitioner guide, 2004 (Risk Books: New York).
Fan, G., Zeng, Y. and Wong, W.K., Decomposition of portfolio VaR and expected shortfall based on multivariate Copula simulation. International Journal of Management Science and Engineering Management, 2012, 7, 153–160.
Feller, W., An introduction to probability theory and its applications, Vol. 2, 2008 (John Wiley & Sons).
Fernández, C. and Steel, M.F., On Bayesian modeling of fat tails and skewness. Journal of the American Statistical Association, 1998, 93, 359–371.
Geyer, C., Introduction to Markov chain Monte Carlo. In Handbook of Markov Chain Monte Carlo, pp. 3–47, 2011 (Springer: New York).
Glasserman, P., Measuring marginal risk contributions in credit portfolios. Journal of Computational Finance, 2005, 9, 1–41.
