Large Deviations of Factor Models with Regularly-Varying Tails: Asymptotics and Efficient Estimation

Farzad Pourbabaee; Omid Shams Solari

arXiv:1903.12299·math.ST·December 10, 2019


TL;DR

This paper studies the probability of large deviations in factor models with heavy-tailed distributions, introduces an efficient estimation method, and validates it through simulations, improving over traditional Monte Carlo techniques.

Contribution

It provides a new asymptotic analysis for large deviations in heavy-tailed factor models and develops an efficient estimation method outperforming crude Monte Carlo.

Findings

01

The proposed estimator significantly reduces variance compared to Monte Carlo.

02

Empirical validation confirms the theoretical efficiency gains.

03

The method is implemented in the Betta software package.

Abstract

We analyze the Large Deviation Probability (LDP) of linear factor models generated from non-identically distributed components with regularly-varying tails, a large subclass of heavy tailed distributions. An efficient sampling method for LDP estimation of this class is introduced and theoretically shown to exponentially outperform the crude Monte-Carlo estimator, in terms of the coverage probability and the confidence interval's length. The theoretical results are empirically validated through stochastic simulations on independent non-identically Pareto distributed factors. The proposed estimator is available as part of a more comprehensive Betta package.

Tables

Table 1: $x$, $\mu$, and $r$ corresponding to simulation 5.1

$x$	$μ$	$r$
100	1.921e-02	8.7e-04
200	8.22e-03	1.45e-03
300	5.14e-03	1.99e-03
400	3.71e-03	2.30e-03
500	2.89e-03	3.03e-03
600	2.36e-03	3.64e-03
700	1.99e-03	4.25e-03
800	1.72e-03	5.15e-03
900	1.51e-03	5.74e-03
1000	1.35e-03	5.69e-03

Table 2: $\bar{\alpha}$, $\mu$, and $r$ while $\alpha_{min}$ is constant. LR increases by orders of magnitude while $\mu$ does not change significantly. This is an empirical validation of the Catastrophe principle, since it demonstrates that the LDP is driven by the heaviest tail and few smaller perturbations do not add up to a significant change in LDP.

$\bar{α}$	$μ$	$r$
1.45	3.367e-02	4.7e-04
1.85	1.389e-02	1.08e-03
2.25	1.109e-02	3.22e-03
2.65	1.058e-02	9.82e-03
3.05	1.042e-02	1.894e-02
3.45	1.034e-02	4.432e-02
3.85	1.029e-02	5.362e-02
4.25	1.025e-02	2.6906e-01
4.65	1.022e-02	3.0186e-01
5.05	1.020e-02	4.1787e-01
5.45	1.018e-02	4.4620e-01

Table 3: $\alpha_{min}$, $\mu$, and $r$, with $\bar{\alpha}$ kept equal between each row of this table and Table 2. Note the extreme change in $\mu$ while $r$ does not change significantly.

$α_{min}$	$μ$	$r$
1.0	3.36e-02	4.6e-04
1.4	5.14e-03	5.87e-04
1.8	7.90e-04	8.75e-04
2.2	1.22e-04	1.21e-03
2.6	1.92e-05	1.51e-03
3.0	3.02e-06	1.84e-03
3.4	4.77e-07	2.53e-03
3.8	7.54e-08	3.12e-03
4.2	1.19e-08	3.70e-03
4.6	1.88e-09	6.06e-03
5.0	2.98e-10	6.26e-03

Equations

$$\eta_{i} = \langle \beta_{i}, \phi \rangle + \varepsilon_{i}$$

$$\xi = \frac{1}{M}\sum_{i=1}^{M}\eta_{i} = \bar{\eta} = \langle \bar{\beta}, \phi \rangle + \bar{\varepsilon},$$

$$\psi(\theta) = \log \mathsf{E}\left[e^{\theta\xi}\right] = \frac{\theta^{2}}{2}\left(\lVert\bar{\beta}\rVert_{2}^{2} + \frac{1}{M^{2}}\sum_{i=1}^{M}\sigma_{i}^{2}\right).$$

$$\frac{\mathrm{d}\mathsf{P}_{\theta}}{\mathrm{d}\mathsf{P}} = e^{\theta\xi - \psi(\theta)}.$$

$$\frac{1}{n}\sum_{i=1}^{n} 1_{[\xi_{i}>\lambda]}\frac{\mathrm{d}\mathsf{P}}{\mathrm{d}\mathsf{P}_{\theta}}(\xi_{i})$$

$$\limsup_{\lambda\to\infty}\frac{\text{Var}(Z(\lambda))}{\mathsf{E}\left[Z(\lambda)\right]^{2}} < \infty,$$

$$\limsup_{\lambda\to\infty}\frac{\text{Var}(Z(\lambda))}{\mathsf{E}\left[Z(\lambda)\right]^{2-\varepsilon}} = 0.$$

$$\theta^{*} = \frac{\lambda}{\lVert\bar{\beta}\rVert_{2}^{2} + \sum_{i=1}^{M}\sigma_{i}^{2}/M^{2}}.$$

$$\lim_{x\to\infty}\frac{\mathsf{P}\left[X_{1}+\ldots+X_{N}>x\right]}{\mathsf{P}\left[X_{1}>x\right]} = N \quad \text{for all } N\geq 1,$$

$$\lim_{x\to\infty}\frac{\bar{F}(x+h(x))}{\bar{F}(x)} = 1,$$

$$\lim_{x\to\infty}\frac{L(tx)}{L(x)} = 1 \quad \text{for all } t>0.$$

$$L(x) = a(x)\exp\left(\int_{1}^{x}\frac{\varepsilon(y)}{y}\,\mathrm{d}y\right),$$

$$\lim_{x\to\infty}\frac{L(x+x^{\delta})}{L(x)} = \lim_{x\to\infty}\frac{a(x+x^{\delta})}{a(x)}\,\lim_{x\to\infty}\exp\left(\int_{x}^{x+x^{\delta}}\frac{\varepsilon(y)}{y}\,\mathrm{d}y\right),$$

$$\int_{x}^{x+x^{\delta}}\frac{\varepsilon(y)}{y}\,\mathrm{d}y \leq \frac{\sup_{y\in(x,x+x^{\delta})}\lvert\varepsilon(y)\rvert}{x^{1-\delta}} \to 0, \quad \text{as } x\to\infty.$$

$$\mathsf{P}\left[X_{1}+\ldots+X_{N}>x\right] \sim \sum_{i=1}^{N}\mathsf{P}\left[X_{i}>x\right] \sim \left(\sum_{i=1}^{N}c_{i}\right)\bar{F}(x)$$

$$\mathsf{P}\left[\max_{1\leq i\leq N}X_{i}>x\right] \sim \mathsf{P}\left[X_{1}+\ldots+X_{N}>x\right], \quad \text{as } x\to\infty,$$

$$\sum_{i=1}^{N}\mathsf{P}\left[X_{i}>x\right] + o(\bar{F}(x)) \leq \mathsf{P}\left[\max_{1\leq i\leq N}X_{i}>x\right] \leq (1-e^{-1})^{-1}\sum_{i=1}^{N}\mathsf{P}\left[X_{i}>x\right] + o(\bar{F}(x))$$

$$\limsup_{x\to\infty}\frac{\text{Var}(Z(x))}{\mathsf{E}\left[Z(x)\right]^{2}} < \infty.$$

$$\begin{aligned} Z(x) &= \sum_{i=1}^{N}\mathsf{P}\left[S_{N}>x, M_{N}=X_{i} \mid X_{-i}\right] = \sum_{i=1}^{N}\mathsf{P}\left[X_{i}>(x-S_{N,-i})\vee M_{N,-i} \mid X_{-i}\right] \\ &= \sum_{i=1}^{N}\bar{F}_{i}\left((x-S_{N,-i})\vee M_{N,-i}\right) \sim \sum_{i=1}^{N}c_{i}\bar{F}\left((x-S_{N,-i})\vee M_{N,-i}\right) \end{aligned}$$

$$\limsup_{x\to\infty}\frac{\mathsf{E}\left[Z(x)^{2}\right]}{\mathsf{E}\left[Z(x)\right]^{2}} \leq \lim_{x\to\infty}\frac{\left(\sum_{i=1}^{N}c_{i}\right)^{2}\bar{F}(x/N)^{2}}{\left(\sum_{i=1}^{N}c_{i}\right)^{2}\bar{F}(x)^{2}} = \frac{L^{2}(x/N)/(x/N)^{2\alpha}}{L(x)^{2}/x^{2\alpha}} = N^{2\alpha}.$$

$$\frac{\bar{Z}_{n}(x)-\mu(x)}{\sigma(x)/\sqrt{n}} \overset{d}{\Longrightarrow} Z \overset{d}{=} \mathcal{N}(0,1)$$

$$\begin{aligned} \mathsf{P}\left[\left\lvert\bar{Z}_{n}(x)-\mu(x)\right\rvert \leq \kappa\mu(x)\right] &= \mathsf{P}\left[\lvert Z\rvert \leq \frac{\kappa\mu(x)}{\sigma(x)/\sqrt{n}}\right] + o_{n}(1) \geq \mathsf{P}\left[\lvert Z\rvert \leq \frac{\kappa\sqrt{n}}{N^{\alpha}}\right] + o_{x}(1) + o_{n}(1) \\ &= \left(2\Phi\left(\frac{\kappa\sqrt{n}}{N^{\alpha}}\right)-1\right) + o_{x}(1) + o_{n}(1), \end{aligned}$$

$$\mathsf{P}\left[\left\lvert\bar{Z}_{n}(x)-\mu(x)\right\rvert > \kappa\mu(x)\right] \leq \frac{\mathsf{E}\left[(\bar{Z}_{n}(x)-\mu(x))^{2}\right]}{\kappa^{2}\mu(x)^{2}} \leq \frac{N^{2\alpha}}{\kappa^{2}n} + o_{x}(1),$$

$$\mathsf{E}\left[e^{\lambda(X-\mu)}\right] \leq e^{\frac{\lambda^{2}\sigma^{2}}{2}}, \quad \text{for all } \lambda\in\mathbb{R}.$$

$$\mathsf{P}\left[\lvert X-\mu\rvert > t\right] \leq 2e^{-t^{2}/2\sigma^{2}}$$

$$\mathsf{P}\left[\lvert Z(x)-\mu(x)\rvert > \kappa\mu(x)\right] \leq 2\exp\left\{\frac{-2\kappa^{2}\mu(x)^{2}}{\left(\sum_{i=1}^{N}\bar{F}_{i}(x/N)\right)^{2}}\right\} = 2e^{-2\kappa^{2}/N^{2\alpha}} + o_{x}(1)$$

$$\mathsf{P}\left[\left\lvert\bar{Z}_{n}(x)-\mu(x)\right\rvert > \kappa\mu(x)\right] \leq 2\exp\left\{\frac{-2n\kappa^{2}\mu(x)^{2}}{\left(\sum_{i=1}^{N}\bar{F}_{i}(x/N)\right)^{2}}\right\} = 2e^{-2n\kappa^{2}/N^{2\alpha}} + o_{x}(1)$$

$$\hat{\mu}_{n}(x) := \frac{1}{n}\sum_{k=1}^{n}1_{[X_{1}^{(k)}+\ldots+X_{N}^{(k)}>x]},$$

$$\limsup_{x\to\infty}\lim_{n\to\infty}\left\{rn + \log\left(\frac{\mathsf{P}\left[\left\lvert\bar{Z}_{n}(x)-\mu(x)\right\rvert > \kappa\mu(x)\right]}{\mathsf{P}\left[\lvert\hat{\mu}_{n}(x)-\mu(x)\rvert > \kappa\mu(x)\right]}\right)\right\} \leq 0.$$

$$\begin{aligned} \mathsf{E}_{(i)}\left[\frac{f_{i}(X_{i})}{\tilde{f}_{i}(X_{i})}1_{[S_{N}^{(i)}>x, M_{N}^{(i)}=\tilde{X}_{i}]}\right] &= \int\frac{f_{i}(\tilde{x}_{i})}{\tilde{f}_{i}(\tilde{x}_{i})}1_{[S_{N}^{(i)}>x, M_{N}^{(i)}=\tilde{x}_{i}]}\prod_{j\neq i}f_{j}(x_{j})\,\mathrm{d}x_{j}\,\tilde{f}_{i}(\tilde{x}_{i})\,\mathrm{d}\tilde{x}_{i} \\ &= \int 1_{[S_{N}>x, M_{N}=x_{i}]}\prod_{j}f_{j}(x_{j})\,\mathrm{d}x_{j} = \mathsf{P}\left[S_{N}>x, M_{N}=X_{i}\right] \end{aligned}$$


Full text

Farzad Pourbabaee, Department of Economics, University of California, Berkeley

Omid Shams Solari, Department of Statistics, University of California, Berkeley (http://solari.stat.berkeley.edu)

Abstract

We analyze the Large Deviation Probability (LDP) of linear factor models generated from non-identically distributed components with regularly-varying tails, a large subclass of heavy tailed distributions. An efficient sampling method for LDP estimation of this class is introduced and theoretically shown to exponentially outperform the crude Monte-Carlo estimator, in terms of the coverage probability and the confidence interval’s length. The theoretical results are empirically validated through stochastic simulations on independent non-identically Pareto distributed factors. The proposed estimator is available as part of a more comprehensive Betta package.

Keywords: Monte-Carlo Estimation, Tail estimation, Conditional Monte-Carlo, Rare-Event Simulation, Stochastic Simulation.

Authors contributed equally to this manuscript.

1 Introduction

Large deviation probability (LDP) estimation is a well-studied problem in various branches of research, from finance and economics to particle physics and weather forecasting. Researchers are often interested in the probability of catastrophes, i.e. major overshoots or undershoots of an outcome composed of a few input resources. A prominent example is the estimation of the LDP for factor models. The most well-studied case is the LDP of sums of iid random variables, namely estimating $\mathsf{P}\left[X_{1}+\ldots+X_{N}>x\right]$ for finite $N$ where $x$ is very large.

Such LDP estimation is well-studied when factors are thin-tailed and/or their class of distribution functions is stable under addition, e.g. Gaussian or Gamma factors. The statistical analysis is particularly straightforward in these cases, because closed-form expressions for the right or left tail probability are available. However, most cases of interest do not fall in this category: the stability often fails, and we cannot appeal to analytical expressions for the deviation probability. An important example is the class of heavy-tailed distributions. Loosely speaking, for this class of random variables rare events occur more frequently than under a light-tailed distribution such as the Gaussian. Gabaix (2016) enumerated many examples in which power-law distributions emerge, such as firm and city size, income and wealth distribution, and CEO compensation. Asmussen et al. (2000) proposed the first efficient algorithm for LDP estimation of linear factor models with heavy-tailed iid components. They introduce a Conditional Monte-Carlo (CMC) algorithm which benefits from conditioning on order statistics. Chan and Kroese (2011) use the estimator of Asmussen et al. (2000) in specific settings. They apply it to the independent but non-identically distributed (ind) case, where the factors' distribution is restricted to be either Weibull or Pareto. Independently of Chan and Kroese (2011), we developed a CMC algorithm based on a comprehensive asymptotic description of how rare events occur in the ind setting when factors are regularly varying. In contrast to Chan and Kroese (2011), we provide theoretical guarantees establishing the faster convergence of our estimation algorithm relative to crude Monte-Carlo.

Conditioning is quite appealing since the classical Monte-Carlo methods for estimating LDP fall short, precisely because a large number of samples need to be drawn to get non-zero realizations of the sampling event. However, there are more efficient tools to address this problem, such as importance sampling. The idea is essentially to sample from another probability measure that assigns more weight to the regions where the sampling function takes larger values, and then correct for the transformation of the sampling measure. Ackerberg (2000) showed that importance sampling can reduce the computational burden for smoothing the simulated moments, as first suggested by McFadden (1989).

However, for the case of heavy-tail distributions, the general measure transformation methods such as importance sampling are unfavorable at best and inapplicable at worst. (They are referred to as “general” because many of the known methods of measure change are based on using the moment generating function as the Radon-Nikodym derivative. However, there are potentially heuristic ways to choose the sampling distribution according to the particular type of the unknown target variable that is to be estimated.) One reason is that higher moments as well as the moment generating function, which are the essence of measure transformation methods, do not exist for this class. Secondly, due to the degeneracy of the likelihood ratio in high-dimensional models, these methods are not useful (Rubinstein and Kroese (2016)). In this paper, a novel technique (based on conditional Monte-Carlo sampling) is introduced to address the problem of tail estimation for the sum of ind random variables belonging to a large subclass of heavy tails, namely regularly-varying (RV) distributions (this class of distributions is defined in depth in Feller (2008), section 8.8).

In the context of insurance risk, Goovaerts et al. (2005) study the tail asymptotics of randomly weighted sums of iid Pareto factors. Further, Foss and Richards (2010) find asymptotic results for the sum of conditionally independent factors under rather stringent conditions on the structure of the factors' dependencies. Albrecher, Asmussen and Kortschak (2006) and Kortschak and Albrecher (2009) use copulas to capture the dependence structure of the factors, and derive similar asymptotic results for the tail probability. The main contributions of this paper are to provide asymptotics for the deviation probability of the sum and maximum of independent, $\mathbb{R}$ -valued, RV random variables; and to propose an improved Monte-Carlo method for estimating these likelihoods.

Likelihood estimation of such extreme events arises in many places, notably the extreme losses or profits of a portfolio exposed to multiple independent risk factors. Another example, studied in Acemoglu, Ozdaglar and Tahbaz-Salehi (2017), is the frequency of large economic downturns and significant GDP departures from the equilibrium trend caused by the heavy-tail nature of micro shocks, wherein independent factors with Pareto tails add up and create large swings. The challenge is that in all these cases the extreme tail probabilities are excessively small. Therefore, what makes a confidence interval non-trivial is not its absolute size but its size relative to the sought probability. Namely, it is the relative error, the length of the confidence interval divided by the point estimate, that matters for reporting the estimation precision. For example, in the case of a simple indicator random variable $1_{A}$ , suppose that we are after $\mu=\mathsf{P}\left[A\right]$ . The per-sample variance for crude Monte-Carlo is $\mu(1-\mu)$ , which indeed goes to zero as $\mu\to 0$ . However, the relative error (standard deviation over mean) is $\sqrt{(1-\mu)/\mu}$ , which scales roughly as $1/\sqrt{\mu}$ and becomes arbitrarily large. The estimation method proposed in this paper fixes this issue of crude Monte-Carlo and achieves a bounded relative error as the target probability vanishes.
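To make this scaling concrete, the following sketch tabulates the crude Monte-Carlo relative error and the number of samples needed to reach a given relative error. It is illustrative only: the function names `crude_mc_relative_error` and `samples_needed` are hypothetical and not part of the paper or of the Betta package.

```python
import math


def crude_mc_relative_error(mu: float, n: int = 1) -> float:
    """Relative error (std / mean) of the average of n indicator draws with P[A] = mu."""
    return math.sqrt(mu * (1.0 - mu) / n) / mu


def samples_needed(mu: float, target_rel_err: float) -> int:
    """Smallest n such that the relative error is at most target_rel_err."""
    # Solve sqrt((1 - mu) / (mu * n)) <= target_rel_err for n.
    return math.ceil((1.0 - mu) / (mu * target_rel_err ** 2))


for mu in (1e-2, 1e-4, 1e-6):
    print(mu, crude_mc_relative_error(mu), samples_needed(mu, 0.1))
```

The required sample size grows like $1/\mu$, which is what makes crude Monte-Carlo impractical for very small target probabilities.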

The paper is organized as follows. In section 2, the optimality conditions for estimation are defined and some notions for the Gaussian case are explored. Next, in section 3, we establish results on the tail asymptotics of RV sums and present the CMC algorithm, along with its concentration analysis and comparisons with crude Monte-Carlo. In section 4, the implications for portfolios of many assets with heavy tails are studied. In section 5, the exponential efficiency of our proposed CMC algorithm relative to the crude Monte-Carlo estimator is demonstrated through simulations. The proofs of the propositions and theorems along with simulation details are presented in the appendix.

2 Gaussian Factor Model

In this section, we present a brief overview of the use of importance sampling as a method of variance reduction in the estimation of a large deviation probability under Gaussian factors. This would serve as an introduction that paves the way for the main results of the paper. Assume there are $M$ assets available in the market whose returns are driven by $k$ latent factors $\phi=(\phi_{1},\ldots,\phi_{k})$ . The return to the $i$ -th security is captured as a linear combination of the latent factors and the idiosyncratic risk, which is assumed uncorrelated with $\phi$ :

$$\eta_{i} = \langle \beta_{i}, \phi \rangle + \varepsilon_{i}$$

Asset returns are all evaluated over the time interval $[t,t+\tau]$ , where $\tau$ is the investment horizon. Observations of high-frequency data confirm that the distribution of returns deviates more intensely from Gaussianity as the investment horizon becomes shorter. For the moment, suppose that $\tau$ is long enough that we can assume Normal distributions both for the factors and asset specific risks, in particular assume $\phi\sim\mathcal{N}(0,I_{k})$ and $\varepsilon_{i}\sim\mathcal{N}(0,\sigma^{2}_{i})$ are mutually independent. Let $\xi$ represent the return to market index, which is typically calculated as the market-cap weighted sum of security returns, but here for simplicity is taken as the unweighted average of $M$ returns:

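$$\xi = \frac{1}{M}\sum_{i=1}^{M}\eta_{i} = \bar{\eta} = \langle \bar{\beta}, \phi \rangle + \bar{\varepsilon},$$

which has the Normal distribution $\mathcal{N}\left(0,\left\lVert\bar{\beta}\right\rVert_{2}^{2}+\sum_{i=1}^{M}\sigma^{2}_{i}/M^{2}\right)$ .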

One can think of periods of market turmoil as the times when the market index exhibits large downswings and upswings, namely $\lvert\xi\rvert>\lambda$ , and one might want to estimate the probability of these large fluctuations, e.g. $\mathsf{P}\left[\xi>\lambda\right]$ for large $\lambda$ . Since no closed form expression for this integral exists, we have to resort to simulation methods. However, crude Monte-Carlo sampling from the distribution of $\xi$ requires drawing a large number of samples to find some that surpass the threshold $\lambda$ ; importance sampling can help to reduce the required number of sample points, or alternatively reduce the variance of the point estimator. Given that the cumulant generating function $\psi(\theta)$ exists for the Gaussian distribution for all $\theta\in\mathbb{R}$ , one possible choice of importance sampling distribution is the exponential measure change through

$$\psi(\theta) = \log \mathsf{E}\left[e^{\theta\xi}\right] = \frac{\theta^{2}}{2}\left(\lVert\bar{\beta}\rVert_{2}^{2} + \frac{1}{M^{2}}\sum_{i=1}^{M}\sigma_{i}^{2}\right).$$

Specifically, if $\mathsf{P}$ denotes the actual probability measure for $\xi$ , the exponentially twisted measure $\mathsf{P}_{\theta}$ is then obtained by

$$\frac{\mathrm{d}\mathsf{P}_{\theta}}{\mathrm{d}\mathsf{P}} = e^{\theta\xi - \psi(\theta)}. \tag{2.4}$$

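Now we can generate $n$ samples from $\mathsf{P}_{\theta}$ , and form the following sample average, which is an unbiased estimator of $\mathsf{P}\left[\xi>\lambda\right]$ under the new measure $\mathsf{P}_{\theta}$ :

$$\frac{1}{n}\sum_{i=1}^{n} 1_{[\xi_{i}>\lambda]}\frac{\mathrm{d}\mathsf{P}}{\mathrm{d}\mathsf{P}_{\theta}}(\xi_{i})$$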

Denote the per-sample estimator by $Z(\lambda)=1_{[\xi>\lambda]}\frac{\mathrm{d}\mathsf{P}}{\mathrm{d}\mathsf{P}_{\theta}}(\xi)$ . The next definition spells out two notions of relative error.

Definition 2.1.

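The estimator $Z(\lambda)$ has bounded relative error if

$$\limsup_{\lambda\to\infty}\frac{\text{Var}(Z(\lambda))}{\mathsf{E}\left[Z(\lambda)\right]^{2}} < \infty,$$

and is logarithmically efficient (a weaker notion) if for all $\varepsilon>0$

$$\limsup_{\lambda\to\infty}\frac{\text{Var}(Z(\lambda))}{\mathsf{E}\left[Z(\lambda)\right]^{2-\varepsilon}} = 0.$$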

The following result, which is proved in Asmussen (2008), sheds light on the efficiency of exponential twisting for a certain value of $\theta$ .

Theorem 2.2.

The exponential change of measure in (2.4) is logarithmically efficient for the unique parameter $\theta$ that solves $\lambda=\psi^{\prime}(\theta)$ .

As a result of this theorem, the optimal parameter for the measure change is

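$$\theta^{*} = \frac{\lambda}{\lVert\bar{\beta}\rVert_{2}^{2} + \sum_{i=1}^{M}\sigma_{i}^{2}/M^{2}}. \tag{2.8}$$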

Having stated this theorem, the following lines summarize the simulation steps for the likelihood estimation of the market index large fluctuations in the Gaussian case:

1. Find $\theta^{*}$ from (2.8).

2. Draw random samples $\xi_{i}$ , $i=1,\ldots,n$ , from $\mathsf{P}_{\theta^{*}}$ .

3. Calculate $\frac{1}{n}\sum_{i=1}^{n}1_{[\xi_{i}>\lambda]}e^{\psi(\theta^{*})-\theta^{*}\xi_{i}}$ as an estimator of $\mathsf{P}\left[\xi>\lambda\right]$ .
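The three steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the loadings `beta_bar`, the idiosyncratic standard deviations `sigmas`, and all function names are hypothetical. It uses the fact that under $\mathsf{P}_{\theta}$ a centered Gaussian with variance $s^{2}$ becomes $\mathcal{N}(\theta s^{2}, s^{2})$, and it compares the result with the Gaussian tail evaluated through the complementary error function.

```python
import math
import random


def index_variance(beta_bar, sigmas):
    """Variance of xi: ||beta_bar||_2^2 + sum_i sigma_i^2 / M^2."""
    m = len(sigmas)
    return sum(b * b for b in beta_bar) + sum(s * s for s in sigmas) / m ** 2


def reference_tail(lam, s2):
    """Gaussian tail P[xi > lam] via erfc, used only as a reference value."""
    return 0.5 * math.erfc(lam / math.sqrt(2.0 * s2))


def twisted_estimate(lam, s2, n, seed=0):
    """Importance-sampling estimate of P[xi > lam] with exponential twisting."""
    rng = random.Random(seed)
    theta = lam / s2                 # step 1: solves lam = psi'(theta) = theta * s2
    psi = 0.5 * theta ** 2 * s2      # cumulant generating function at theta
    sd = math.sqrt(s2)
    total = 0.0
    for _ in range(n):
        xi = rng.gauss(theta * s2, sd)            # step 2: draw from P_theta
        if xi > lam:
            total += math.exp(psi - theta * xi)   # step 3: weight by dP/dP_theta
    return total / n


# Hypothetical inputs, chosen only for illustration.
beta_bar = [0.6, 0.8]
sigmas = [1.0, 1.0, 1.0, 1.0]
s2 = index_variance(beta_bar, sigmas)
lam = 4.0 * math.sqrt(s2)            # threshold four standard deviations out

print(twisted_estimate(lam, s2, n=20_000), reference_tail(lam, s2))
```

With the twisted mean placed exactly at the threshold, roughly half of the draws exceed $\lambda$, whereas crude sampling would see an exceedance about three times in every hundred thousand draws.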

As a result of twisting the sampling distribution, the relative error now scales as $\mathsf{P}\left[\xi>\lambda\right]^{-\varepsilon/2}$ , compared to $\mathsf{P}\left[\xi>\lambda\right]^{-1/2}$ for classical Monte-Carlo. Equivalently, a given level of relative error can be achieved with fewer sample points. However, this machinery cannot always be employed, because the moment generating function need not exist. Therefore, to find the optimal measure change we have to appeal to heuristic methods, or use other Monte-Carlo methods as explained in the following sections.

3 Regularly-Varying Factors

In this section we study the consequences of dealing with independent factors with heavier tails than Gaussians. In particular, the factors are assumed to have regularly-varying tails, for example ones with Pareto tails. This class of distribution functions is contained in the larger family of sub-exponential distributions as defined below.

Definition 3.1.

The distribution $F$ of a non-negative random variable $X$ is called sub-exponential, if

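$$\lim_{x\to\infty}\frac{\mathsf{P}\left[X_{1}+\ldots+X_{N}>x\right]}{\mathsf{P}\left[X_{1}>x\right]} = N \quad \text{for all } N\geq 1, \tag{3.1}$$

where the $X_{i}$ are iid copies drawn from $F$ (for more, see definition 1.3.3 in Embrechts, Klüppelberg and Mikosch (2013)).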

This definition extends to probability distributions on the entire real line by restriction to the positive and negative halves. Then, the random variable $X\sim F$ , taking values in $\mathbb{R}$ , is called sub-exponential if $X_{+}=(X\vee 0)$ and $X_{-}=-(X\wedge 0)$ are both sub-exponentials. Equation (3.1) says that the probability that the sum of $N$ iid sub-exponential random variables exceeds a certain threshold is roughly $N$ times the probability that one of them exceeds that level. The question is thus what happens if the random variables are independent and individually sub-exponential but not necessarily identically distributed? Is the deviation probability for the sum related to the sum of deviation probabilities of the summands, and if so, under what conditions? As pointed out in the introduction, variations of these questions are studied under different conditions for the factors.

In the remainder of this paper, we restrict ourselves to the case of sum of non-identical, independent, real-valued random variables. We answer this question under a mild condition, which is typically satisfied by long-tailed distributions.

Condition 3.2.

Given the distribution $F$ , there exists an eventually increasing function $h(x)$ such that $\lim_{x\to\infty}h(x)=\infty$ and

$$\lim_{x\to\infty}\frac{\bar{F}(x+h(x))}{\bar{F}(x)} = 1,$$

where $\bar{F}(x):=1-F(x)$ .

Example 3.3.

Suppose $X$ is power-law distributed with coefficient $\mu$ , i.e. $\mathsf{P}\left[X>x\right]\propto x^{-\mu}$ . Then, one can check that $h(x)=x^{\delta}$ , for any $0<\delta<1$ , satisfies condition 3.2.
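The claim in this example can be checked numerically. The sketch below assumes a Pareto tail with unit scale and writes the power-law coefficient as `alpha`; the function names are hypothetical. For this tail the ratio in condition 3.2 equals $(1+x^{\delta-1})^{-\alpha}$, which tends to 1 because $\delta<1$.

```python
def pareto_tail(x: float, alpha: float) -> float:
    """Tail P[X > x] of a unit-scale Pareto variable with exponent alpha."""
    return x ** (-alpha) if x >= 1.0 else 1.0


def long_tail_ratio(x: float, alpha: float, delta: float) -> float:
    """Ratio F_bar(x + h(x)) / F_bar(x) with h(x) = x ** delta."""
    return pareto_tail(x + x ** delta, alpha) / pareto_tail(x, alpha)


for x in (1e2, 1e4, 1e6, 1e8):
    print(x, long_tail_ratio(x, alpha=1.5, delta=0.5))
```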

The following notation is used throughout the paper.

Notation 3.4 (Asymptotic equivalence).

$f(x)\sim g(x)$ if $f(x)/g(x)\to 1$ , as $x\to\infty$ .

The next definition expresses the notion of the RV distribution; see Feller (2008) for more elaboration.

Definition 3.5 (Regularly-Varying (RV) distribution).

A distribution function $F$ has a regularly varying tail, if $\bar{F}(x)\sim L(x)/x^{\alpha}$ as $x\to\infty$ , where $\alpha>0$ and $L(\cdot)$ varies slowly at infinity, i.e.

$$\lim_{x\to\infty}\frac{L(tx)}{L(x)} = 1 \quad \text{for all } t>0.$$

Functions such as $\log(x)$ , $\log(\log(x))$ and any function converging to a positive constant are examples of slow variation. The RV property depends only on the behavior of the distribution at infinity, so it does not matter how it behaves at intermediate points. One stylized observation about this family of distributions is that they have finite moments of order less than $\alpha$ , but not more. This prevents us from using the moment generating function to obtain large deviation results.

Claim 3.6.

For all distribution functions of regular variation, we can take $h(x)=x^{\delta}$ with any $0<\delta<1$ , and condition 3.2 will hold. The corollary of theorem 1 in section 8.8 of Feller (2008) paves the way to prove this claim, which allows us to represent the slowly varying function $L(\cdot)$ as

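$$L(x) = a(x)\exp\left(\int_{1}^{x}\frac{\varepsilon(y)}{y}\,\mathrm{d}y\right),$$

where $\varepsilon(x)\to 0$ and $a(x)\to c$ as $x\to\infty$ . Therefore,

$$\lim_{x\to\infty}\frac{L(x+x^{\delta})}{L(x)} = \lim_{x\to\infty}\frac{a(x+x^{\delta})}{a(x)}\,\lim_{x\to\infty}\exp\left(\int_{x}^{x+x^{\delta}}\frac{\varepsilon(y)}{y}\,\mathrm{d}y\right),$$

where the first term converges to 1, and the second term’s exponent is approaching zero, because

$$\int_{x}^{x+x^{\delta}}\frac{\varepsilon(y)}{y}\,\mathrm{d}y \leq \frac{\sup_{y\in(x,x+x^{\delta})}\lvert\varepsilon(y)\rvert}{x^{1-\delta}} \to 0, \quad \text{as } x\to\infty.$$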

Remark 3.7.

The next two results on large deviation of sum and maximum of a sequence of not necessarily identical, independent and $\mathbb{R}$ -valued random variables, are built on the well-known properties of iid, $\mathbb{R}^{+}$ -valued RV random variables in Embrechts, Klüppelberg and Mikosch (2013). (Precisely, for nonnegative, sub-exponential and iid random variables $(X_{i})_{i=1,\ldots,N}$ , $\mathsf{P}\left[X_{1}+\ldots+X_{N}>x\right]\sim\mathsf{P}\left[\max_{1\leq i\leq N}X_{i}>x\right]\sim N\mathsf{P}\left[X_{i}>x\right]$ when $x\to\infty$ , as stated in Embrechts, Klüppelberg and Mikosch (2013), section 1.3.2.)

The following theorem assumes condition 3.2, and displays an asymptotic equivalence result for the tail probability of an independent RV sum.

Theorem 3.8.

Suppose $X_{1},\ldots,X_{N}$ are independent random variables in $\mathbb{R}$ , such that:

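(i) An RV distribution $F$ exists, where $\bar{F}_{i}(x)\sim c_{i}\bar{F}(x)$ for all $i$ and at least one $c_{i}\neq 0$ ,

(ii) A function $h(\cdot)$ exists that satisfies condition 3.2 for $F$ ,

then the following asymptotic result holds:

$$\mathsf{P}\left[X_{1}+\ldots+X_{N}>x\right] \sim \sum_{i=1}^{N}\mathsf{P}\left[X_{i}>x\right] \sim \left(\sum_{i=1}^{N}c_{i}\right)\bar{F}(x)$$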

Another interesting feature of sub-exponential distributions is the so-called catastrophe principle, that roughly states that the iid sum of non-negative sub-exponential random variables is large if and only if one of them is large. To put it in a more precise way, here is the formal definition of this property:

Definition 3.9.

The distribution function $F$ with support on $[0,\infty)$ is said to satisfy the catastrophe principle, if

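$$\mathsf{P}\left[\max_{1\leq i\leq N}X_{i}>x\right] \sim \mathsf{P}\left[X_{1}+\ldots+X_{N}>x\right], \quad \text{as } x\to\infty,$$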

where $X_{1},\ldots,X_{N}$ are iid draws from $F$ .

In particular, the sub-exponential family has this property. However, we want to know what happens to the maximum factor under the more general conditions of theorem 3.8: when the random variables are independently drawn from non-identical distributions, and can take negative as well as positive values. The next theorem examines the behavior of the maximum term up to a certain constant.

Theorem 3.10.

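Suppose $X_{1},\ldots,X_{N}$ are independently drawn from $F_{1},\ldots,F_{N}$ , and take values in $\mathbb{R}$ . Then, under the same conditions (i) and (ii) of theorem 3.8, the following asymptotic result holds:

$$\sum_{i=1}^{N}\mathsf{P}\left[X_{i}>x\right] + o(\bar{F}(x)) \leq \mathsf{P}\left[\max_{1\leq i\leq N}X_{i}>x\right] \leq (1-e^{-1})^{-1}\sum_{i=1}^{N}\mathsf{P}\left[X_{i}>x\right] + o(\bar{F}(x))$$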

Remark 3.11.

There is nothing very special about the upper bound constant $(1-e^{-1})^{-1}$ . It only paves the way for upper-bounding $e^{-x}$ by an affine function. More detail is given in the appendix, where we show that under slightly more stringent conditions the exact statement of the catastrophe principle is obtained, namely $\mathsf{P}\left[\max_{1\leq i\leq N}X_{i}>x\right]\sim\sum_{i=1}^{N}\mathsf{P}\left[X_{i}>x\right]$ in this case.

An important take-away from this result is that even in the extended case (non-identical and $\mathbb{R}$ -valued random variables), the catastrophe principle asymptotically holds up to a constant. More precisely, the probability that the sum exceeds a large value is of the same order as the probability that the maximum summand exceeds the same threshold. This can also be interpreted in another sense: aggregate fluctuations do not become extremely large by accumulating small variations; rather, there has to be a single factor with large deviation to support such an extreme event.
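A small simulation can illustrate these asymptotics. The sketch below assumes three independent Pareto factors with a common exponent and different scales, so that $\bar{F}_{i}(x)=c_{i}x^{-\alpha}$ with $c_{i}$ equal to the scale raised to the power $\alpha$; the constants `ALPHA`, `SCALES` and the function names are hypothetical choices for illustration. It compares the empirical tail frequencies of the sum and the maximum with the sum of marginal tails.

```python
import random

ALPHA = 1.5                  # common tail exponent (assumed for illustration)
SCALES = (1.0, 2.0, 3.0)     # non-identical Pareto scales (assumed)


def pareto_draw(rng, alpha, scale):
    """Inverse-CDF draw from a Pareto distribution supported on [scale, inf)."""
    return scale * (1.0 - rng.random()) ** (-1.0 / alpha)


def pareto_sf(x, alpha, scale):
    """Survival function P[X > x] of the same Pareto distribution."""
    return 1.0 if x <= scale else (scale / x) ** alpha


def tail_frequencies(x, n, seed=0):
    """Empirical frequencies of {sum > x} and {max > x} over n joint draws."""
    rng = random.Random(seed)
    sum_hits = 0
    max_hits = 0
    for _ in range(n):
        xs = [pareto_draw(rng, ALPHA, s) for s in SCALES]
        sum_hits += sum(xs) > x
        max_hits += max(xs) > x
    return sum_hits / n, max_hits / n


x = 200.0
p_sum, p_max = tail_frequencies(x, n=200_000)
p_marginals = sum(pareto_sf(x, ALPHA, s) for s in SCALES)
print(p_sum, p_max, p_marginals)
```

At a finite threshold the sum's tail sits somewhat above the sum of marginal tails, since the smaller factors shift the sum upward; the gap closes as $x$ grows.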

3.1 Conditional Monte-Carlo Algorithm

The asymptotic result in theorem 3.8 regarding the tail probability of the sum can be used to take $(\sum_{i=1}^{N}c_{i})\bar{F}(x)$ as an estimator for $\mathsf{P}\left[X_{1}+\ldots+X_{N}>x\right]$ . However, this estimator performs poorly in many cases, and simulation based on it will be inaccurate. A conditional Monte-Carlo algorithm is developed in Asmussen et al. (2006) to cope with the tail probability of the sum of iid heavy tails. That idea is incorporated here to obtain an estimator for the sum of independent but non-identical factors. The algorithm goes as follows:

(i)

Sample $X_{i}$ from its corresponding distribution $F_{i}$ for $i=1,\ldots,N$ . 2. (ii)

Let $M_{N}=\max\{X_{i}:i\in[N]\}$ . 3. (iii)

Compute $Z(x)=\sum_{i=1}^{N}\mathsf{P}\left[S_{N}>x,M_{N}=X_{i}\rvert X_{-i}\right]$

The proposed $Z(x)$ is an unbiased estimator of $\mathsf{P}\left[S_{N}>x\right]$ (the proof of this claim is simple and thus omitted). The notation $X_{-i}$ denotes all random variables excluding $X_{i}$ , and $S_{N}$ represents the sum of the random variables generated from the independent distributions, i.e., $X_{1}+\ldots+X_{N}$ . It is shown in Asmussen et al. (2006) that the estimator in step (iii) of algorithm 3.1 has bounded relative error in the non-negative iid case, when the common distribution $F$ has RV form.
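The three steps of algorithm 3.1 can be sketched in code. The following is a minimal illustration under stated assumptions, not the Betta implementation: the factors are taken to be unit-scale Pareto with $\bar{F}_{i}(t)=t^{-\alpha_{i}}$ for $t\geq 1$ , and the conditional probability in step (iii) is evaluated through the closed form $\bar{F}_{i}\left(M_{N,-i}\vee(x-S_{N,-i})\right)$ , which holds for continuous $F_{i}$ and uses the quantities that appear in the proof of theorem 3.13. All function names are hypothetical.

```python
import numpy as np

def pareto_sf(t, alpha):
    """Survival function P[X > t] of a unit-scale Pareto(alpha) law (assumed model)."""
    t = np.asarray(t, dtype=float)
    return np.where(t <= 1.0, 1.0, np.maximum(t, 1.0) ** (-alpha))

def sample_pareto(rng, alphas, n):
    """Return an (n, N) array; column i holds Pareto(alphas[i]) draws (inverse transform)."""
    u = 1.0 - rng.random((n, len(alphas)))  # uniform on (0, 1]
    return u ** (-1.0 / np.asarray(alphas, dtype=float))

def cmc_replicates(rng, alphas, x, n):
    """Return n independent replicates of Z(x); requires at least two factors."""
    alphas = np.asarray(alphas, dtype=float)
    X = sample_pareto(rng, alphas, n)                  # step (i)
    S_minus = X.sum(axis=1, keepdims=True) - X         # S_{N,-i} for every i
    srt = np.sort(X, axis=1)
    top1, top2 = srt[:, -1:], srt[:, -2:-1]            # step (ii): largest and runner-up
    M_minus = np.where(X == top1, top2, top1)          # M_{N,-i} for every i
    threshold = np.maximum(M_minus, x - S_minus)
    return pareto_sf(threshold, alphas).sum(axis=1)    # step (iii)

rng = np.random.default_rng(0)
Z = cmc_replicates(rng, [1.5, 2.0, 2.5], x=30.0, n=100_000)
print("CMC estimate of P[S_N > x]:", Z.mean())
```

Averaging the replicates gives the CMC estimate; each replicate lies in $[0,\sum_{i}\bar{F}_{i}(x/N)]$ , which is the range bound used later in the concentration analysis.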

Remark 3.12.

Consult appendix B for a detailed discussion on the computational complexity of algorithm 3.1.

The following theorem establishes the same result as Asmussen et al. (2006), but for the extended case of not necessarily identical $\mathbb{R}$ -valued factors.

Theorem 3.13.

If $F$ has a regularly varying tail, then the estimator $Z(x)$ in algorithm 3.1 has bounded relative error, namely

[TABLE]

Proof of Theorem 3.13.

Denote $M_{N,-i}=\max\{X_{-i}\}$ , $S_{N,-i}=\sum_{j\neq i}X_{j}$ , and let $\widetilde{X_{i}}$ be an independent copy of $X_{i}$ . Note that $Z(x)$ is implicitly a statistic generated from $X_{1},\ldots,X_{N}$ , thereby a random variable.

[TABLE]

One can check that if $M_{N,-i}\leq x/N$ then $x-S_{N,-i}\geq x/N$ , so that $M_{N,-i}\vee\left(x-S_{N,-i}\right)\geq x/N$ always. Consequently, $Z(x)$ is asymptotically upper bounded by $\left(\sum_{i=1}^{N}c_{i}\right)\bar{F}(x/N)$ , which yields

[TABLE]

∎

3.2 CMC Concentration and Efficiency Analysis

The CMC algorithm can be repeated $n$ times, with the outcome of the $i$ th repetition denoted by $Z_{i}(x)$ and the sample average denoted by $\bar{Z}_{n}(x)$ . Let $\mu(x):=\mathsf{P}\left[X_{1}+\ldots+X_{N}>x\right]$ , and $\sigma(x)^{2}:=\text{Var}(Z(x))$ . Then, a simple application of the central limit theorem yields:

[TABLE]

Therefore, one can get the following asymptotic confidence interval for the large deviation probability of $\bar{Z}_{n}(x)$ :

[TABLE]

where $\Phi(\cdot)$ is the Gaussian CDF. Thus, for large enough $n$ and $x$ we have $\bar{Z}_{n}(x)\in\left(\mu(x)(1-\kappa),\mu(x)(1+\kappa)\right)$ with probability at least $\left(2\Phi(\kappa\sqrt{n}N^{-\alpha})-1\right)$ . Another way to find a concentration bound on $\bar{Z}_{n}(x)$ is to use Markov’s inequality:

[TABLE]

where the last inequality uses the final bound in theorem 3.13 and holds for large enough $x$ . Finally, we present a stronger approach to obtaining a concentration bound, based on the notion of sub-Gaussian random variables.
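To get a feel for the CLT-based coverage level $2\Phi(\kappa\sqrt{n}N^{-\alpha})-1$ stated above, the short sketch below evaluates it numerically. It only transcribes that expression; the function name and the example parameter values are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist

def clt_coverage_lower_bound(kappa, n, N, alpha):
    """Evaluate 2 * Phi(kappa * sqrt(n) * N**(-alpha)) - 1, the asymptotic
    lower bound on the probability that the CMC average falls within a
    relative distance kappa of mu(x)."""
    z = kappa * sqrt(n) * N ** (-alpha)
    return 2.0 * NormalDist().cdf(z) - 1.0

# Illustrative values: N = 10 factors, alpha = 2, precision kappa = 0.5.
for n in (10**4, 10**5, 10**6):
    print(n, clt_coverage_lower_bound(0.5, n, 10, 2.0))
```

The level approaches one only once $\sqrt{n}$ is large relative to $N^{\alpha}/\kappa$ , which is the same scaling that appears in the bounds that follow.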

Definition 3.14 (Van Der Vaart and Wellner (1996)).

A random variable $X$ with mean $\mu=\mathsf{E}X$ is called sub-Gaussian, if there exists $\sigma>0$ , such that

[TABLE]

Remark 3.15.

Suppose that the random variable $X$ with mean $\mu$ is sub-Gaussian with parameter $\sigma$ , then the following Chernoff deviation bound would immediately fall out:

[TABLE]

One can show that if $X$ takes value in $[a,b]$ , then its sub-Gaussianity parameter is $(b-a)/2$ . By looking at the computations in theorem 3.13, we can confirm that $Z(x)\in[0,\sum_{i=1}^{N}\bar{F}_{i}(x/N)]$ , thereby $Z(x)$ is sub-Gaussian with parameter $\sum_{i=1}^{N}\bar{F}_{i}(x/N)/2$ , and the following deviation bound results from remark 3.15:

[TABLE]

In the last step we use the tail approximation for both $\mu(x)$ and the sum in the exponent’s denominator. This is a one-shot bound, namely for a single trial of the CMC algorithm, whereas if we repeat this process $n$ times and take the sample average, we get a much sharper bound:

[TABLE]

As can be seen in all three bounds (3.13), (3.14) and (3.18), the ratio $N^{2\alpha}/n$ is the key parameter controlling the decay rate of the error probability. For instance, if $N=10$ and $\alpha=2$ , we need to repeat the CMC algorithm on the order of $10^{4}$ times to get a small error probability. An important observation here is that $n$ scales proportionally to $N^{2\alpha}$ ; thus, for a fixed error rate, smaller values of $\alpha$ lead to a faster convergence rate, which makes sense once we recall that smaller values of $\alpha$ correspond to fatter tails. Therefore, the tail asymptotic equivalence relation is achieved at smaller $x$ ’s; equivalently, the error in tail probability estimation is smaller for fixed $x$ .
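The scaling of $n$ with $N^{2\alpha}$ can be made concrete with a small calculation. The sketch below assumes that the sub-Gaussian bound for the sample average takes the Hoeffding form $2\exp(-2n\kappa^{2}N^{-2\alpha})$ ; this form is what the range bound $Z(x)\in[0,\sum_{i}\bar{F}_{i}(x/N)]$ and the tail approximation suggest, and it is consistent with the rate threshold $2\kappa^{2}N^{-2\alpha}$ in theorem 3.16, but the constant should be treated as an assumption. The function names are hypothetical.

```python
from math import ceil, exp, log

def deviation_bound(n, kappa, N, alpha):
    """Assumed Hoeffding-type bound on P[|Z_bar_n - mu| > kappa * mu] for large x."""
    return 2.0 * exp(-2.0 * n * kappa**2 * N ** (-2.0 * alpha))

def required_repetitions(delta, kappa, N, alpha):
    """Smallest n for which the assumed bound drops below the error level delta."""
    return ceil(N ** (2.0 * alpha) * log(2.0 / delta) / (2.0 * kappa**2))

# Example from the text: N = 10 and alpha = 2, so N**(2*alpha) = 10**4.
n = required_repetitions(delta=0.05, kappa=0.5, N=10, alpha=2.0)
print(n, deviation_bound(n, 0.5, 10, 2.0))
```

Under this assumption the required number of repetitions is $N^{2\alpha}$ multiplied by a factor that depends only on $\kappa$ and the target error level.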

The proposed CMC algorithm asymptotically outperforms crude Monte-Carlo sampling in the sense of the estimator’s efficiency: for a given precision level $\kappa$ , the deviation probability of the CMC estimator is smaller than that of its regular sample mean counterpart, defined as

$$\hat{\mu}_{n}(x):=\frac{1}{n}\sum_{k=1}^{n}1_{[X_{1}^{(k)}+\ldots+X_{N}^{(k)}>x]}$$

where $X_{i}^{(k)}$ is the $k$ th independent draw from $F_{i}$ . The main theoretical result of the paper is presented next, where we establish the exponential boost obtained via the proposed CMC estimator relative to its crude Monte-Carlo counterpart.
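For comparison, the crude Monte-Carlo estimator is simply the fraction of simulated sums that exceed $x$ . The sketch below assumes unit-scale Pareto factors, as in the simulations of section 5, and also reports the standard relative error $\sqrt{(1-\mu)/(n\mu)}$ of a Bernoulli sample mean, which grows without bound as $\mu(x)\to 0$ . The function names are hypothetical.

```python
import numpy as np

def crude_mc(rng, alphas, x, n):
    """Fraction of n simulated sums X_1 + ... + X_N that exceed x
    (unit-scale Pareto factors, an assumed model)."""
    alphas = np.asarray(alphas, dtype=float)
    u = 1.0 - rng.random((n, alphas.size))   # uniform on (0, 1]
    X = u ** (-1.0 / alphas)                 # inverse-transform sampling
    return (X.sum(axis=1) > x).mean()

def crude_relative_error(mu, n):
    """Standard deviation of the crude estimator divided by its mean mu."""
    return np.sqrt((1.0 - mu) / (n * mu))

rng = np.random.default_rng(0)
mu_hat = crude_mc(rng, [1.5, 2.0], x=50.0, n=1_000_000)
print(mu_hat, crude_relative_error(mu_hat, 1_000_000))
```

Because the indicator is zero on most draws when $x$ is large, many samples carry no information about the tail, which is the inefficiency the CMC estimator is designed to avoid.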

Theorem 3.16.

For any precision level $0<\kappa<1$ , the CMC estimator $\bar{Z}_{n}(x)$ is exponentially more efficient than $\hat{\mu}_{n}(x)$ . Namely, for any $0<r<2\kappa^{2}N^{-2\alpha}$

[TABLE]

3.3 Importance Sampling Algorithm

The goal in this part is to develop an alternative to CMC based on importance sampling. Inspired by the argument in the previous part, we exploit the partitioning method based on $M_{N}$ . Suppose $f_{i}$ is the density of $X_{i}$ , and $\tilde{f}_{i}$ is the alternative density, which is the candidate for importance sampling. Let $\mathrm{d}\mathsf{P}_{(i)}=\mathrm{d}F_{-i}\otimes\mathrm{d}\widetilde{F}_{i}$ be the product measure generated from all original distributions except $F_{i}$ , where $\tilde{F}_{i}$ is used instead, and let $\mathsf{E}_{(i)}$ denote the expectation with respect to $\mathsf{P}_{(i)}$ . The importance sampling steps are as follows:

1. Generate $X_{i}\sim F_{i}$ , and $\widetilde{X_{i}}\sim\widetilde{F}_{i}$ .

2. Let $S_{N}^{(i)}:=S_{N,-i}+\widetilde{X}_{i}$ , and $M_{N}^{(i)}=\max\{X_{-i},\widetilde{X}_{i}\}$ .

3. Take $\sum_{i=1}^{N}\frac{f_{i}(\widetilde{X}_{i})}{\tilde{f}_{i}(\widetilde{X}_{i})}1_{[S_{N}^{(i)}>x,M_{N}^{(i)}=\tilde{X}_{i}]}$ as an estimator for $\mathsf{P}\left[S_{N}>x\right]$ .

To show the unbiasedness of the estimator, let us for example take the expectation of the $i$ th summand with respect to $\mathsf{E}_{(i)}$ :

[TABLE]

Hence, $\sum_{i=1}^{N}\mathsf{P}\left[S_{N}>x,M_{N}=X_{i}\right]=\mathsf{P}\left[S_{N}>x\right]$ and unbiasedness follows. Although the introduced estimator is unbiased, it is shown in Asmussen et al. (2006) that even for iid non-negative factors it falls behind the CMC estimator, let alone in our setting. Moreover, one needs to find appropriate candidates for the sampling distributions ( $\tilde{f}_{i}$ ’s), for which there is no general recipe beyond heuristics. The related literature is yet to find appropriate candidates for the sampling distributions in importance sampling, so the question likewise remains open in our case.
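The importance sampling steps can be illustrated as follows. Since no recommended proposal is given, this sketch makes a purely heuristic choice: each $\tilde{F}_{i}$ is a unit-scale Pareto law with a smaller shape parameter (a heavier tail) than $F_{i}$ , which keeps the likelihood ratio bounded. Both the Pareto model and the proposal are assumptions, and the function names are hypothetical.

```python
import numpy as np

def pareto_pdf(t, alpha):
    """Density alpha * t**(-alpha - 1) of a unit-scale Pareto law, valid for t >= 1."""
    return alpha * t ** (-alpha - 1.0)

def is_replicates(rng, alphas, prop_alphas, x, n):
    """Return n replicates of the importance sampling estimator (needs N >= 2)."""
    alphas = np.asarray(alphas, dtype=float)
    prop = np.asarray(prop_alphas, dtype=float)
    N = alphas.size
    X = (1.0 - rng.random((n, N))) ** (-1.0 / alphas)   # step 1: X_i ~ F_i
    Xt = (1.0 - rng.random((n, N))) ** (-1.0 / prop)    # step 1: X~_i ~ F~_i
    S_minus = X.sum(axis=1, keepdims=True) - X          # S_{N,-i}
    srt = np.sort(X, axis=1)
    top1, top2 = srt[:, -1:], srt[:, -2:-1]
    M_minus = np.where(X == top1, top2, top1)           # max of X_{-i}
    # Step 2: S_N^{(i)} > x and M_N^{(i)} equal to the substituted draw.
    event = (S_minus + Xt > x) & (Xt >= M_minus)
    # Step 3: likelihood ratio f_i / f~_i evaluated at the substituted draw.
    weight = pareto_pdf(Xt, alphas) / pareto_pdf(Xt, prop)
    return (weight * event).sum(axis=1)

rng = np.random.default_rng(1)
alphas = np.array([1.5, 2.0, 2.5])
est = is_replicates(rng, alphas, alphas / 2.0, x=30.0, n=200_000)
print("IS estimate of P[S_N > x]:", est.mean())
```

How well this performs depends entirely on the proposal, which is the open question raised above.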

4 Market Portfolio Large Deviation Probability

One of the main motivations for studying RV distributions in this paper is to capture the large deviations of asset returns, as initially laid out for the Gaussian case. Now, consider the scenario in which the factor returns have power-law tails, i.e., $\mathsf{P}\left[\phi_{i}>x\right]\propto x^{-\tau_{i}}$ , which happens to be the case in many empirical stock return observations; see for example Cont (2001) and Gopikrishnan et al. (1998). Then, the demeaned market index return can be modeled as the sum of independent zero-mean factors combined with an independent noise, as seen before in (2.2):

[TABLE]

Since $\phi_{i}$ is assumed to have a power-law tail, so does $\bar{\beta}_{i}\phi_{i}$ with the same tail coefficient. Therefore, letting $\tau=\min\{\tau_{i}:i\in[k]\}$ and $\gamma\in\{i:\tau_{i}=\tau\}$ , the supporting distribution $F$ in the sense of theorem 3.8 would be a power law with coefficient $\tau$ (more precisely $F\stackrel{d}{=}\beta_{\gamma}\phi_{\gamma}$ ), and

[TABLE]

Moreover, hypothetically one can impose Gaussian structure on the idiosyncratic risk terms, and treat them as independent factors that have vanishing tail probabilities relative to the heaviest tail component, $\beta_{\gamma}\phi_{\gamma}$ , namely

[TABLE]

Then, the result of theorem 3.8 implies that, for large $\lambda$ :

[TABLE]

As a result of this asymptotic tail equivalence, we conclude that only the factors with the heaviest tails contribute to extreme events, and large market fluctuations are mainly driven by them. In particular, when hedging against extreme events, risk managers need not worry about factors with a fat body but light tails, even if they account for a sizable portion of the portfolio variance; rather, they should mainly be concerned with the highly skewed ones.

Next, let us investigate the case where the market portfolio is generated by aggregating a large number of individual stocks, uniformly weighted without loss of generality in this context. It is often observed that, after factor extraction, the remaining idiosyncratic parts exhibit fat-tailed dispersion, and treating them as Gaussian is quite unrealistic. Therefore, their deviations could possibly affect the aggregate index fluctuations. However, we show this is not the case: each residual can individually affect the fluctuations of its corresponding security, but once they are added together and averaged out, the aggregate noise deviation probability has a negligible effect compared to the contribution of factors with heavier tails. More precisely, as described above, let $\eta_{i}=\langle\beta_{i},\phi\rangle+\varepsilon_{i}$ be the return to the $i$ th security, where $\varepsilon_{i}$ is no longer required to be Gaussian but can take any RV form. The following proposition states this claim formally.

Proposition 4.1.

Let $\eta_{i}=\langle\beta_{i},\phi\rangle+\varepsilon_{i}$ be the return to the $i$ th security, such that the idiosyncratic residuals, like the factor returns, are independent and have RV tails. Then, given the existence of a supporting distribution $F\sim L(x)/x^{\alpha}$ as in 3.2 with $\alpha>1$ , and uniformly bounded proportionality coefficients $\{c_{i}\}$ of individual noise distributions with respect to $F$ ( $\max_{i\in[M]}c_{i}<c$ ), we get

[TABLE]

for fixed large $x$ .

The important consequence of this proposition is that, under some regularity conditions on the residual security risks, the aggregate effect of these residual terms on the frequency of market index fluctuations vanishes for large portfolios of assets. Therefore, the large deviation of portfolios of many assets is mainly controlled by the common factors, which appear in all individual asset returns. One can think of this result as a version of a central-limit-theorem type argument across independent residuals, but in the case of independent and non-identical variables with fat tails. The market index large deviation probability can then be approximated as:

[TABLE]

The first asymptotic equivalence simply follows from theorem 3.8 as $x$ gets large, and the second equivalence $(*)$ falls out by sending $M\to\infty$ , in addition to the assumption that the average factor loading vector converges as $M\to\infty$ , namely,

[TABLE]

Methods such as CMC and importance sampling, introduced in the previous section, can now be employed to find estimators for the extreme deviation probability of the market index return.

5 Simulations

Equation 3.20 is perhaps the most consequential result of this paper. In the present section this result is unpacked and validated through several simulations. In what follows we demonstrate that our proposed estimator is exponentially more efficient than the crude Monte-Carlo estimator. More formally, we demonstrate that for any precision level $0<\kappa<1$ and finite number $N$ of independent Pareto factors,

[TABLE]

shrinks with a rate of at least $r$ as a function of the sample size $n$ as $x\rightarrow\infty$ , where $0<r<2\kappa^{2}N^{-2\alpha}$ and $\alpha=\min_{1\leq i\leq N}\alpha_{i}$ is the shape parameter corresponding to the factor with the heaviest tail. The minimum rate $r$ is estimated using a linear mixed-effects model, the details of which are explained in Appendix C.3. For brevity, from now on we refer to $\log(\Lambda)$ as the LR.

Remark 5.1.

The CMC estimator, see algorithm 3.1, is exponentially more efficient relative to the crude Monte-Carlo estimator with a rate of at least $r$ if there is an $r>0$ for which the convex hull of $\left\{(i,\log(\Lambda_{i})):i=1,\ldots,n\right\}$ is bounded above by $f(i)=-ri$ .
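The criterion in remark 5.1 is easy to check on a set of observed pairs. Because $f$ is affine, the convex hull of the points lies below $f$ exactly when every individual point does, so the largest admissible rate is $\min_{i}\left(-\log(\Lambda_{i})/i\right)$ . The sketch below computes that quantity; it is a direct check of the remark, not the mixed-effects (REML) estimate described in Appendix C.3, and the function names and sample values are made up for illustration.

```python
def largest_admissible_rate(sample_sizes, log_lambdas):
    """Largest r such that log_lambda <= -r * n holds at every observed point."""
    return min(-ll / n for n, ll in zip(sample_sizes, log_lambdas))

def is_exponentially_more_efficient(sample_sizes, log_lambdas):
    """True when some strictly positive rate r satisfies the criterion of remark 5.1."""
    return largest_admissible_rate(sample_sizes, log_lambdas) > 0.0

# Hypothetical LR values observed at three sample sizes.
ns = [100, 200, 400]
lrs = [-0.3, -0.5, -1.4]
print(largest_admissible_rate(ns, lrs), is_exponentially_more_efficient(ns, lrs))
```

A single point with a non-negative LR makes the largest admissible rate non-positive, so the check fails as soon as crude Monte-Carlo is at least as good at any sample size.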

In what follows equation 3.20 is validated through simulations while estimation sensitivity with respect to $\alpha$ and $x$ is studied.

5.1 Variable Deviation Bound

Here we examine the relationship between the LR and the sample size as we move the deviation bound further from the mean. Figure 1 illustrates $\log(\Lambda)$ vs. the sample size $n$ as the deviation bound $x$ increases. This result shows that our estimator maintains exponential efficiency over a wide range of deviation bounds and that the rate of efficiency increases with the bound. Deviation bounds, along with their corresponding LDP $\mu$ and rate $r$ , are presented in Table 1.

5.2 Examining The Catastrophe Principle

In this subsection two simulations are performed which aim to validate the catastrophe principle. To this end, we simulate $M$ different factor models where each model contains $N$ Pareto factors with shape parameters $\alpha_{i1},\ldots,\alpha_{iN}$ , $i=1,\ldots,M$ . We consider two cases: (1) models that share the same $\alpha_{min}$ but differ in the average tail thickness $\bar{\alpha}_{i}$ , and (2) models that differ in $\alpha_{min}$ , but each of which shares the same $\bar{\alpha}$ with a model in the first case.

Figure 2 shows the sensitivity of the LR and $\mu$ to $\alpha$ . Evidently, our CMC estimator maintains its exponential efficiency, whose rate increases with the mean tail thickness, denoted by $\bar{\alpha}$ .

Table 2 helps characterize this increase more clearly. According to this table, $\mu$ is not sensitive to $\bar{\alpha}$ if all other factors have tails that are significantly thinner than that of the factor with shape parameter $\alpha_{min}$ . However, $r$ increases by orders of magnitude, which points to the high variability inherent in the Monte-Carlo method.

To better characterize our CMC method’s sensitivity to the maximum tail thickness, the previous simulation is repeated, but this time $\alpha_{min}$ is no longer constant; instead, the shape parameters are chosen so that for each model in the previous simulation there exists a model in this simulation with equal $\bar{\alpha}$ . In essence, the mean tail thicknesses are the same in the two simulations. As before, the CMC method maintains its dominance as $\alpha_{min}$ increases; however, upon consulting Table 3, we observe that, contrary to the previous simulation, while $\mu$ decreases by orders of magnitude between factor models, $r$ does not change drastically.

6 Conclusion

This paper provides a comprehensive asymptotic characterization of LDPs in the case of linear factor models with independent, non-identically distributed regularly-varying factors. Exploiting this characterization, a conditional Monte-Carlo estimator is proposed which has the same empirical time complexity as crude Monte-Carlo (see Appendix B) but is proven to be exponentially more efficient (see theorem 3.16). This claim was validated through extensive simulations while empirically characterizing the large deviation probabilities of the aforementioned factor models, thus providing empirical support for the theoretical results presented in section 3. We hope to generalize the results of this article, especially the asymptotic behavior of linear factor models, to a larger class of factor models, i.e., $\phi(\mathbf{X})$ with $\mathbf{X}\in\mathbb{R}^{N}$ , where $\phi\in\Phi$ belongs to a larger class of functions. Another important future direction is to study the LDP of factor models where the $X_{i}$ are not necessarily independent, which can be used to estimate the LDP of portfolios with dependent assets.

Appendix A Proofs

A.1 Proof of Theorem 3.8

First, the following lemma is proven, then the theorem’s proof follows.

Lemma A.1.

Let $F$ have regularly varying tail, namely $\bar{F}(x)\sim L(x)x^{-\alpha}$ for some $\alpha>0$ . Then, there exists $0<\delta<1$ , such that for $h(x)=x^{\delta}$ ,

[TABLE]

Proof.

Lemma 2 of chapter 8 in Feller (2008) ensures that for every $\varepsilon>0$ , there exists $x_{0}$ , such that for all $x>x_{0}$ : $x^{-\varepsilon}<L(x)<x^{\varepsilon}$ . Now one can check that by taking $\varepsilon<\alpha/5$ and $1>\delta>3/4$ the desired result follows:

[TABLE]

∎

Proof.

We justify equation (3.6) for the case of two random variables, $X_{1}$ and $X_{2}$ ; the general case then follows by a straightforward induction. The argument follows a line of proof similar to that in Foss and Richards (2010), but we leverage independence to relax some of its necessary conditions. The idea is to upper and lower bound $\mathsf{P}\left[X_{1}+X_{2}>x\right]$ by $\mathsf{P}\left[X_{1}>x\right]+\mathsf{P}\left[X_{2}>x\right]$ with some vanishing approximation errors (that approach 0 as $x\to\infty$ faster than $\bar{F}(x)$ , henceforth denoted by $o(\bar{F}(x))$ ). First, the upper bound is verified:

[TABLE]

The first two terms can be approximated by leveraging assumptions (i) and (ii) of the theorem. For example:

[TABLE]

where the first and last approximations hold because of (i), and the middle one is guaranteed by (ii) and condition 3.2. Furthermore, the last probability will be of order $\bar{F}(x)$ as $x\to\infty$ :

[TABLE]

where the last term is of order $o(\bar{F}(x))$ because of lemma A.1, hence is negligible compared to the first two terms in equation (A.3), that concludes the upper bound. Next, the lower bounding goes as:

[TABLE]

where each of the first two terms decouples, and again because of presumptions (i) and (ii) of the theorem, the first one for instance can be approximated as

[TABLE]

Similar reasoning implies that the third term in (A.6) is of order $o(\bar{F}(x))$ , therefore vanishing compared to the first two terms in (A.6). The lower bound is now justified, hence the first approximation in equation (3.6) is concluded. Finally, approximation of the sum of tail probabilities with $\bar{F}$ follows immediately as a result of the first presumption of the theorem. ∎

A.2 Proof of Theorem 3.10

Proof.

First, the lower bound is shown:

[TABLE]

where the last equality is an immediate application of Taylor’s theorem. Showing the upper bound follows mainly the same steps, but requires invoking the inequality $e^{-x}\leq 1-(1-e^{-1})x$ , which holds for $x\in[0,1]$ .
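For completeness, this affine bound is the chord inequality for the convex function $t\mapsto e^{-t}$ between the endpoints $t=0$ and $t=1$ . A one-line derivation, added here as a reading aid:

```latex
% Convexity of t -> e^{-t}: write x in [0,1] as the convex combination (1-x)*0 + x*1.
e^{-x} \;=\; e^{-\left((1-x)\cdot 0 + x\cdot 1\right)}
\;\leq\; (1-x)\,e^{0} + x\,e^{-1}
\;=\; 1-(1-e^{-1})\,x, \qquad x\in[0,1].
```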

[TABLE]

A graphical argument makes clear that $e^{-x}\leq 1-ax$ for $a<1$ and small enough $x$ . Therefore, it is possible to let $a\uparrow 1$ while controlling the size of all $\bar{F}_{i}(x)$ , $i=1,\ldots,N$ . In the case where the convergence of $\bar{F}_{i}(x)/c_{i}\bar{F}(x)$ (in condition (i) of theorem 3.8) is uniform over all $i=1,\ldots,N$ , one can send $a$ to 1 from below more slowly than $\bar{F}(x)\to 0$ , so that a tighter upper bound is obtained in (A.9) with the pre-factor $1$ rather than $(1-e^{-1})^{-1}$ . ∎

A.3 Proof of Theorem 3.16

Proof.

To prove the theorem we need the following lemma, which paves the way for the main argument.

Lemma A.2.

Let $S_{n}:=\sum_{k=1}^{n}\xi_{k}$ , where $\xi_{k}$ ’s are iid Bernoulli random variables with success probability of $\alpha$ , then

[TABLE]

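where $D(\cdot||\cdot)$ is the Kullback-Leibler divergence, which for two Bernoulli distributions with success probabilities $p$ and $q$ is known to be

$$D(p||q)=p\log\frac{p}{q}+(1-p)\log\frac{1-p}{1-q}$$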

Proof.

For the notational simplicity let $m=\lfloor n\delta\rfloor$ , and $\tilde{\delta}=m/n$ , then:

[TABLE]

Take the auxiliary binomial random variable $Y\sim\text{Bin}(n,\tilde{\delta})$ , then $\mathsf{P}\left[Y=\ell\right]$ is maximized when $\ell=m=\lfloor n\delta\rfloor$ . The following loose bound falls out for $\binom{n}{m}$ :

[TABLE]

This implies that $\binom{n}{m}\geq(n+1)^{-1}e^{-n\left(\tilde{\delta}\log\tilde{\delta}+(1-\tilde{\delta})\log(1-\tilde{\delta})\right)}$ . The bound proposed in the lemma then follows once this lower bound for $\binom{n}{m}$ is substituted in (A.12). ∎
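The lower bound on $\binom{n}{m}$ used in this proof can be verified numerically. The sketch below compares the exact binomial coefficient with $(n+1)^{-1}e^{nH(m/n)}$ , where $H(p)=-p\log p-(1-p)\log(1-p)$ with the convention $0\log 0=0$ ; the function names are hypothetical.

```python
from math import comb, exp, log

def binary_entropy(p):
    """H(p) = -p log p - (1 - p) log(1 - p) in nats, with 0 log 0 taken as 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log(p) - (1.0 - p) * log(1.0 - p)

def entropy_lower_bound(n, m):
    """Right-hand side (n + 1)^{-1} * exp(n * H(m / n)) of the bound on C(n, m)."""
    return exp(n * binary_entropy(m / n)) / (n + 1)

# Compare the exact coefficient with the bound for a few values of m at n = 20.
for m in (0, 5, 10):
    print(m, comb(20, m), entropy_lower_bound(20, m))
```

The bound is loose by at most the polynomial factor $n+1$ , which is immaterial for the exponential rates considered in the theorem.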

Now we can return to the proof of the theorem, first by finding a lower bound for the deviation probability of $\hat{\mu}$ :

[TABLE]

The first term is lower bounded using the result of lemma A.2 as:

[TABLE]

In the same manner the second term in (A.14) is lower bounded, keeping in mind that $1-\hat{\mu}_{n}(x)$ is itself a Binomial sample mean, but with the different success probability $1-\mu(x)$ :

[TABLE]

Denote the KL-divergences in the exponents of (A.15) and (A.16) with $D_{1}$ and $D_{2}$ , respectively. Then, the convexity of $x\mapsto e^{-nx}$ implies:

[TABLE]

Then it is left to simplify and find an upper bound for $D_{1}+D_{2}$ , which is mainly carried out by leveraging the inequality: $x\geq\log(1+x)$ for $x\in(-1,1)$ .

[TABLE]

and

[TABLE]

Therefore, the following upper bound on $D_{1}+D_{2}$ falls out by adding up (A.18), and (A.19):

[TABLE]

After substitution of this bound in (A.17), it follows

[TABLE]

By using the upper bound on deviation probability of $\bar{Z}_{n}(x)$ in (3.18), we can see that

[TABLE]

where in the last equation we used the large $x$ asymptotics. Now, for any rate $r$ smaller than $2\kappa^{2}N^{-2\alpha}$ , $x$ can be taken large enough so that the ratio of deviation probabilities decays faster than $e^{-rn}$ . Consequently, the ratio of the CMC estimator’s deviation probability to that of its crude Monte-Carlo counterpart decays exponentially in $n$ , which is the claim of theorem 3.16. ∎

A.4 Proof of Proposition 4.1

Proof.

The result of theorem 3.8 can be employed again to asymptotically approximate the deviation sum with sum of deviations:

[TABLE]

The proportionality coefficients $c_{i}$ are defined in the usual fashion: $c_{i}=\lim_{x\to\infty}\frac{\mathsf{P}\left[\varepsilon_{i}>x\right]}{\bar{F}(x)}$ , and, as stated in the proposition, are uniformly bounded by a constant, say $c$ . Therefore, relation (A.23) can be upper bounded as:

[TABLE]

The last conclusion holds because $L(\cdot)$ is slowly varying by definition and grows at a slower rate than any polynomial growth of $M$ (knowing that it is assumed $\alpha>1$ ). ∎

Appendix B Algorithm Complexity

As noted in remark 3.12, our proposed CMC algorithm has almost the same time complexity w.r.t. the sample size $n$ as crude Monte-Carlo. Recall that the added complexity in our algorithm results only from evaluating a univariate distribution $N$ times, due to 3.9, which does not scale with $n$ ; these evaluations can be done in $O(1)$ time using fast methods, or hash tables in the case of distributions with sparse support.

Appendix C Simulations

Simulations in this paper are carried out using Betta package addressed in C.3. In this section, for the sake of reproducibility, the settings under which the simulations in Section 5 were carried out are explained in detail.

Some parameters were kept constant across the simulations. The number of factors in each model, $N$ , was set to 10. Note that since an analytical solution for LDP estimation is not available in our case, we need to rely on stochastic simulations to estimate $\mu$ more accurately. After several experiments with our CMC estimator, we observed that as the sample size increases, the estimation variance decreases and the mean estimate stays very close to the mean estimates obtained using crude MC. Therefore, in order to estimate $\mu$ , we used the CMC estimator with a very large sample size, $n=10^{7}$ . As for the precision parameter $\kappa$ , we set it to $5\times 10^{-3}$ after many experiments. $\kappa$ should be small enough that the difference between the estimators becomes clear and large enough that simulations do not end up with NaNs due to occurrences of $\log(0)$ , especially with the MC estimator.

C.1 Simulation 5.1

In this simulation $x$ was changed from 100 to 1000 with steps of length 100. $\bm{\alpha}\in\mathbb{R}^{10}$ was set to values scattered equidistantly between 1 and 3.
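These settings can be written down as a short configuration sketch. It assumes that "equidistantly between 1 and 3" includes both endpoints; the variable names are hypothetical and are not taken from the Betta package.

```python
import numpy as np

# Deviation bounds x = 100, 200, ..., 1000 (the values listed in Table 1).
xs = np.arange(100, 1001, 100)

# Ten shape parameters spaced evenly on [1, 3], endpoints included (an assumption).
alphas = np.linspace(1.0, 3.0, 10)

print(xs)
print(alphas)
```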

C.2 Simulation 5.2

Here, for the simulation where the minimum shape parameter differs between models (Figure 3), $\bm{\alpha}\in\mathbb{R}^{10}$ was chosen equidistantly in $[e,e+1]$ for $e$ chosen from a grid of 10 points uniformly placed in $[1,5]$ . The same procedure was used in the constant minimum shape parameter simulation of Figure 2; however, for each model, the first element of $\bm{\alpha}$ was dropped and 1 was appended to the vector. Then the elements of the vector, except the first element, were modified to keep a constant mean $\alpha$ among the models.

C.3 Estimating r

In order to validate equation 3.20, it is sufficient to show that the convex hull of $\left(n_{i},\log(\Lambda)_{i}\right)$ is bounded above by an affine function $f(n_{i})=-rn_{i}$ where $r>0$ .

For each $n_{i}$ , $\log(\Lambda)$ is evaluated 50 times, all of which use a single estimate of $\mu$ . We therefore used a linear mixed-effects model to estimate $r$ , so that any grouping effect is treated as a random effect. The model used is:

[TABLE]

where $i$ is the group corresponding to $n_{i}$ . REML was used to estimate the coefficients.

Supplement A: Betta Package (https://github.com/osolari/betta)

Betta is a Python package developed for the purposes of this paper. Upon installation of the package, objects of class CMC will be available. These objects may be passed to the methods in the relativeEfficiencyLib module, which contains the methods used to create the results in section 5.

Bibliography

1. Acemoglu, D., Ozdaglar, A. and Tahbaz-Salehi, A. (2017). Microeconomic Origins of Macroeconomic Tail Risks. American Economic Review 107 54–108.
2. Ackerberg, D. A. (2000). Importance sampling and the method of simulated moments. Department of Economics, Boston University and NBER.
3. Albrecher, H., Asmussen, S. and Kortschak, D. (2006). Tail asymptotics for the sum of two heavy-tailed dependent risks. Extremes 9 107–130.
4. Asmussen, S. (2008). Applied probability and queues 51. Springer Science & Business Media.
5. Asmussen, S., Kroese, D. P. et al. (2006). Improved algorithms for rare event simulation with heavy tails. Advances in Applied Probability 38 545–558.
6. Asmussen, S., Binswanger, K., Højgaard, B. et al. (2000). Rare events simulation for heavy-tailed distributions. Bernoulli 6 303–322.
7. Chan, J. C. and Kroese, D. P. (2011). Rare-event probability estimation with conditional Monte Carlo. Annals of Operations Research 189 43–61.
8. Cont, R. (2001). Empirical properties of asset returns: stylized facts and statistical issues. Quantitative Finance 1 223–236.
