GARCH density and functional forecasts

Karim M. Abadir; Alessandra Luati; Paolo Paruolo

arXiv:1812.08409·math.ST·March 5, 2021

TL;DR

This paper derives an explicit analytic form for the $h$-step ahead prediction density of a GARCH(1,1) process with Gaussian innovations, enabling exact risk measure calculations and analysis of non-Gaussianity in financial returns.

Contribution

It provides the first explicit formula for the GARCH(1,1) prediction density and applies it to risk measurement and to the analysis of the prediction distribution; unlike Monte Carlo methods, the exact formulae carry no simulation uncertainty.

Findings

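01. Deviations of the prediction distribution from the Gaussian are small for some parameter values and large for others.

02. Exact tail probabilities remove Monte Carlo estimation uncertainty from risk measures such as the Value at Risk and the Expected Shortfall.

03. Uncertainty regions for functionals of the prediction distribution reflect in-sample estimation uncertainty.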

Abstract

This paper derives the analytic form of the $h$-step ahead prediction density of a GARCH(1,1) process under Gaussian innovations, with a possibly asymmetric news impact curve. The contributions of the paper consist both in the derivation of the analytic form of the density and in its application to a number of econometric problems. A first application of the explicit formulae is to characterize the degree of non-Gaussianity of the prediction distribution; deviations of the prediction distribution from the Gaussian are found to be small for some parameter values encountered in applications and sizeable for others, which matters because the Gaussian density is often used as an approximation of the true prediction density. A second application of the formulae is to compute exact tail probabilities and functionals, such as the Value at Risk and the Expected Shortfall, that measure risk when the underlying asset return is generated by a…

Tables

Table 1: Values of $Q_{2,p}$ for $\omega=1.14\cdot 10^{-5}$, $\alpha=0.131007$, $\beta=0.845708$, $\lambda=0$ using iterations (5.1).

$p$	exact $Q_{2, p}$	# of iterations	Gaussian	ratio=Gaussian/exact
0.05	$1.6415$	3	$1.6449$	1.0020
0.025	$1.9635$	3	$1.9600$	0.9982
0.01	$2.3443$	3	$2.3263$	0.9924
0.005	$2.6092$	4	$2.5758$	0.9872
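The Newton-type update $u_{n+1}=u_{n}-(F_{x_{h}}(u_{n})-p)/f_{x_{h}}(u_{n})$ listed among the paper's equations is presumably the iteration the caption of Table 1 refers to. As a minimal sketch, the update is run below with the standard normal c.d.f. and p.d.f. standing in for the exact $F_{x_{h}}$ and $f_{x_{h}}$; this substitution is an assumption made only to keep the example self-contained, so the output corresponds to the Gaussian column of the table and not to the exact one. The function name `newton_quantile`, the starting point and the tolerance are illustrative choices, not taken from the paper.

```python
from statistics import NormalDist


def newton_quantile(cdf, pdf, p, u0=-1.0, tol=1e-10, max_iter=50):
    """Solve cdf(u) = p with the update u <- u - (cdf(u) - p) / pdf(u).

    Returns the root and the number of iterations used.
    """
    u = u0
    for n_iter in range(1, max_iter + 1):
        step = (cdf(u) - p) / pdf(u)
        u -= step
        if abs(step) < tol:
            return u, n_iter
    raise RuntimeError("Newton iteration did not converge")


# Stand-in distribution (assumption): standard normal instead of the exact
# GARCH(1,1) prediction distribution.
std_normal = NormalDist()
for p in (0.05, 0.025, 0.01, 0.005):
    u, n_iter = newton_quantile(std_normal.cdf, std_normal.pdf, p)
    # The table reports the quantile with positive sign, hence -u.
    print(f"p={p}: Q={-u:.4f} ({n_iter} iterations)")
```

Starting to the right of the root in the left tail, where the c.d.f. is convex, the iterates decrease monotonically towards the quantile, which is one reason a starting point such as `u0=-1.0` is a safe choice here.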

Table 2: Values of $ES_{2,p}$ for $\omega=1.14\cdot 10^{-5}$, $\alpha=0.131007$, $\beta=0.845708$, $\lambda=0$ using (5.2) and numerical integration.

$p$	exact $E S_{2, p}$	Gaussian	ratio=Gaussian/exact
0.05	2.0745	2.0627	0.9943
0.025	2.3620	2.3378	0.9898
0.01	2.7121	2.6652	0.9827
0.005	2.9612	2.8919	0.9766
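The relation $ES_{h,p}=Q_{h,p}+\frac{1}{p}\int_{-\infty}^{-Q_{h,p}}F_{x_{h}}(u)\,\mathrm{d}u$, listed among the paper's equations, reduces the Expected Shortfall to one numerical integral of the c.d.f. The sketch below applies it with the standard normal c.d.f. as a stand-in for the exact $F_{x_{h}}$ (an assumption for illustration), so it corresponds to the Gaussian column of Table 2. The function name `expected_shortfall`, the truncation point `lower` and the Simpson rule are hypothetical choices; the known Gaussian closed form $\phi(Q)/p$ is printed alongside as a check.

```python
from statistics import NormalDist


def expected_shortfall(cdf, q, p, lower=-12.0, n=4000):
    """ES = q + (1/p) * integral of cdf over (-inf, -q], truncated at `lower`.

    Uses the composite Simpson rule; n must be even.
    """
    a, b = lower, -q
    h = (b - a) / n
    total = cdf(a) + cdf(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * cdf(a + i * h)
    return q + total * h / (3 * p)


nd = NormalDist()  # stand-in for the exact prediction distribution (assumption)
for p in (0.05, 0.025, 0.01, 0.005):
    q = -nd.inv_cdf(p)               # Gaussian Value at Risk, positive sign
    es = expected_shortfall(nd.cdf, q, p)
    closed_form = nd.pdf(q) / p      # Gaussian closed form, for comparison only
    print(f"p={p}: Q={q:.4f} ES={es:.4f} closed form={closed_form:.4f}")
```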

Table 3: Number of replications $n$ for a MC confidence interval on $Q_{2,p}$, see eq. (5.3), at given MC coverage level $1-\eta$.

	$p = 0.05$	0.025	0.01	0.005
$η = 0.05$	7.0710E+11	1.1604E+12	2.3680E+12	4.2104E+12
$η = 0.01$	1.2213E+12	2.0043E+12	4.0900E+12	7.2722E+12
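The replication bound $n\geq\lceil 4z_{1-\eta/2}^{2}\,p(1-p)\,10^{2a}/f_{x_{h}}^{2}(Q_{h,p})\rceil$, listed among the paper's equations, can be evaluated directly once the density at the quantile is known. The sketch below is a hedged illustration: the exponent $a$ is read here as the number of decimal digits required of the MC interval, the value $a=5$ is an assumption, and the standard normal density stands in for the exact $f_{x_{h}}$. Under these assumptions the counts come out at the same order of magnitude as those in Table 3, without reproducing them.

```python
import math
from statistics import NormalDist


def mc_replications(p, eta, density_at_quantile, a):
    """Smallest integer n with n >= 4 z^2 p (1 - p) 10^(2a) / f^2,

    where z is the (1 - eta/2) standard normal quantile and f is the
    density of the prediction distribution at the p-quantile.
    """
    z = NormalDist().inv_cdf(1.0 - eta / 2.0)
    bound = 4.0 * z**2 * p * (1.0 - p) * 10.0 ** (2 * a) / density_at_quantile**2
    return math.ceil(bound)


nd = NormalDist()
digits = 5  # assumed value of the exponent a
for eta in (0.05, 0.01):
    for p in (0.05, 0.025, 0.01, 0.005):
        f_q = nd.pdf(nd.inv_cdf(p))  # Gaussian stand-in for f_{x_h}(Q_{h,p})
        print(f"eta={eta}, p={p}: n >= {mc_replications(p, eta, f_q, digits):.4e}")
```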

Table 4: Number of replications $n$ for a MC confidence interval on $Q_{2,p}$, see eq. (5.5), at given MC coverage level $1-\eta$.

	$p = 0.05$	0.025	0.01	0.005
$η = 0.05$	5.0484E+12	1.2813E+13	4.0575E+13	9.3989E+13
$η = 0.01$	8.7196E+12	2.2131E+13	7.0081E+13	1.6234E+14

Table 5: Estimation-uncertainty regions of $Q_{2,0.01}$ and $ES_{2,0.01}$. Grid over 200 points, half of which derived from points on the surface of the estimation confidence ellipsoid.

	interval for $Q_{2, 0.01}$		interval for $E S_{2, 0.01}$		number of extremes
	min	max	min	max	derived from surface points
Microsoft stock returns	$-2.3383$	$-2.3298$	2.6733	2.6957	4 out of 4
One simulation run	$-2.4197$	$-2.3263$	2.6652	2.9144	2 out of 4

Equations

x_{t}=\sigma_{t}\varepsilon_{t},\qquad\sigma_{t}^{2}=\omega+\alpha_{t-1}x_{t-1}^{2}+\beta\sigma_{t-1}^{2},\qquad\alpha_{t}:=\alpha+\lambda 1_{x_{t}<0}=\alpha+\frac{\lambda}{2}(1-\varsigma_{t})

f_{x_{t}}(u)=f_{z_{t}}(u^{2})\,|u|.

f_{\bm{z},z_{h}|\bm{\varsigma}}(\bm{w},w_{h}|\bm{s})=\prod_{t=1}^{h}(w_{t}\sigma_{t}^{2})^{-\frac{1}{2}}g\left(\frac{w_{t}}{\sigma_{t}^{2}}\right)

f_{z_{h}|\bm{\varsigma}}(w_{h}|\bm{s})=\int_{\mathbb{R}_{+}^{h-1}}f_{\bm{z},z_{h}|\bm{\varsigma}}(\bm{w},w_{h}|\bm{s})\,\mathrm{d}\bm{w}.

f_{z_{h}}(w_{h})=\sum_{\bm{s}}f_{z_{h}|\bm{\varsigma}}(w_{h}|\bm{s})\Pr(\bm{s})=2^{-h+1}\sum_{\bm{s}}f_{z_{h}|\bm{\varsigma}}(w_{h}|\bm{s})

\sigma_{h}^{2}=\omega+(1+y_{h-1})\beta\sigma_{h-1}^{2},\qquad y_{h}:=\frac{\alpha_{h}}{\beta}\varepsilon_{h}^{2}.

\sigma_{h}^{2}=\omega+(1+y_{h-1})\left\{\omega\beta+(1+y_{h-2})\left(\dots\left(\omega\beta^{h-2}+(1+y_{1})\beta^{h-1}\sigma_{1}^{2}\right)\right)\right\}

f_{z_{h}|\bm{\varsigma}}(w_{h}|\bm{s})=\left(\frac{\gamma_{h}}{w_{h}}\right)^{\frac{1}{2}}\int_{\mathbb{R}_{+}^{h-1}}\prod_{t=1}^{h-1}\left(v_{t}^{-\frac{1}{2}}g\left(\frac{\beta}{a_{t}}v_{t}\right)\right)\cdot\sigma_{h}^{-1}g\left(\frac{w_{h}}{\sigma_{h}^{2}}\right)\mathrm{d}\bm{v},

\Psi(a;c;z)=\frac{1}{\Gamma(a)}\int_{\mathbb{R}_{+}}\exp(-zt)\,t^{a-1}(1+t)^{c-a-1}\,\mathrm{d}t

A_{h}(r)

\cdot\prod_{t=1}^{h-2}\Psi\left(\frac{1}{2},r-K_{t}+\frac{3}{2};\frac{\beta}{2\alpha_{h-t}}\right)\Psi\left(\frac{1}{2},r-K_{h-2}+\frac{3}{2};\frac{\omega+\beta\sigma_{1}^{2}}{2\alpha_{1}\sigma_{1}^{2}}\right),

f_{z_{h}}(w_{h})

f_{x_{h}}(u_{h})

\omega\left(1-\sum_{i=1}^{j-1}\beta^{i}\right)\leq\beta^{j}\sigma_{1}^{2},\qquad j\geq 2

F_{z_{h}}(w_{h})

F_{x_{h}}(u_{h})

E(x_{h}^{2m})=E(z_{h}^{m})=2^{m-\frac{3}{2}(h-1)}\pi^{-\frac{h}{2}}\Gamma\left(m+\frac{1}{2}\right)(\omega+\beta\sigma_{1}^{2})^{m+\frac{1}{2}}\sum_{\bm{s}\in\mathcal{S}}\gamma_{h}^{\frac{1}{2}}A_{h}(m),\qquad m=1,2,\dots

\Psi\left(\frac{1}{2};\frac{3}{2}+m-k;\xi\right)=\frac{\Gamma\left(m+\frac{1}{2}\right)}{\sqrt{\pi}\,k!\binom{m-\frac{1}{2}}{k}}\,\xi^{-\frac{1}{2}-m+k}\sum_{j=0}^{m-k}\frac{\binom{m-k}{j}}{\binom{-\frac{1}{2}+m-k}{j}}\frac{\xi^{j}}{j!}

f_{z_{h}}(w_{h})

f_{x_{h}}(u_{h})

c_{j,\bm{s}}^{\star}:=\pi^{\frac{1}{2}}(\sigma_{1}^{2}\alpha_{1})^{-\frac{1}{2}}\left(\frac{1}{2}\right)_{j}\Psi\left(j+\frac{1}{2},1;\frac{\omega+\beta\sigma_{1}^{2}}{2\alpha_{1}\sigma_{1}^{2}}\right),

c_{j,\bm{s}}^{\star}:=\pi^{\frac{h-1}{2}}(\sigma_{1}^{2})^{-\frac{1}{2}}\cdot

\cdot\sum_{k_{1}=0}^{\infty}\sum_{k_{2}=0}^{\infty}\cdots\sum_{k_{h-2}=0}^{\infty}\binom{-\frac{1}{2}}{k_{1}}\alpha_{h-1}^{-\frac{1}{2}}\Psi\left(\frac{1}{2},j+1-k_{1};\frac{\beta}{2\alpha_{h-1}}\right)\cdot\beta^{j(h-2)-U_{h-2}}\cdot\left(\frac{\omega+\beta\sigma_{1}^{2}}{\omega}\right)^{j-K_{h-2}}\cdot

\cdot\prod_{t=2}^{h-2}\alpha_{h-t}^{-\frac{1}{2}}\binom{j-\frac{1}{2}-K_{t}}{k_{t}}\Psi\left(\frac{1}{2},j+1-K_{t};\frac{\beta}{2\alpha_{h-t}}\right)\alpha_{1}^{-\frac{1}{2}}\Psi\left(\frac{1}{2},j+1-K_{h-2};\frac{\omega+\beta\sigma_{1}^{2}}{2\alpha_{1}\sigma_{1}^{2}}\right)

1=E\left((\alpha\varepsilon_{t}^{2}+\beta)^{\kappa}\right)=\beta^{\kappa}\sum_{i=0}^{\kappa}\binom{\kappa}{i}\left(\frac{\alpha}{\beta}\right)^{i}E(\varepsilon_{t}^{2i}),

f_{x_{2}}(u_{2})=\frac{1}{\sqrt{2\pi}\,\tilde{\sigma}_{2}}\sum_{j=0}^{\infty}\frac{1}{j!}\left(-\frac{1}{2}\frac{u_{2}^{2}}{\tilde{\sigma}_{2}^{2}}\right)^{j}\sqrt{z}\,\Psi\left(\frac{1}{2};1-j;z\right)

z:=\frac{\omega+\beta\sigma_{1}^{2}}{2\alpha\sigma_{1}^{2}}=\frac{\tilde{\sigma}_{2}^{2}}{2\alpha\sigma_{1}^{2}}=\frac{1}{2}\left(\frac{\omega}{\beta\sigma_{1}^{2}}+1\right)\frac{\beta}{\alpha}.

u_{n+1}=u_{n}-\frac{F_{x_{h}}(u_{n})-p}{f_{x_{h}}(u_{n})}.

ES_{h,p}=Q_{h,p}+\frac{1}{p}\int_{-\infty}^{-Q_{h,p}}F_{x_{h}}(u)\,\mathrm{d}u

0\leq\lim_{x\to-\infty}(-xF(x))=\lim_{x\to-\infty}\left(-\int_{-\infty}^{x}xf(u)\,\mathrm{d}u\right)\leq\lim_{x\to-\infty}\left(-\int_{-\infty}^{x}uf(u)\,\mathrm{d}u\right)=0.

\sqrt{n}\left(q_{\lfloor np\rfloor+1}+Q_{h,p}\right)\overset{w}{\to}N\left(0,\frac{p(1-p)}{f_{x_{h}}^{2}(Q_{h,p})}\right),

n\geq\left\lceil\frac{4z_{1-\eta/2}^{2}\,p(1-p)\,10^{2a}}{f_{x_{h}}^{2}(Q_{h,p})}\right\rceil.

Taxonomy

Topics: Market Dynamics and Volatility · Monetary Policy and Economic Impact · Financial Risk and Volatility Modeling

Full text

GARCH density and functional forecasts

(Footnote 1: Information and views set out in this paper are those of the authors and do not necessarily reflect the ones of the institutions of affiliation.)

The authors acknowledge useful comments from Enrique Sentana, Robert Engle, Eric Renault, Leopoldo Catania, Barbara Rossi, Dimitris Politis, Ngai Chan, Christian Brownlees, Nour Meddahi, Torben Andersen, two anonymous referees, as well as from participants to seminars in University of Verona and Bocconi University of Milan, the Seventh Italian Congress of Econometrics and Empirical Economics 2017, EC2 2017 in Amsterdam, Barcellona GSE Summer Forum 2017 and the NBER-NSF Time Series Conference 2018.

Karim M. Abadir (ORCID: 0000-0001-5637-9513), Business School, Imperial College London, UK

Alessandra Luati (ORCID: 0000-0001-6407-9385), Department of Statistical Sciences “Paolo Fortunati”, University of Bologna, Italy

Paolo Paruolo (corresponding author; ORCID: 0000-0002-3982-4889), European Commission, Joint Research Centre (JRC), Via Enrico Fermi 2749, TP 723, 21027 Ispra (VA), Italy

Abstract

This paper derives the analytic form of the $h$-step ahead prediction density of a GARCH(1,1) process under Gaussian innovations, with a possibly asymmetric news impact curve. The contributions of the paper consist both in the derivation of the analytic form of the density and in its application to a number of econometric problems. A first application of the explicit formulae is to characterize the degree of non-Gaussianity of the prediction distribution; deviations of the prediction distribution from the Gaussian are found to be small for some parameter values encountered in applications and sizeable for others, which matters because the Gaussian density is often used as an approximation of the true prediction density. A second application of the formulae is to compute exact tail probabilities and functionals, such as the Value at Risk and the Expected Shortfall, that measure risk when the underlying asset return is generated by a Gaussian GARCH(1,1). This improves on existing methods based on Monte Carlo simulations and (non-parametric) estimation techniques, because the present exact formulae are free of Monte Carlo estimation uncertainty. A third application is the definition of uncertainty regions for functionals of the prediction distribution that reflect in-sample estimation uncertainty. These applications are illustrated on selected empirical examples.

Keywords: GARCH(1,1), Prediction density, Functionals, Value at Risk, Expected Shortfall

JEL: C22, C53, C58, G17, D81

1 Introduction

Since their introduction in Engle (1982) and Bollerslev (1986), Generalised AutoRegressive Conditional Heteroskedasticity (GARCH) processes have been widely employed in financial econometrics, see e.g. Bollerslev et al. (2010). In the original GARCH formulation, the conditional distribution of innovations was typically assumed to be Gaussian; even with Gaussian innovations, GARCH processes were shown to generate volatility clustering and, when stationary, an unconditional distribution with fatter tails than the Gaussian, see e.g. Bollerslev et al. (1992).

GARCH processes can include several lags $q$ of the past squared shocks and several lags $p$ of the past volatility; in practice, however, the GARCH(1,1) model with $p=q=1$ is often found to offer a good fit for asset returns, and it is usually preferred to GARCH models with more parameters, see Tsay (2010) section 3.5, or Andersen et al. (2006), section 3.6. Moreover, many multivariate GARCH models are built on the univariate GARCH(1,1), see e.g. Engle et al. (2019) and references therein. In this sense the GARCH(1,1) is both the prototype and the workhorse of GARCH processes in practice.

GARCH processes map news into the conditional volatility; the function obtained by replacing past conditional volatilities with unconditional ones was called by Engle and Ng (1993) the news-impact-curve. For GARCH(1,1) processes, this curve yields the same value of volatility for positive and negative shocks, i.e. it is symmetric. Glosten et al. (1993) (henceforth GJR) extended the GARCH setup to allow for asymmetric news impact curve responses to negative shocks.

Many measures of risk are functions of the prediction distribution of asset returns. These measures include the Value at Risk, which is a quantile of the prediction distribution of the asset return, see Jorion (2006), as well as the Expected Shortfall, see Patton et al. (2019) and Arvanitis et al. (2019). The latter is the expected value of the prediction distribution of the asset return in the left tail of the prediction density below the Value at Risk; this measure has been recently re-emphasised by the Third Basel Accords. Both measures are functionals of the prediction distribution of asset returns.

The prediction distribution of a GARCH(1,1) hence plays an important role for the computation of risk measures in financial applications. This distribution is not known in analytic form beyond the distribution of innovations for the 1-step ahead case, which is given by assumption when building the process, see e.g. Andersen et al. (2006), page 811 and Baillie and Bollerslev (1992).

The present paper derives the analytical form of the $h$-step-ahead prediction density of a Gaussian GARCH(1,1), $x_{t}=\sigma_{t}\varepsilon_{t}$, with $\sigma_{t}^{2}=\omega+\alpha x_{t-1}^{2}+\beta\sigma_{t-1}^{2}$, also allowing for GJR-GARCH(1,1) with asymmetric news-impact-curve. The first contribution of the paper is theoretical, and consists in the form of the p.d.f. and c.d.f. of the prediction distribution. The results are obtained by marginalizing the joint density of the prediction observations, using integration and special functions, for any prediction horizon $h=1,2,\dots$. The formulae are valid for stationary as well as non-stationary GARCH(1,1) processes.

The 2-step-ahead prediction distribution is obtained without imposing constraints on the values of the $\alpha$ and $\beta$ coefficients. For the $h$-step-ahead prediction distribution with $h\geq 3$, a condition on $\beta$ is required to guarantee integrability, which depends on how the coefficients are related to the last observed value of the volatility $\sigma_{1}^{2}$. Unless the last observed value of volatility $\sigma_{1}^{2}$ is relatively low, a sufficient condition for integrability is that $\beta$ be larger than $0.5$. Another sufficient condition, independent of the last observed value of volatility $\sigma_{1}^{2}$, is that $\beta$ be larger than $0.61803$.

As suggested by one referee, one could wonder how frequently the conditions $\beta\geq 0.5$ and $\beta\geq 0.61803$ are satisfied by empirical estimates in practice. While this is ultimately an empirical question that depends on the data at hand, one could consider the estimates in Bampinas et al. (2018), who fit GARCH(1,1) models to all stock returns in the S&P 1500 universe, from January 2008 to December 2011, with daily observations. Their Table 1 reports a mean value of $\hat{\beta}$ of $0.855$ (median $0.887$) with standard deviation of $0.154$. If the empirical distribution of these estimates were Gaussian, the predicted frequency of $\beta$ estimates below $0.61803$ (respectively $0.5$) would be 6.2% (respectively 1.1%). Typical values for $\beta$ estimates reported in textbooks and in empirical finance literature are also above $0.8$. On this basis, one would expect $\beta\geq 0.5$ or $\beta\geq 0.61803$ not to be binding restrictions in most practical applications. (Footnote 5: Table 1 in Bampinas et al. (2018) reports a skewness of $-2.670$ and kurtosis of $12.75$, which indicate that the distribution is far from normal; unfortunately, no min or max values are reported in the table. Typical $\hat{\beta}$ values in textbooks and in empirical finance literature can be found in Linton (2019) Table 11.4, Tsay (2010) page 136, Francq and Zakoian (2010) page 262, Engle et al. (2008) page 694; they are all greater than $0.8$ for daily, weekly and monthly returns.)
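The two frequencies quoted above follow from a Gaussian c.d.f. evaluated with the reported mean and standard deviation. A minimal sketch of that arithmetic, using only the Python standard library (the function name `share_below` is illustrative):

```python
from statistics import NormalDist

# Mean and standard deviation of the beta estimates as quoted in the text.
beta_hat = NormalDist(mu=0.855, sigma=0.154)


def share_below(threshold):
    """Predicted share of beta estimates below `threshold` under normality."""
    return beta_hat.cdf(threshold)


for threshold in (0.61803, 0.5):
    print(f"Pr(beta_hat < {threshold}) = {100 * share_below(threshold):.1f}%")
```

As the accompanying footnote notes, the reported skewness and kurtosis indicate a distribution far from normal, so these figures are only indicative.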

The assumption of Gaussianity of the innovations is central in the derivations in the paper, which employs analytical integration and special functions that are specific for this distribution. The approach in the derivation of the prediction distribution is expected to be amenable to extensions to non-Gaussian symmetric distributions of innovations. These extensions are not trivial and are not considered in the present paper.

A first application of the present results is the characterization of the degree of non-Gaussianity of the prediction distribution. This problem is relevant in practice, because the Gaussian density is often used as an approximation of the true prediction density, see e.g. Lau and McSharry (2010), page 1322. The exact formulae in this paper demonstrate that the prediction density can be very far from normal. However, for some parameter values encountered in applications, the discrepancy of the prediction density from the Gaussian density can be small, and this fact would support the Gaussian approximation. Note that it is the analytic form of the prediction density derived in this paper that allows one to measure this discrepancy.

A different question in this first area of application is the association of the coefficients of the GARCH with both the shape of the prediction distribution and the shape of the stationary distribution, when this exists. Under stationarity, the predictive distribution converges to the stationary distribution as the number of steps increases. The tail behavior of the stationary distribution of the Gaussian GARCH(1,1) has been studied extensively, see Mikosch and Starica (2000) and Davis and Mikosch (2009). The tails of the stationary distribution of both the volatility and of the GARCH process $x_{t}$ are of Pareto type, $\Pr(x_{t}>u)\approx cu^{-2\kappa}$ say. These properties are based on results for random difference equations and renewal theory obtained in Kesten (1973) and Goldie (1991).

The prediction density is found to resemble a Gaussian density (with appropriate variance) for high values of $\beta/\alpha$, and to be far from it for low values. Similarly, large values of $\beta/\alpha$ are found to be associated with higher values of $\kappa$, i.e. a Pareto stationary distribution with more moments (the Gaussian has all moments).
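For the symmetric case $\lambda=0$, the tail index $\kappa$ solves $1=E((\alpha\varepsilon_{t}^{2}+\beta)^{\kappa})$, and for integer $\kappa$ the expectation is the finite binomial sum listed among the paper's equations, with $E(\varepsilon_{t}^{2i})=(2i-1)!!$ for Gaussian innovations. The sketch below evaluates that sum at successive integers to bracket $\kappa$; the function names are hypothetical and the parameter values are those of Table 1.

```python
from math import comb


def gaussian_even_moment(i):
    """E(eps^(2i)) = (2i - 1)!! for a standard normal eps."""
    result = 1
    for k in range(1, 2 * i, 2):
        result *= k
    return result


def kesten_moment(alpha, beta, kappa):
    """E((alpha * eps^2 + beta)^kappa) for a non-negative integer kappa."""
    return sum(
        comb(kappa, i) * alpha**i * beta ** (kappa - i) * gaussian_even_moment(i)
        for i in range(kappa + 1)
    )


alpha, beta = 0.131007, 0.845708  # values from Table 1
for kappa in (1, 2, 3):
    print(f"kappa={kappa}: E((alpha*eps^2+beta)^kappa) = {kesten_moment(alpha, beta, kappa):.6f}")
```

For these parameter values the expectation is below 1 at $\kappa=2$ and above 1 at $\kappa=3$, so the solution lies between 2 and 3.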

A second application of the explicit formulae in this paper is to compute exact tail probabilities and functionals, such as Expected Shortfall, that measure risk when the underlying asset return is generated by a Gaussian GARCH(1,1). This improves on existing methods based on approximations or Monte Carlo (MC) simulations combined with (non-parametric) estimation techniques.

The so-far unknown analytic form of the prediction density of a GARCH has led econometricians to look for alternative approximate solutions. Alexander et al. (2013) have resorted to approximations based on the first 4 moments of the prediction distribution; Baillie and Bollerslev (1992) use a Cornish-Fisher expansion and a Johnson SU distribution using the first 4 moments of the GARCH(1,1) to fit the distributions.

Alternative methods for estimating the prediction density and risk measures such as the Value at Risk and the Expected Shortfall rely on MC simulations of the underlying GARCH processes, see e.g. Delaigle et al. (2016). All these MC methods implicitly assume that the density and the unknown functionals are finite. (Footnote 6: Delaigle et al. (2016) proposed a non-parametric root-$n$ consistent estimator of the stationary distribution of the (log-)volatility process, where $n$ is the sample size.)

In this domain, the present exact formulae are key to proving that the density and the unknown functionals are finite, which is a prerequisite for MC methods to work. It must be noted, however, that MC methods carry an additional layer of uncertainty, associated with MC estimation, that the exact methods proposed in this paper bypass entirely. Specifically, MC estimation of prediction functionals results in confidence intervals with positive length and coverage probability $1-\eta<1$; their counterpart based on the exact results of this paper can be represented by confidence intervals with length equal to 0 and coverage probability 1. Hence the exact methods in this paper are qualitatively superior to those based on MC methods, in addition to being much more parsimonious in terms of required calculations.

A third application of the exact formulae in this paper is to provide uncertainty intervals for (functionals of) the prediction distribution that reflect in-sample estimation uncertainty. This allows one to map estimation uncertainty onto forecast uncertainty for the risk measures and to construct the associated forecast intervals that have a pre-specified (asymptotic) coverage level. For instance, one could predict the Expected Shortfall to lie in an interval $(ES_{l},ES_{u})$ with 95% confidence level, where the uncertainty reflects in-sample estimation uncertainty on the GARCH parameters. As already discussed for the second area of application, the exact methods in this paper lead to results that are structurally different from the ones based on MC methods. In fact, the latter involve an additional layer of MC estimation uncertainty, which is avoided by the exact methods in the present paper.

The rest of the paper is organised as follows. Section 2 describes the general approach for the derivation of the prediction distribution. Section 3 states the main theoretical results. Section 4 discusses the degree of non-Gaussianity of the prediction distribution and compares the prediction distribution with the tails of the stationary distribution when this exists; this is the first area of application discussed above. Section 5 discusses the second area of application, i.e. how to apply the present exact results to the calculation of the Value at Risk and of the Expected Shortfall, and compares the obtained results with alternative estimators based on MC methods. Section 6 discusses the third area of application of the present analytical formulae to the construction of forecast intervals for risk measures that reflect in-sample estimation uncertainty. Section 7 concludes. Proofs are collected in the Appendix.

2 The prediction density

This section summarises the construction used to derive the prediction density as an integral, involving a product of densities of the innovations. Consider the asymmetric GJR-GARCH(1,1)

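$$x_{t}=\sigma_{t}\varepsilon_{t},\qquad\sigma_{t}^{2}=\omega+\alpha_{t-1}x_{t-1}^{2}+\beta\sigma_{t-1}^{2},\qquad\alpha_{t}:=\alpha+\lambda 1_{x_{t}<0}=\alpha+\frac{\lambda}{2}(1-\varsigma_{t}) \tag{2.1}$$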

where $\omega,\alpha,\beta>0$ , $\lambda\geq 0$ and $1_{x_{t}<0}=\frac{1}{2}(1-\varsigma_{t})$ is the indicator function for the event $x_{t}<0$ , and $\varsigma_{t}:=\operatorname{sgn}(\varepsilon_{t})=\operatorname{sgn}(x_{t})$ is the sign of $\varepsilon_{t}$ or $x_{t}$ ; these signs are the same because $\sigma_{t}>0$ . The sequence $\{\varepsilon_{t}\}$ is assumed to be i.i.d., centered around zero and with Gaussian p.d.f. $f_{\varepsilon}(\epsilon):=g(\epsilon^{2}):=(2\pi)^{-\frac{1}{2}}\exp(-\epsilon^{2}/2)$ .

Time $t=0$ is taken to be the starting time of the prediction, and it is assumed that one wishes to predict $x_{h}$ for some $h=1,2,3,\dots$, conditional on the information set at time $t=0$, which contains $x_{0}$ and $\sigma_{0}$; the conditioning on $x_{0}$ and $\sigma_{0}$ is not explicitly included in the notation of the prediction distribution for simplicity. (Footnote 7: The information set at $t=0$ containing $x_{0}$ and $\sigma_{0}$ is consistent with observing $x_{t}$ from minus infinity to time 0 under stationarity.) Note also that, because $x_{0}$ and $\sigma_{0}^{2}$ are observed, $\sigma_{1}^{2}$ is also observed.

Throughout the paper the values taken by the random variables $x_{t}$ , and $z_{t}:=x_{t}^{2}$ are denoted $u_{t}$ and $w_{t}$ respectively, or sometimes $u$ and $w$ for simplicity, when this may not cause ambiguity.

The next Lemma reports consequences of the symmetry of the one-step-ahead density $g$ on relevant conditional p.d.f.s. In the Lemma, the following notation is used: $\bm{z}:=(z_{1},\dots,z_{h-1})^{\prime}$ , $\bm{\varsigma}:=(\varsigma_{1},\dots,\varsigma_{h-1})^{\prime}$ , where $\bm{w}:=(w_{1},\dots,w_{h-1})^{\prime}$ , $\bm{s}:=(s_{1},\dots,s_{h-1})^{\prime}$ denote values of $\bm{z}$ and $\bm{\varsigma}$ . The density of a random variable $x$ evaluated at $u$ is indicated as $f_{x}(u)$ , and similarly $f_{x|\varsigma}(u|s)$ indicates the conditional density for $x$ given $\varsigma$ , evaluated at $x=u$ and $\varsigma=s$ .

Lemma 2.1 (Densities).

For symmetric $f_{\varepsilon}(\epsilon)=g(\epsilon^{2})$ , the p.d.f. $f_{x_{t}}(\cdot)$ is symmetric, i.e. $f_{x_{t}}(u)=f_{x_{t}}(-u)$ , $u\in\mathbb{R}$ , and it is related to the p.d.f. of $z_{t}$ in the following way

$$f_{x_{t}}(u)=f_{z_{t}}(u^{2})\,|u|. \tag{2.2}$$

Moreover, $\mathrm{\Pr}(\varsigma_{t}=\pm 1)=\frac{1}{2}$ and one has

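$$f_{\bm{z},z_{h}|\bm{\varsigma}}(\bm{w},w_{h}|\bm{s})=\prod_{t=1}^{h}(w_{t}\sigma_{t}^{2})^{-\frac{1}{2}}g\left(\frac{w_{t}}{\sigma_{t}^{2}}\right) \tag{2.3}$$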

where $\sigma^{2}_{t}$ depends on $w_{t-j}$ (the value of $z_{t-j}=x_{t-j}^{2}$) and $s_{t-j}$ (the sign of $x_{t-j}$) for $j=1,\dots,t-1$ via (2.1).

Next denote the set of all possible $h-1$ sign vectors $\bm{\varsigma}$ by $\mathcal{S}$ , $\#\mathcal{S}=2^{h-1}$ . Densities are first computed conditionally on $\bm{\varsigma}$ and later they are marginalized with respect to it. Here, conditioning on $\bm{\varsigma}$ is relevant only for the GJR case $\lambda\neq 0$ .

The basic building block is given by the expression in (2.3). This density can be marginalised with respect to $\bm{z}$ as follows

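$$f_{z_{h}|\bm{\varsigma}}(w_{h}|\bm{s})=\int_{\mathbb{R}_{+}^{h-1}}f_{\bm{z},z_{h}|\bm{\varsigma}}(\bm{w},w_{h}|\bm{s})\,\mathrm{d}\bm{w}. \tag{2.4}$$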

Finally, $f_{z_{h}|\bm{\varsigma}}(w_{h}|\bm{s})$ can be marginalised with respect to the signs $\bm{\varsigma}$ using the mutual independence of the signs $\varsigma_{t-j}$ and the fact that $\mathrm{\Pr}(\varsigma_{t}=\pm 1)=\frac{1}{2}$ for all $t$ , due to the symmetry of $g$ . One hence finds

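$$f_{z_{h}}(w_{h})=\sum_{\bm{s}}f_{z_{h}|\bm{\varsigma}}(w_{h}|\bm{s})\Pr(\bm{s})=2^{-h+1}\sum_{\bm{s}}f_{z_{h}|\bm{\varsigma}}(w_{h}|\bm{s}) \tag{2.5}$$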

where the sum $\sum_{\bm{s}}$ is over $s_{j}\in\{-1,1\}$ , for $j=1,\dots,h-1$ . The prediction density $f_{z_{h}}(w_{h})$ is found by combining (2.5), (2.4), (2.3), (2.2).
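For $h=2$ the construction reduces to a single integral over $\varepsilon_{1}$, since $x_{2}$ given $\varepsilon_{1}$ is Gaussian with variance $\sigma_{2}^{2}=\omega+\alpha_{1}\varepsilon_{1}^{2}\sigma_{1}^{2}+\beta\sigma_{1}^{2}$. The sketch below evaluates this integral by a plain trapezoidal rule, as a numerical stand-in for the analytic integration carried out in the paper and not as the paper's method. The function name `density_x2`, the grid settings and the value chosen for $\sigma_{1}^{2}$ are assumptions for illustration; the GARCH parameters are those of Table 1.

```python
import math


def phi(x):
    """Standard normal p.d.f."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def density_x2(u, omega, alpha, beta, lam, sigma1_sq, n=2000, bound=8.0):
    """f_{x_2}(u) = E[ phi(u / sigma_2) / sigma_2 ], expectation over eps_1.

    Trapezoidal rule on [-bound, bound] in eps_1.
    """
    step = 2.0 * bound / n
    total = 0.0
    for i in range(n + 1):
        e = -bound + i * step
        a1 = alpha + (lam if e < 0 else 0.0)   # alpha_1 = alpha + lambda * 1{x_1 < 0}
        sigma2 = math.sqrt(omega + (a1 * e * e + beta) * sigma1_sq)
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * phi(e) * phi(u / sigma2) / sigma2
    return total * step


omega, alpha, beta, lam = 1.14e-5, 0.131007, 0.845708, 0.0  # Table 1 values
sigma1_sq = omega / (1.0 - alpha - beta)   # hypothetical value of sigma_1^2
var2 = omega + (alpha + beta) * sigma1_sq  # E(x_2^2) when lam = 0
print("f_{x_2}(0):              ", density_x2(0.0, omega, alpha, beta, lam, sigma1_sq))
print("Gaussian, same variance: ", phi(0.0) / math.sqrt(var2))
```

Comparing the two printed values gives a first, crude indication of how far the 2-step-ahead density is from a Gaussian with the same variance at the centre of the distribution.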

The next Lemma reports a recursion for the volatility process, that turns out to be useful when solving the integral in (2.4). In the Lemma, the following notation is used: for $t=1,\dots,h-1$ , let $y_{t}:=\alpha_{t}z_{t}/(\beta\sigma_{t}^{2})=\alpha_{t}x_{t}^{2}/(\beta\sigma_{t}^{2})=\alpha_{t}\varepsilon_{t}^{2}/\beta$ and $\bm{y}:=(y_{1},\dots,y_{h-1})^{\prime}$ , where $\bm{v}:=(v_{1},\dots,v_{h-1})^{\prime}$ denotes a value of $\bm{y}$ .

Lemma 2.2 (Volatility and transformations).

The volatility process can also be written

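$$\sigma_{h}^{2}=\omega+(1+y_{h-1})\beta\sigma_{h-1}^{2},\qquad y_{h}:=\frac{\alpha_{h}}{\beta}\varepsilon_{h}^{2}.$$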

For $h\geq 2$ , $\sigma_{h}^{2}$ has the following recursive expression in terms of $y$ ’s

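$$\sigma_{h}^{2}=\omega+(1+y_{h-1})\left\{\omega\beta+(1+y_{h-2})\left(\dots\left(\omega\beta^{h-2}+(1+y_{1})\beta^{h-1}\sigma_{1}^{2}\right)\right)\right\}$$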

with $\sigma_{1}^{2}=\omega+\beta\sigma_{0}^{2}+\alpha_{0}x_{0}^{2}$, which is measurable with respect to the information set at time $t=0$. Moreover, one has

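$$f_{z_{h}|\bm{\varsigma}}(w_{h}|\bm{s})=\left(\frac{\gamma_{h}}{w_{h}}\right)^{\frac{1}{2}}\int_{\mathbb{R}_{+}^{h-1}}\prod_{t=1}^{h-1}\left(v_{t}^{-\frac{1}{2}}g\left(\frac{\beta}{a_{t}}v_{t}\right)\right)\cdot\sigma_{h}^{-1}g\left(\frac{w_{h}}{\sigma_{h}^{2}}\right)\mathrm{d}\bm{v},$$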

where $\gamma_{h}:=\beta^{h-1}/(\prod_{t=1}^{h-1}a_{t})$ and $a_{t}=\alpha+\frac{1}{2}\lambda(1-s_{t})$ is the value of $\alpha_{t}$ corresponding to $\varsigma_{t}=s_{t}$.
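The one-step recursion and the nested expression for $\sigma_{h}^{2}$ in Lemma 2.2 can be checked against each other numerically. The sketch below is a minimal illustration with hypothetical function names and arbitrary parameter values; it draws $\varepsilon_{1},\dots,\varepsilon_{h-1}$ at random and computes $\sigma_{h}^{2}$ both ways.

```python
import random


def sigma_sq_path(omega, alpha, beta, lam, sigma1_sq, eps):
    """Return [sigma_1^2, ..., sigma_h^2] using
    sigma_t^2 = omega + (1 + y_{t-1}) * beta * sigma_{t-1}^2,
    where eps = (eps_1, ..., eps_{h-1})."""
    path = [sigma1_sq]
    for e in eps:
        a_t = alpha + (lam if e < 0 else 0.0)   # alpha_t given the sign of eps_t
        y_t = a_t * e * e / beta
        path.append(omega + (1.0 + y_t) * beta * path[-1])
    return path


def sigma_sq_nested(omega, alpha, beta, lam, sigma1_sq, eps):
    """sigma_h^2 from the nested expression, built from the innermost term outwards."""
    h = len(eps) + 1
    inner = beta ** (h - 1) * sigma1_sq         # beta^{h-1} * sigma_1^2
    for t, e in enumerate(eps, start=1):        # t = 1, ..., h-1
        a_t = alpha + (lam if e < 0 else 0.0)
        y_t = a_t * e * e / beta
        inner = omega * beta ** (h - 1 - t) + (1.0 + y_t) * inner
    return inner


random.seed(1)
eps = [random.gauss(0.0, 1.0) for _ in range(4)]   # h = 5
args = (1.14e-5, 0.13, 0.84, 0.05, 1.0e-4)         # illustrative values only
print(sigma_sq_path(*args, eps)[-1], sigma_sq_nested(*args, eps))
```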

3 Main results

The main results are summarised in Theorem 3.4 below. Before stating it, an auxiliary assumption is introduced. Define $\theta:=\omega/(2\sigma_{1}^{2})$, and note that it lies between 0 and $\frac{1}{2}$ as $\alpha$ and $\beta$ vary, $0<\theta\leq\frac{1}{2}$. Moreover, define the following function of $\theta$: $\underline{\beta}:=\underline{\beta}(\theta):=-\theta+\sqrt{\theta^{2}+2\theta}$, which is used in the next Assumption.

Assumption 3.3.

a.

For $h=3$ , let $\beta\geq\underline{\beta}$ ;

b.

For $h>3$ let $\beta\geq\max(\frac{1}{2},\underline{\beta})$ .

It can be noted that $\sup_{\theta}\underline{\beta}(\theta)=\lim_{\theta\rightarrow\frac{1}{2}}\underline{\beta}(\theta)=\frac{-1+\sqrt{5}}{2}\approx 0.618\,03$ , as $\sigma_{1}^{2}>\omega$ . In Figure 1, the area above the curve represents the set $\beta\geq\max(\frac{1}{2},\underline{\beta})$ for $0<\theta\leq\frac{1}{2}$ .

Note that $\underline{\beta}$ depends on the ratio $\theta=\omega/(2\sigma_{1}^{2})$, where $\sigma_{1}^{2}$ is the last known value in the information set. Relatively large values of $\sigma_{1}^{2}$ correspond to $0<\theta\leq\frac{1}{8}$, moderate values to $\frac{1}{8}<\theta\leq\frac{1}{4}$ and small values to $\frac{1}{4}<\theta\leq\frac{1}{2}$. For instance, $\theta\leq\frac{1}{4}$ corresponds to $\omega\leq\sigma_{0}^{2}(\beta+\alpha\varepsilon_{0}^{2})=\alpha x_{0}^{2}+\beta\sigma_{0}^{2}$, an inequality one expects to be frequently valid. Note that $\underline{\beta}(\frac{1}{8})=0.39039$, $\underline{\beta}(\frac{1}{4})=\frac{1}{2}$, $\underline{\beta}(\frac{1}{2})=0.61803$, so that, for the results in the paper for $h=3$ to hold, Assumption 3.3 requires $\beta>0.39039$ (respectively $0.5$) when the last observed volatility is relatively large (respectively moderate). Only for low values of the last observed volatility does the assumption require $\beta$ to exceed $\frac{1}{2}$.

For $h>3$, Assumption 3.3 requires $\beta\geq\max(\frac{1}{2},\underline{\beta})$. Hence, unless the last observed volatility $\sigma_{1}^{2}$ is very low, i.e. $\frac{1}{4}<\theta\leq\frac{1}{2}$, the sufficient condition (which could possibly be improved further analytically) requires only $\beta\geq\frac{1}{2}$.
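The bound $\underline{\beta}(\theta)=-\theta+\sqrt{\theta^{2}+2\theta}$ and the check in Assumption 3.3 are simple to evaluate. A minimal sketch follows; the function names `beta_lower` and `assumption_3_3_holds` are hypothetical, and horizons $h\leq 2$ are treated as unrestricted because the assumption is stated only for $h\geq 3$.

```python
import math


def beta_lower(theta):
    """beta_underline(theta) = -theta + sqrt(theta^2 + 2*theta), 0 < theta <= 1/2."""
    return -theta + math.sqrt(theta * theta + 2.0 * theta)


def assumption_3_3_holds(beta, omega, sigma1_sq, h):
    """Check the sufficient condition of Assumption 3.3 for horizon h."""
    if h <= 2:
        return True                     # no restriction is stated for h <= 2
    theta = omega / (2.0 * sigma1_sq)
    bound = beta_lower(theta)
    if h > 3:
        bound = max(0.5, bound)
    return beta >= bound


for theta in (1 / 8, 1 / 4, 1 / 2):
    print(f"theta={theta}: beta_lower={beta_lower(theta):.5f}")
```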

In Theorem 3.4 below, $\Psi$ is the confluent hypergeometric function of the second kind, also known as Tricomi function, see Abadir (1999) and Gradshteyn and Ryzhik (2007), section 9.21, whose integral representation is,

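$\Psi(a,c;z):=\frac{1}{\Gamma(a)}\int_{0}^{\infty}e^{-zt}t^{a-1}\left(1+t\right)^{c-a-1}\mathrm{d}t,\qquad(3.1)$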

with $\mathop{\mathrm{Re}}\nolimits(z)>0,\mathop{\mathrm{Re}}\nolimits(a)>0$ .
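As a rough numerical illustration, the Tricomi function can be approximated directly from its integral representation, $\Gamma(a)^{-1}\int_{0}^{\infty}e^{-zt}t^{a-1}(1+t)^{c-a-1}\mathrm{d}t$ . The sketch below is not the authors' code: the function name `tricomi_psi`, the substitution $t=s^{2}$ , the truncation point and the use of Simpson's rule are all choices made here, and the routine is only meant for real $a\geq\frac{1}{2}$ , real $z>0$ and moderate $c$ . It checks the known special case $\Psi(a,a+1;z)=z^{-a}$ and shows the values shrinking as $c$ decreases.

```python
import math

def tricomi_psi(a: float, c: float, z: float, n: int = 4000) -> float:
    """Approximate Psi(a, c; z) for a >= 1/2 and z > 0 by quadrature.

    The substitution t = s**2 removes the t**(a-1) endpoint singularity at
    a = 1/2; the integral over s is truncated where exp(-z*s**2) is negligible
    and evaluated with composite Simpson's rule (n must be even).
    """
    if a < 0.5 or z <= 0.0:
        raise ValueError("this sketch assumes a >= 1/2 and z > 0")
    upper = math.sqrt(40.0 / z)          # exp(-40) is about 4e-18
    step = upper / n

    def integrand(s: float) -> float:
        return (2.0 * s ** (2.0 * a - 1.0) * math.exp(-z * s * s)
                * (1.0 + s * s) ** (c - a - 1.0))

    total = integrand(0.0) + integrand(upper)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * integrand(i * step)
    return total * step / 3.0 / math.gamma(a)

# Special case Psi(a, a+1; z) = z**(-a)
print(tricomi_psi(0.5, 1.5, 2.0), 2.0 ** -0.5)

# Values decrease as c becomes more negative
for c in (1.5, -5.0, -50.0):
    print(c, tricomi_psi(0.5, c, 1.0))
```

In production work a library routine for the confluent hypergeometric function of the second kind would normally be preferred to hand-written quadrature.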

The $\Psi$ function is used to define the following quantities:

[TABLE]

where $K_{0}:=0$ , $K_{i}:=\sum_{t=1}^{i}k_{t}$ , $U_{j}:=\sum_{i=1}^{j}K_{i}=\sum_{i=1}^{j}(j-i+1)k_{i}$ . The multiple sum is defined for $h\geq 3$ as $\sum_{k_{1},\dots,k_{h-2}}:=\sum_{k_{1}=0}^{r}\sum_{k_{2}=0}^{r-k_{1}}\cdots\sum_{k_{h-2}=0}^{r-K_{h-3}}$ , where the individual sums extend to $\infty$ if $r\notin\mathbb{N}$ . For $h=2$ the sum $\sum_{k_{1},\dots,k_{h-2}}$ and the product $\prod_{t=1}^{h-2}$ are empty and (3.2) reduces to $A_{2}(r)=\left(\beta\sigma_{1}^{2}\right)^{-\frac{1}{2}}\pi^{\frac{1}{2}}\Psi\left(\frac{1}{2},r+\frac{3}{2};\frac{\omega+\beta\sigma_{1}^{2}}{2\alpha_{1}\sigma_{1}^{2}}\right)$ .

Theorem 3.4 (GARCH(1,1) prediction density).

Assume that $\varepsilon_{t}$ are i.i.d. $\mathrm{N}(0,1)$ and let Assumption 3.3 hold; then one has, for $h\geq 2$ , $w_{h}\geq 0$ and $-\infty<u_{h}<\infty$

[TABLE]

where $c_{j}:=2^{-h+1}\sum_{\bm{s}\in\mathcal{S}}c_{j,\bm{s}}$ and $c_{j,\bm{s}}$ is defined as $c_{j,\bm{s}}:=\gamma_{h}^{\frac{1}{2}}A_{h}\left(-j-\frac{1}{2}\right),$ where $A_{h}(\cdot)$ is defined in (3.2) and $\gamma_{h}$ in Lemma 2.2. The expressions in (3.3) and (3.4) are absolutely summable for any finite $w_{h}$ or $u_{h}$ .

Proof.

See Appendix. ∎

Observe that the expression for $c_{j,\bm{s}}$ does not depend on the points of evaluation $w_{h}$ or $u_{h}$ , and hence the $c_{j,\bm{s}}$ coefficients need to be computed only once for the whole density. One can prove, see Lemma A.10 in the Appendix, that $\Psi(a,c;z)\rightarrow 0$ for $c\rightarrow-\infty$ . This implies that the terms $\Psi\left(\frac{1}{2},r-K_{t}+\frac{3}{2};z\right)$ converge to 0 for large $k_{t}$ in the product in (3.2).

Note also that for $h=2$ , equation (3.4) holds for any value of $\beta$ , while for $h=3$ it holds if and only if $\beta\geq\underline{\beta}$ . For $h>3$ , the validity of (3.4) is guaranteed by the sufficient condition $\beta\geq\max(\frac{1}{2},\underline{\beta})$ , which is, however, not necessary.

The line of proof of Theorem 3.4 is the following: for $h=2$ the integral is solved by substitution and by using equation (3.1). For $h\geq 3$ , subsequent (negative) binomial expansions of expression (2.7) for $\sigma_{t}^{2}$ are required, whose validity is ensured by the inequality

[TABLE]

which is satisfied under Assumption 3.3, see Lemma A.11 in the Appendix.

Immediate consequences of Theorem 3.4 are collected in the following corollary.

Corollary 3.5 (C.d.f. and moments).

The prediction c.d.f.s of $z_{h}$ and $x_{h}$ are given by the following expressions for $h\geq 2$ , $w_{h}\geq 0$ and $-\infty<u_{h}<\infty$

[TABLE]

with zero odd moments for $x_{h}$ and even moments

[TABLE]

where $\gamma_{h}$ and $A_{h}(m)$ depend on $\bm{s}$ , see their definitions in Lemma 2.2 and in (3.2).

Note that the $A_{h}(m)$ appearing in the moment calculations consist of finite sums extending to $m$ , involving Tricomi functions that, unlike those in Theorem 3.4, do not fall in the logarithmic case; see Abadir (1999) for the logarithmic case. In fact, $m-k\in\left\{0,1,\dots,m\right\}$ implies that

[TABLE]

is a finite sum, see Abadir (1999), which is proportional to the generalized Laguerre polynomial $L_{m-k}^{(-1/2-m+k)}(\xi)$ , where $L_{i}^{(a)}(\xi):=\sum_{k=0}^{i}\left(a+1+k\right)_{i-k}\left(-i\right)_{k}\frac{\xi^{k}}{k!}$ see e.g. Abramowitz and Stegun (1964) Chapter 22. For the moments of a GARCH(1,1), one can compare (3.6) with equations (34) and (35) in Baillie and Bollerslev (1992).

Some standardized densities of $x_{h}$ and the corresponding right tails are plotted in Figure 2 for $h=1,2,3,4$ . The curve $h=1$ is the standard Gaussian. Computations for Figures 2, 3 and 4 were performed in Mathematica. [Footnote 8: When $x$ has mean 0 and standard deviation $s$ , the standardized variate is $z=x/s$ , with density $f_{z}(a)=sf_{x}(sa)$ .]

Figure 3 shows the standardized prediction densities for $h=2$ and values of $\beta/\alpha$ that range from 8.5 ( $\alpha=0.1,\beta=0.85$ ) to 1/8.5 ( $\alpha=0.85,\beta=0.1$ ). This figure shows that the deviations from the Gaussian case of the prediction density can be substantial; the prediction densities are more similar to a Gaussian when $\beta/\alpha$ is large. Figure 4 shows the tails for the GJR-GARCH(1,1) case.

The formulae in Theorem 3.4 and Corollary 3.5 are alternating in sign. While (absolutely) convergent, the associated series was found in practice to be ill-behaved numerically when $\rho_{u}:=u^{2}/(2(\omega+\beta\sigma_{1}^{2}))$ is very large: the oscillations in the terms of the series become large before decreasing in amplitude toward zero, where ‘large’ is meant relative to the greatest floating point number handled by the computer. Note that this can be linked to large $u$ and/or small $\omega+\beta\sigma_{1}^{2}$ . This extreme behaviour implies accumulation of numerical errors, which can lead to inaccurate calculations of the prediction density.

Example 3.6 (Numerical accuracy).

One such case can be obtained for the density of $x_{2}$ in formula (3.4), $h=2$ , in the following way: select $u=4$ for $\omega=0.0000114,\alpha=0.85,\beta=0.14$ , and choose $\sigma_{0}^{2}=x_{0}^{2}=\sigma^{2}:=\omega/(1-\alpha-\beta)=0.00114$ . This results in $\rho_{u}=53.\bar{3}$ for the standardized p.d.f. of $x_{2}/s$ with $s^{2}:=\omega+(\alpha+\beta)\sigma^{2}_{1}$ . In this case, the oscillations of the terms in the series increase up to $\pm 4\cdot 10^{20}$ around the $50^{th}$ term of the series, before decreasing toward zero; the resulting series truncated after its first 100 terms gave the negative number $-2.9628\cdot 10^{12}$ . Calculations were performed in MATLAB 2018a on an Intel i7 Windows 10 computer. As a comparison, the same script applied to $u=2$ gave $f_{x_{2}}(2)=0.03688432$ .
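The value of $\rho_{u}$ in this example, and the kind of cancellation problem it creates, can be reproduced with a short Python sketch. The first part recomputes $\rho_{u}=u^{2}/(2(\omega+\beta\sigma_{1}^{2}))$ from the parameter values quoted above. The second part is only a toy analogue: it sums the alternating series $\sum_{j}(-\rho)^{j}/j!=\mathrm{e}^{-\rho}$ in double precision, which shares the sign pattern and term growth of the series in (3.4) but omits its coefficients $c_{j}$ . All variable names are introduced here for illustration.

```python
import math

# Parameter values quoted in Example 3.6
omega, alpha, beta = 0.0000114, 0.85, 0.14
sigma_sq = omega / (1.0 - alpha - beta)          # unconditional variance
sigma1_sq = omega + (alpha + beta) * sigma_sq    # x_0^2 = sigma_0^2 = sigma_sq
s_sq = omega + (alpha + beta) * sigma1_sq        # variance used to standardize x_2
u = 4.0 * math.sqrt(s_sq)                        # u = 4 in standardized units
rho_u = u * u / (2.0 * (omega + beta * sigma1_sq))
print(rho_u)                                     # close to 53.33

def naive_alternating_sum(rho: float, n_terms: int = 400) -> float:
    """Sum (-rho)**j / j! for j = 0..n_terms-1 in double precision."""
    term, total = 1.0, 1.0
    for j in range(1, n_terms):
        term *= -rho / j
        total += term
    return total

exact = math.exp(-rho_u)                         # limit of the toy series
naive = naive_alternating_sum(rho_u)
largest_term = max(
    math.exp(j * math.log(rho_u) - math.lgamma(j + 1)) for j in range(400)
)
print(exact, naive, largest_term)
```

The largest term exceeds $10^{20}$ while the limit is below $10^{-20}$ , so rounding errors in the partial sums dominate the result, in line with the behaviour reported in the example.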

In order to address these numerical accuracy problems when $\rho_{u}=u^{2}/(2(\omega+\beta\sigma_{1}^{2}))$ is large, the following theorem presents a different set of formulae for the prediction density. This alternative set has the advantage of allowing computations in the far tails of the density, at the price of a slightly higher implementation cost.

Theorem 3.7 (Alternative formulae for the GARCH(1,1) prediction density).

Under the same assumptions as Theorem 3.4 one has for $h\geq 2$ , $w_{h}\geq 0$ and $-\infty<u_{h}<\infty$

[TABLE]

For $h=2$ , $p_{j}(\cdot)$ is defined as $p_{j}(\rho):=\rho^{j}/j!$ , $c_{j}^{\star}:=2^{-1}\sum_{\bm{s}\in\mathcal{S}}c_{j,\bm{s}}^{\star}$ and

[TABLE]

where $(a)_{j}:=\prod_{i=0}^{j-1}(a+i)$ denotes Pochhammer’s symbol, see Abadir (1999).

For $h\geq 3$ , $p_{j}(\rho):=(-1)^{j}L_{j}^{(-1)}(\rho)$ where $L_{j}^{(-1)}(\rho):=\sum_{k=0}^{j}\left(k\right)_{j-k}\left(-j\right)_{k}\frac{\rho^{k}}{k!}$ is a generalized Laguerre polynomial [Footnote 9: $L_{j}^{(a)}(x)$ is the standard notation, see e.g. Abramowitz and Stegun (1964) Chapter 22.], with the convention $\left(0\right)_{0}:=1$ ; moreover $c_{j}^{\star}:=2^{-h+1}\sum_{\bm{s}\in\mathcal{S}}c_{j,\bm{s}}^{\star}$ where $c_{j,\bm{s}}^{\star}$ is defined as

[TABLE]

with $K_{0}:=0$ , $K_{t}:=\sum_{i=1}^{t}k_{i}$ , $U_{j}:=\sum_{i=1}^{j}K_{i}=\sum_{i=1}^{j}(j-i+1)k_{i}$ . The expressions in (3.7), (3.8), (3.9), (3.10) are summable for any finite $w_{h}$ or $u_{h}$ .

The improved numerical performance of formula (3.8) is linked to the presence of the term $\mathrm{e}^{-\rho_{u}}$ when $\rho_{u}:=\frac{u^{2}}{2(\omega+\beta\sigma_{1}^{2})}$ is large. In fact for $u^{2}\rightarrow\infty$ the term $\mathrm{e}^{-\rho_{u}}\rightarrow 0$ , so that $\mathrm{e}^{-\rho_{u}}$ compensates the large terms of the type $\rho_{u}^{j}$ that appear in the sum for large $u^{2}$ . For $u^{2}\rightarrow 0$ , the term $\mathrm{e}^{-\rho_{u}}\rightarrow 1$ , so that $\mathrm{e}^{-\rho_{u}}$ does not influence the sum for small values of $u^{2}$ . Note, moreover, that all the terms in the series (3.7) and (3.8) are positive, so that there are no oscillations associated with different signs for the terms in the series.
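The damping effect of $\mathrm{e}^{-\rho_{u}}$ can be seen numerically for $h=2$ , where $p_{j}(\rho)=\rho^{j}/j!$ . The Python sketch below compares the raw weights $\rho^{j}/j!$ with the damped weights $\mathrm{e}^{-\rho}\rho^{j}/j!$ at $\rho=53.\bar{3}$ ; the coefficients $c_{j}^{\star}$ are deliberately left out, so this illustrates the mechanism rather than evaluating (3.8) itself. The helper `log_weight` is a name chosen here.

```python
import math

def log_weight(rho: float, j: int) -> float:
    """Logarithm of rho**j / j!, computed without overflow."""
    return j * math.log(rho) - math.lgamma(j + 1)

rho = 160.0 / 3.0   # the value of rho_u in Example 3.6
raw = [math.exp(log_weight(rho, j)) for j in range(200)]
damped = [math.exp(log_weight(rho, j) - rho) for j in range(200)]

print(max(raw))      # of order 1e21: huge terms before any damping
print(max(damped))   # about 0.05: the same terms after multiplying by exp(-rho)
print(sum(damped))   # close to 1: the damped weights are Poisson probabilities
```

Since the damped weights are Poisson probabilities with mean $\rho$ , they are bounded by one for every $\rho$ , which is why no large intermediate quantities arise.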

Example 3.8 (Numerical accuracy - continued).

In the same setup as in Example 3.6, formula (3.8) is numerically accurate. In fact, all the terms in the series were found to be bounded by $2\cdot 10^{-4}$ , with value of the density equal to $0.002953901$ , again using the first 100 terms of the series. Calculations were performed in the same environment as in Example 3.6. As a comparison, the same script applied to $u=2$ gave $f_{x_{2}}(2)=0.03688291$ , which agrees with formula (3.4) in Example 3.6 up to the 5th digit (discrepancy equal to $1.4017\cdot 10^{-6}$ ).

The slightly higher implementation cost of formula (3.8) is associated with the presence of the generalised Laguerre polynomial in $p_{j}(\rho)$ for $h\geq 3$ . These polynomials are finite sums and add a moderate cost in terms of computations. Derivations similar to those of Corollary 3.5 can be performed on (3.7), (3.8) to obtain the corresponding c.d.f.s.

4 Stationary distribution

The limit representation of the random variable $x_{h}$ in the stationary case can be found in Francq and Zakoian (2010) Theorem 2.1 page 24. The tail behaviour of the limit distribution is reviewed in Mikosch and Starica (2000) and Davis and Mikosch (2009). The tails of the stationary distribution of both the volatility and of the GARCH process $x_{t}$ are of Pareto type, $\Pr(x_{t}>u)\approx cu^{-2\kappa}$ say, where $\kappa>0$ is a tail index. These properties are based on results for random difference equations and renewal theory obtained in Kesten (1973) and Goldie (1991).

The tail index of the stationary distribution depends on the coefficients $\alpha$ and $\beta$ of the GARCH(1,1) process $x_{t}$ as well as on the one-step-ahead distribution. Examples of the tail index are given in Davis and Mikosch (2009); for Gaussian innovations, $\kappa=14.1$ for $\alpha=\beta=0.1$ , while $\kappa=1$ for $\alpha=1-\beta$ .

The index $\kappa$ is the unique solution of $\operatorname{E}((\alpha\varepsilon_{t}^{2}+\beta)^{\kappa})=1$ . When $\kappa$ is an integer, the expression simplifies to

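$\sum_{n=0}^{\kappa}\binom{\kappa}{n}\alpha^{n}\beta^{\kappa-n}\operatorname{E}(\varepsilon_{t}^{2n})=1,\qquad(4.1)$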

see Davis and Mikosch (2009) eq. (10). Substituting the moments $\operatorname{E}(\varepsilon_{t}^{2n})$ from the $\chi^{2}$ distribution, and assigning values to $\alpha/\beta$ over a grid of pre-specified values, one can solve (4.1) for $\beta$ , and hence for $\alpha=(\alpha/\beta)\beta$ . This allows one to compute (values of) the surface $\kappa(\alpha,\beta)$ . Figure 5 reports the level curves of $\kappa(\alpha,\beta)$ as a function of $\alpha$ and $\beta$ obtained in this way. The figure also reports the lines where $\beta/\alpha$ is constant. It is seen that, for large values of $\beta/\alpha$ , $\kappa$ and $\beta/\alpha$ increase roughly together. This association is not present for small values of $\beta/\alpha$ .
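For integer $\kappa$ and Gaussian innovations, the procedure just described has a closed-form solution, which the following Python sketch illustrates. It assumes that the integer-$\kappa$ condition is the binomial expansion of $\operatorname{E}((\alpha\varepsilon_{t}^{2}+\beta)^{\kappa})=1$ with $\operatorname{E}(\varepsilon_{t}^{2n})=(2n-1)!!$ ; the function names are hypothetical.

```python
import math

def gaussian_even_moment(n: int) -> int:
    """E(eps**(2n)) for eps ~ N(0,1), i.e. the double factorial (2n-1)!!."""
    result = 1
    for k in range(1, n + 1):
        result *= 2 * k - 1
    return result

def moment_condition(alpha: float, beta: float, kappa: int) -> float:
    """Binomial expansion of E((alpha*eps^2 + beta)**kappa) for integer kappa."""
    return sum(
        math.comb(kappa, n) * alpha ** n * beta ** (kappa - n) * gaussian_even_moment(n)
        for n in range(kappa + 1)
    )

def solve_alpha_beta(ratio: float, kappa: int) -> tuple:
    """Given ratio = alpha/beta and integer kappa, return (alpha, beta) on the level curve."""
    scale = sum(
        math.comb(kappa, n) * ratio ** n * gaussian_even_moment(n)
        for n in range(kappa + 1)
    )
    beta = scale ** (-1.0 / kappa)       # beta**kappa * scale = 1
    return ratio * beta, beta

for kappa in (1, 2, 3, 4):
    a, b = solve_alpha_beta(0.25, kappa)
    print(kappa, a, b, moment_condition(a, b, kappa))
```

For $\kappa=1$ the solution satisfies $\alpha+\beta=1$ , in agreement with the statement that $\kappa=1$ for $\alpha=1-\beta$ .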

The relation between $\beta/\alpha$ and fat-tailedness of the prediction density for finite horizon $h$ can be illustrated using the case $h=2$ . From Theorem 3.4,

[TABLE]

where $\tilde{\sigma}_{2}^{2}=\omega+\beta\sigma_{1}^{2}$ [Footnote 10: The quantity $\tilde{\sigma}_{2}^{2}:=\omega+\beta\sigma_{1}^{2}$ can be interpreted as the minimum value that $\sigma^{2}_{2}=\omega+\left(1+y_{1}\right)\beta\sigma_{1}^{2}$ can take, in the ideal case when $\alpha=0$ (thus $y_{1}=0$ ) and $\sigma_{1}^{2}$ is given, i.e. $x_{2}\sim\textrm{N}(0,\tilde{\sigma}^{2}_{2})$ .] and

[TABLE]

Hence when ${\beta}/{\alpha}\rightarrow\infty$ one has $z\rightarrow\infty$ with $\sqrt{z}\Psi\left(\frac{1}{2};1-j;z\right)=1+O(|z|^{-1})$ , see Abramowitz and Stegun (1964), eq. 13.1.8, so that all the Tricomi functions $\Psi_{j}$ , for varying $j$ , tend to one. [Footnote 11: This is unlike the case of fixed $z$ , where the sequence of $\Psi_{j}$ decreases to 0 for increasing $j$ .] As a result, when ${\beta}/{\alpha}\rightarrow\infty$ the prediction distribution converges to a $\textrm{N}(0,\tilde{\sigma}^{2}_{2})$ .

One concludes that both for the prediction density for $h=2$ and for the stationary distribution, the fat-tailedness of the distributions is small for large values of ${\beta}/{\alpha}$ , unless $\alpha$ is very close to 0.

5 Comparing exact formulae with simulation-based methods

This section describes the application of the formulae in Section 3 to the calculation of the Value at Risk and of the Expected Shortfall, comparing them with alternatives based on Monte Carlo. This comparison is made under Gaussianity and Assumption 3.3, so that the formulae in the paper can be applied. The analysis in this and the remaining sections is for generic forecast horizon $h=2,3,\dots$ , while illustrations are made for $h=2$ and $\lambda=0$ for simplicity and without loss of generality.

Let $p$ be some tail probability, such as 5%, and let $Q_{h,p}$ be the Value at Risk, defined as the negative of the $p$ -quantile of the prediction distribution, i.e. $p=\Pr(x_{h}<-Q_{h,p})$ . Let also $ES_{h,p}$ indicate the corresponding expected shortfall, i.e. $ES_{h,p}:=-\operatorname{E}(x_{h}|x_{h}<-Q_{h,p})$ , following standard notation, see e.g. Francq and Zakoian (2015).

Observe that $ES_{h,p}$ may fail to exist when the underlying density has Cauchy tails. One implication of the exact results in Theorem 3.4 of Section 3 is that for finite $h$ the prediction of $x_{h}$ has thinner-than-Cauchy tails, and hence $ES_{h,p}$ exists; this appears to be a central issue for the application of $ES_{h,p}$ as a measure of risk.

The following subsections show first how the exact formulae can be applied in this context, and next their relative advantage over MC methods. The same advantages discussed for the quantification of the Value at Risk and the Expected Shortfall apply more generally to other functionals of the prediction distribution, as well as to the nonparametric estimation of the prediction distribution itself. For brevity, these latter cases are not discussed in this paper in detail.

The rest of the section refers to the standardized prediction distribution of $x_{2}$ when $\omega=1.14\cdot 10^{-5}$ , $\alpha=0.131007$ , $\beta=0.845708$ , $\lambda=0$ ; these values are the ML estimates of an AR(2)-GARCH(1,1) model for the weekly S&P500 stock index return from 1950-2018 reported in Table 11.4 in Linton (2019). These values of $\alpha$ and $\beta$ are very similar to the median estimates in Table 1 of Bampinas et al. (2018) for the set of individual S&P 1500 daily returns. In the calculations $\sigma_{1}^{2}$ was set equal to $\omega/(1-\alpha-\beta)$ . For these parameters, standard double precision was found to be sufficient for $h=2$ for a range of $|x|<6$ in standardized units.

5.1 Exact calculations

Both $Q_{h,p}$ and $ES_{h,p}$ can be calculated using the exact formulae in this paper. This subsection combines the use of results in Section 3 with numerical techniques to illustrate applications of these results. This approach is chosen to keep derivations as simple as possible, even when the analytical results of Section 3 could be extended to replace numerical integration.

Consider first $Q_{h,p}$ ; this can be found as the root of the function $F_{x_{h}}(u)-p$ , where $F_{x_{h}}(u)$ is given in (3.5), using root-finding algorithms like Newton’s method – see e.g. Press et al. (2007), Chapter 9 – where

[TABLE]

Here $F_{x_{h}}(u)$ is given in (3.5) and $f_{x_{h}}(u)$ in (3.4); the iteration typically requires a handful of function evaluations.

Consider next $ES_{h,p}$ ; one can write

[TABLE]

where the second equality follows by integration by parts, and the third because $F_{x_{h}}(-Q_{h,p})=p$ by definition. [Footnote 12: Note that, whenever $\operatorname{E}(x)$ exists, one has $\lim_{x\rightarrow-\infty}(-xF(x))=0$ ; in fact for $-\infty<u<x$ one has $0\leq\lim_{x\rightarrow-\infty}\left(-xF\left(x\right)\right)=\lim_{x\rightarrow-\infty}\left(-\int_{-\infty}^{x}xf\left(u\right)\mathrm{d}u\right)\leq\lim_{x\rightarrow-\infty}\left(-\int_{-\infty}^{x}uf\left(u\right)\mathrm{d}u\right)=0.$ ]

This integral can be evaluated numerically using $f_{x_{h}}(u)$ in (3.4), or $F_{x_{h}}(u)$ in (3.5), employing quadrature methods (trapezoid), see Press et al. (2007), Chapters 4 and 13.

Table 1 reports values of $Q_{2,p}$ using the reference values from Table 11.4 in Linton (2019). The chosen algorithm in (5.1) was implemented in Matlab, using a tolerance value of $10^{-7}$ and avoiding division by $f$ when this is smaller than $10^{-14}$ . Initial values of the iterations were chosen equal to the corresponding Gaussian quantiles. Values of $F_{x_{h}}(u)$ were computed as in (3.5) and $f_{x_{h}}(u)$ as in (3.4), truncating sums at 100 terms.

Table 1 reports terminal values of the iterations, along with number of iterations and a comparison with the standard Gaussian distribution. Unsurprisingly, the Values at Risk are found to be close to the Gaussian quantiles. However, they can be either smaller or larger than the Gaussian quantiles, depending on the value of $p$ . The number of iterations needed was smaller than 5.
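The root-finding step can be sketched as follows, assuming that the iteration is the usual Newton update $u\leftarrow u-(F(u)-p)/f(u)$ with the tolerance and density floor mentioned above. Since the exact $F_{x_{h}}$ and $f_{x_{h}}$ of (3.5) and (3.4) are not reproduced here, the standard Gaussian c.d.f. and p.d.f. are used as stand-ins; `newton_quantile` is a hypothetical name, and any pair of callables could be passed in its place.

```python
from statistics import NormalDist

def newton_quantile(cdf, pdf, p, start, tol=1e-7, pdf_floor=1e-14, max_iter=50):
    """Solve cdf(u) = p by Newton's method; return (root, number of iterations)."""
    u = start
    for iteration in range(1, max_iter + 1):
        density = pdf(u)
        if density < pdf_floor:
            raise ArithmeticError("density too small to divide by safely")
        step = (cdf(u) - p) / density
        u -= step
        if abs(step) < tol:
            return u, iteration
    raise RuntimeError("Newton iteration did not converge")

# Gaussian stand-in for the prediction distribution (an assumption of this sketch)
gaussian = NormalDist()
p = 0.05
root, n_iterations = newton_quantile(gaussian.cdf, gaussian.pdf, p, start=-1.0)
value_at_risk = -root          # Q is minus the p-quantile
print(value_at_risk, n_iterations)
```

Starting from the Gaussian quantile itself, as done for Table 1, would shorten the iteration further when the prediction distribution is close to Gaussian.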

Table 2 reports values of $ES_{2,p}$ using the standardized prediction distribution with the same parameter values as in Table 1. Numerical integration as in the last expression in (5.2) was performed using the standard function integral in Matlab with standard tolerance values; this uses global adaptive quadrature integration methods. Minus infinity was replaced in the calculations with $-6$ . Values of $F_{x_{h}}(u)$ were computed as in (3.5) and $f_{x_{h}}(u)$ as in (3.4), truncating sums at 100 terms.

Table 2 shows that the Expected Shortfall values are close to the Gaussian case, but systematically lower. In practice, the call to the integral function was quicker than the computation of the $Q_{2,p}$ in Table 1.
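The numerical integration step can be illustrated with a simple trapezoid rule. The sketch assumes the form implied by the integration by parts described above, $ES_{h,p}=Q_{h,p}+p^{-1}\int_{-\infty}^{-Q_{h,p}}F_{x_{h}}(u)\mathrm{d}u$ , replaces minus infinity by $-6$ as in the text, and again uses the standard Gaussian c.d.f. as a stand-in for (3.5). The function name and the number of grid points are choices made here.

```python
from statistics import NormalDist

def expected_shortfall(cdf, var_level, p, lower=-6.0, n=4000):
    """ES = Q + (1/p) * integral of cdf over [lower, -Q], by the trapezoid rule."""
    upper = -var_level
    step = (upper - lower) / n
    total = 0.5 * (cdf(lower) + cdf(upper))
    for i in range(1, n):
        total += cdf(lower + i * step)
    return var_level + total * step / p

# Gaussian stand-in for the prediction distribution (an assumption of this sketch)
gaussian = NormalDist()
p = 0.05
q = -gaussian.inv_cdf(p)                       # Gaussian Value at Risk
es_numeric = expected_shortfall(gaussian.cdf, q, p)
es_closed_form = gaussian.pdf(q) / p           # known Gaussian expected shortfall
print(es_numeric, es_closed_form)
```

The agreement with the Gaussian closed form gives a quick check of the truncation at $-6$ and of the quadrature step before the exact c.d.f. is plugged in.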

5.2 Alternatives based on Monte Carlo

Alternative methods to compute $Q_{h,p}$ and $ES_{h,p}$ rely on MC simulations. Simple MC solutions are reviewed here for comparison with the exact methods above. In order to estimate $Q_{h,p}$ and $ES_{h,p}$ , for replication $j=1,\dots,n$ , one could generate pseudo random numbers $(\varepsilon_{t,j}^{\ast})_{t=1}^{h}$ and construct the corresponding values $(x_{t,j}^{\ast})_{t=1}^{h}$ using recursion (2.1). Let $x_{h,j}^{\ast}$ be the $j$ -th MC realization of $x_{h}$ constructed in this way, and observe that $x_{h,j}^{\ast}$ are independent realisations across repetitions $j$ from the prediction distribution.

Repeating this for $j=1,\dots,n$ , the sample $(x_{h,1}^{\ast},\dots,x_{h,n}^{\ast})$ can be formed; let $(q_{1},\dots,q_{n})$ indicate the ordered values of $(x_{h,1}^{\ast},\dots,x_{h,n}^{\ast})$ , with $q_{1}\leq\dots\leq q_{n}$ . The MC quantile $q_{\lfloor np\rfloor+1}$ can be used to estimate $-Q_{h,p}$ , where $\lfloor x\rfloor$ and $\lceil x\rceil$ indicate the round-down and round-up of $x$ to the nearest integer. [Footnote 13: $\lfloor x\rfloor$ (respectively $\lceil x\rceil$ ) denotes the largest (respectively smallest) integer less than or equal to (respectively greater than or equal to) $x$ .]

Observe here that the sample is a (pseudo) i.i.d. sample from the prediction distribution, and hence all results for i.i.d. samples apply to it. Standard results on quantiles based on the application of the central limit theorem to the MC empirical c.d.f., see e.g. Dudevicz and Mishra (1988) Theorem 7.4.21, imply that
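A minimal Python sketch of this MC quantile estimator for $h=2$ is given below. It assumes that recursion (2.1) with $\lambda=0$ is the standard GARCH(1,1) recursion $x_{t}=\sigma_{t}\varepsilon_{t}$ , $\sigma_{t}^{2}=\omega+\alpha x_{t-1}^{2}+\beta\sigma_{t-1}^{2}$ , and it uses the parameter values of this section; the function names, the seed and the number of replications are illustrative choices.

```python
import math
import random

def simulate_x2(omega, alpha, beta, sigma1_sq, n, seed=12345):
    """Draw n independent realizations of x_2 given sigma_1^2 (lambda = 0)."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        x1 = math.sqrt(sigma1_sq) * rng.gauss(0.0, 1.0)
        sigma2_sq = omega + alpha * x1 * x1 + beta * sigma1_sq
        draws.append(math.sqrt(sigma2_sq) * rng.gauss(0.0, 1.0))
    return draws

def mc_quantile(draws, p):
    """Order statistic q_{floor(n p) + 1} (1-based), i.e. index floor(n p) in Python."""
    ordered = sorted(draws)
    return ordered[math.floor(len(ordered) * p)]

omega, alpha, beta = 1.14e-5, 0.131007, 0.845708
sigma1_sq = omega / (1.0 - alpha - beta)
s = math.sqrt(omega + (alpha + beta) * sigma1_sq)   # standard deviation of x_2

draws = simulate_x2(omega, alpha, beta, sigma1_sq, n=20000)
var_mc = -mc_quantile(draws, 0.05) / s              # standardized MC Value at Risk
print(var_mc)
```

With 20000 replications the estimate is only accurate to about two decimal places, which already hints at the replication counts discussed next.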

[TABLE]

where $\overset{w}{\rightarrow}$ indicates weak convergence for $n\rightarrow\infty$ . Hence a MC large- $n$ confidence interval for $Q_{h,p}$ using $q_{\left\lfloor np\right\rfloor+1}$ at level $\eta$ is given by $q_{\left\lfloor np\right\rfloor+1}\pm z_{1-\eta/2}\sqrt{p(1-p)}/(\sqrt{n}f_{x_{h}}(q_{\left\lfloor np\right\rfloor+1}))$ where $z_{b}$ is the $b$ -quantile from the standard normal distribution. The length of the confidence interval for $Q_{h,p}$ is hence $\ell_{Q}=2z_{1-\eta/2}\sqrt{p(1-p)}/(\sqrt{n}f_{x_{h}}(Q_{h,p}))$ , which is linked to the precision of the MC estimate. Setting $\ell_{Q}\leq 10^{-a}$ for some integer $a$ , this equation can be solved for $n$ , giving

[TABLE]

Similarly, consider the MC estimation of $ES_{h,p}$ for given $Q_{h,p}$ . Assuming $Q_{h,p}$ known here simplifies derivations without altering the main discussion of MC uncertainty; see Patton et al. (2019) for the joint estimation of $Q_{h,p}$ and $ES_{h,p}$ . The Expected Shortfall could be estimated by

[TABLE]

with $v_{h,j}:=-x_{h,j}^{\ast}1(x_{h,j}^{\ast}\leq-Q_{h,p})$ . Observe that this MC estimator is consistent when $ES_{h,p}$ exists, which is the case thanks to the results in Theorem 3.4.

Let further $V_{h}^{2}:=\operatorname{E}(v_{h,j}^{2})-\operatorname{E}(v_{h,j})^{2}$ where

[TABLE]

Observe that these expectations exist thanks to the results in Theorem 3.4. Further, note that $v_{h,j}/p$ has expectation $ES_{h,p}$ and variance $V_{h}^{2}/p^{2}$ , and hence $\operatorname{E}(v_{h,j})=pES_{h,p}$ .
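The MC estimator of the Expected Shortfall can be sketched as the sample mean of $v_{h,j}/p$ , with the indicator selecting draws at or below $-Q_{h,p}$ . The example assumes $Q_{h,p}$ known, as in the text, and uses standard Gaussian draws as a stand-in for draws from the prediction distribution; `mc_expected_shortfall` is a hypothetical name.

```python
import random
from statistics import NormalDist

def mc_expected_shortfall(draws, var_level, p):
    """Average of v_j / p, where v_j = -x_j if x_j <= -Q and 0 otherwise."""
    total = sum(-x for x in draws if x <= -var_level)
    return total / (len(draws) * p)

# Gaussian stand-in for the prediction distribution (an assumption of this sketch)
gaussian = NormalDist()
p = 0.05
q = -gaussian.inv_cdf(p)                    # Value at Risk, treated as known

rng = random.Random(2024)
draws = [rng.gauss(0.0, 1.0) for _ in range(200000)]
es_mc = mc_expected_shortfall(draws, q, p)
es_exact = gaussian.pdf(q) / p              # Gaussian closed form for comparison
print(es_mc, es_exact)
```

Only about a fraction $p$ of the draws contributes to the sum, which is one way to see why the variance of the estimator scales with $1/p^{2}$ .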

Application of the central limit theorem, see e.g. Dudevicz and Mishra (1988) Theorem 6.3.2., to $m_{h,p}$ implies that

[TABLE]

Thus a MC large- $n$ confidence interval for $ES_{h,p}$ using $m_{h,p}$ at level $\eta$ is given by $m_{h,p}\pm z_{1-\eta/2}V_{h}/(p\sqrt{n})$ . The length (precision) of the confidence interval for $ES_{h,p}$ is hence $\ell_{ES}=2z_{1-\eta/2}V_{h}/(p\sqrt{n}).$ Setting $\ell_{ES}\leq 10^{-a}$ for some integer $a$ , this equation can be solved for $n$ , giving

[TABLE]

Values of $n$ from (5.3) are reported in Table 3 for the selected precision level $a=5$ and $h=2$ , using the values of $\alpha$ and $\beta$ from Table 1 and with reference to the standardized variate. In Table 3, $f_{x_{h}}(Q_{h,p})$ in (5.3) is computed using the exact formula (3.4).

From Table 3 one deduces that a large number of replications $n$ is required to compute a confidence interval at level $1-\eta$ for $Q_{h,p}$ for given $a$ . Note that the values of $n$ are large also because of the factors $f_{x_{h}}^{2}(Q_{h,p})$ and $p^{2}$ in the denominators of (5.3) and (5.5), respectively.

Values of $n$ from (5.5) are reported in Table 4 for the selected precision level $a=5$ and $h=2$ , using the values of $\alpha$ and $\beta$ from Table 1, and with reference to the standardized variate. In Table 4, $V_{h}^{2}$ in (5.5) is evaluated using numerical integration in (5.4) for $h=2$ with $f_{x_{h}}(\cdot)$ computed as in (3.4). Also from Table 4 one deduces that a large number of replications $n$ is required to compute a confidence interval at level $1-\eta$ for $ES_{h,p}$ .
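The order of magnitude of the required replications can be checked by solving the two interval-length conditions $\ell_{Q}\leq 10^{-a}$ and $\ell_{ES}\leq 10^{-a}$ for $n$ , which gives $n\geq 4z_{1-\eta/2}^{2}p(1-p)10^{2a}/f_{x_{h}}^{2}(Q_{h,p})$ and $n\geq 4z_{1-\eta/2}^{2}V_{h}^{2}10^{2a}/p^{2}$ . The Python sketch below evaluates these bounds with Gaussian stand-ins for $f_{x_{h}}(Q_{h,p})$ and $V_{h}^{2}$ , so the numbers are indicative only and are not those of Tables 3 and 4; the function names are hypothetical.

```python
import math
from statistics import NormalDist

def replications_for_quantile(p, eta, a, density_at_var):
    """Smallest n with confidence-interval length for Q at most 10**(-a)."""
    z = NormalDist().inv_cdf(1.0 - eta / 2.0)
    return math.ceil(4.0 * z * z * p * (1.0 - p) * 10.0 ** (2 * a) / density_at_var ** 2)

def replications_for_es(p, eta, a, v_sq):
    """Smallest n with confidence-interval length for ES at most 10**(-a)."""
    z = NormalDist().inv_cdf(1.0 - eta / 2.0)
    return math.ceil(4.0 * z * z * v_sq * 10.0 ** (2 * a) / p ** 2)

# Gaussian stand-ins (an assumption of this sketch)
gaussian = NormalDist()
p, eta, a = 0.05, 0.05, 5
q = -gaussian.inv_cdf(p)
density = gaussian.pdf(q)
# V^2 = E(v^2) - E(v)^2, with E(v^2) = p + q*pdf(q) and E(v) = pdf(q) for N(0,1)
v_sq = p + q * density - density ** 2

n_quantile = replications_for_quantile(p, eta, a, density)
n_es = replications_for_es(p, eta, a, v_sq)
print(n_quantile, n_es)
```

Under these stand-ins both bounds exceed $10^{11}$ replications for $a=5$ , which conveys the scale of the MC effort involved.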

More importantly, because of the nature of confidence intervals, there is probability $\eta$ that each of $Q_{h,p}$ or $ES_{h,p}$ does not fall within its MC confidence interval. Decreasing $\eta$ does not offer a solution to this problem, because the quantile $z_{1-\eta/2}$ of the standard normal distribution would diverge.

One hence concludes that the MC estimation of $Q_{h,p}$ or $ES_{h,p}$ is costly in terms of number of replications $n$ , and it does not guarantee any given level of numerical precision $a$ , because of the probability $\eta$ that $Q_{h,p}$ or $ES_{h,p}$ falls outside its MC confidence interval. This is in contrast with the ease and precision of the exact formulae (3.4) and (3.5) provided in this paper.

Similar considerations apply to the direct nonparametric estimation of the prediction density.

6 Uncertainty regions for prediction functionals

This section discusses how uncertainty regions can be constructed for prediction functionals to reflect estimation uncertainty, making use of the explicit formulae in the paper.

Let $\bm{\theta}=(\omega,\alpha,\beta,\lambda)^{\prime}$ indicate the parameters of the GARCH(1,1) in eq. (2.1), and assume that the model has been estimated on a sample of data $\{x_{t}\}_{t=-T+1}^{0}$ by Quasi Maximum Likelihood (QML). Note that the estimation sample includes $T>0$ observations indexed by non-positive values of $t$ . Let $\widehat{\bm{\theta}}$ be the corresponding QML estimator and $\bm{\theta}_{0}$ the (pseudo)-true values.

Under appropriate regularity conditions, see Lee and Hansen (1994), Jensen and Rahbek (2004) and Arvanitis and Louka (2017) and references therein, one has results of the type $T^{\frac{1}{2}}\bm{R}^{\prime}(\widehat{\bm{\theta}}-\bm{\theta}_{0})\overset{w}{\rightarrow}\mathrm{N}(\boldsymbol{0},\bm{\varOmega}_{\bm{R}})$ , where $\overset{w}{\rightarrow}$ indicates convergence in distribution as $T\rightarrow\infty$ , and $\bm{R}$ indicates a full-column-rank matrix with $r$ columns. This allows one to construct asymptotic confidence regions of the type

[TABLE]

where $\Pr(w\leq c_{\eta})=1-\eta$ and $w\sim\chi^{2}(r)$ .

This region has the property that $\Pr(\bm{R}^{\prime}\bm{\theta}_{0}\in A_{\eta})\rightarrow 1-\eta$ . Note that in (6.1) $\bm{\varOmega}_{\bm{R}}$ can be replaced by a consistent estimator. A special case of this is when $\bm{R}$ is chosen equal to the identity $\bm{I}$ ; in this case (6.1) gives the confidence ellipsoid for the unrestricted vector $\bm{\theta}$ ; this is the default case in the following.

Let $\bm{\varphi}$ be a (multivariate) functional of interest, such as the $Q$ or the $ES$ , or both, which depends on $\bm{\theta}$ , $\bm{\varphi}=\bm{\varphi}(\bm{\theta})$ . Define also the set of values $B_{\eta}$ taken by the $\bm{\varphi}$ map for any value of $\bm{\theta}$ in $A_{\eta}$ , i.e.

[TABLE]

Then the following proposition shows that $B_{\eta}$ is an uncertainty region for $\bm{\varphi}$ with at least asymptotic coverage equal to $1-\eta$ .

Proposition 6.9 (Uncertainty region).

$B_{\eta}$ is an uncertainty region for $\bm{\varphi}$ with at least asymptotic coverage equal to $1-\eta$ , i.e. $\Pr(\bm{\varphi}(\bm{\theta}_{0})\in B_{\eta})\rightarrow\gamma\geq 1-\eta$ .

Proof.

See Fanelli and Paruolo (2010) Proposition 1. ∎

In practice, one needs to compute the set $\bm{\varphi}(A_{\eta})$ . Assume for simplicity that $\bm{\varphi}(A_{\eta})$ is univariate, indicated here as $\varphi(A_{\eta})$ . An uncertainty interval would be $(\varphi_{1},\varphi_{2})$ where $\varphi_{1}=\inf_{\bm{\theta}\in A_{\eta}}\{\varphi(\bm{\theta})\}$ and $\varphi_{2}=\sup_{\bm{\theta}\in A_{\eta}}\{\varphi(\bm{\theta})\}$ .

One way to approximate the interval $(\varphi_{1},\varphi_{2})$ is to calculate the extremes of $\varphi(\bm{\theta})$ for a grid of points $\bm{\theta}$ in $A_{\eta}$ . Let $\mathcal{A}_{\eta}\subset A_{\eta}$ be this grid of points; one can then calculate $(\varphi_{1}^{\star},\varphi_{2}^{\star})$ as an approximation to $(\varphi_{1},\varphi_{2})$ where $\varphi_{1}^{\star}=\min_{\bm{\theta}\in\mathcal{A}_{\eta}}\{\varphi(\bm{\theta})\}$ and $\varphi_{2}^{\star}=\max_{\bm{\theta}\in\mathcal{A}_{\eta}}\{\varphi(\bm{\theta})\}$ . Appendix B illustrates how to construct a grid of points in $A_{\eta}$ .
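One possible way to build such a grid, not necessarily the construction of Appendix B, is to map points of a ball through the Cholesky factor of the estimated covariance matrix, placing half of them on the surface of the ellipsoid. The Python sketch below does this and takes the minimum and maximum of a functional over the grid. Several ingredients are assumptions made for illustration: the covariance matrix is taken to be diagonal with the standard errors reported below for the Microsoft example, $c_{\eta}=7.815$ is the 95% quantile of a $\chi^{2}(3)$ , $\sigma_{1}^{2}=1$ is arbitrary, and the functional `phi` is a Gaussian-approximation stand-in rather than the exact $Q_{2,p}$ .

```python
import math
import random

def cholesky(matrix):
    """Lower-triangular Cholesky factor of a symmetric positive definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower

def ellipsoid_grid(center, cov, c_eta, n_surface, n_interior, seed=0):
    """Points theta with (theta-center)' cov^{-1} (theta-center) <= c_eta."""
    rng = random.Random(seed)
    dim = len(center)
    chol = cholesky(cov)
    points = [list(center)]
    for k in range(n_surface + n_interior):
        direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(d * d for d in direction))
        radius = math.sqrt(c_eta)                  # surface of the ellipsoid
        if k >= n_surface:
            radius *= rng.random() ** (1.0 / dim)  # uniform inside the ball
        unit = [radius * d / norm for d in direction]
        points.append([center[i] + sum(chol[i][j] * unit[j] for j in range(i + 1))
                       for i in range(dim)])
    return points

def phi(theta, sigma1_sq=1.0, z=1.6449):
    """Stand-in functional: Gaussian-approximation 5% Value at Risk of x_2."""
    omega, alpha, beta = theta
    return z * math.sqrt(omega + (alpha + beta) * sigma1_sq)

theta_hat = [0.048977, 0.078824, 0.88389]
std_errors = [0.0049464, 0.0075383, 0.0092256]
cov = [[std_errors[i] ** 2 if i == j else 0.0 for j in range(3)] for i in range(3)]
c_eta = 7.815

grid = ellipsoid_grid(theta_hat, cov, c_eta, n_surface=100, n_interior=100)
values = [phi(theta) for theta in grid]
lower_end, upper_end = min(values), max(values)
print(lower_end, phi(theta_hat), upper_end)
```

Replacing `phi` by a routine that evaluates the exact functional at each grid point would give the approximate uncertainty interval $(\varphi_{1}^{\star},\varphi_{2}^{\star})$ ; using the full estimated covariance matrix instead of the diagonal one only changes the input `cov`.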

Two cases were considered for illustration. The first case, labelled ‘Microsoft stock returns’, corresponds to a GARCH(1,1) estimated on daily log returns of the Microsoft stock price, over the period 2010-12-08 to 2018-11-15, for a total of 2000 observations. The GARCH(1,1) ML estimates were $\hat{\omega}=0.048977(0.0049464)$ , $\hat{\alpha}=0.078824(0.0075383)$ , $\hat{\beta}=0.88389(0.0092256)$ , with estimated standard errors in parentheses. The estimated asymptotic variance-covariance matrix was saved and used to compute the estimation uncertainty region for $Q_{2,p}$ and $ES_{2,p}$ . Table 5 reports the results.

The second case, labelled ‘One simulation run’, corresponds to the simulation of 1000 data points from a GARCH(1,1) with $\omega=0.02$ , $\alpha=0.1$ and $\beta=0.8$ . The resulting ML estimates were $\hat{\omega}=0.035234(0.0128)$ , $\hat{\alpha}=0.13336(0.0896)$ , $\hat{\beta}=0.68463(0.0334)$ , with standard errors in parentheses. The estimated asymptotic variance-covariance matrix was saved and used to compute the estimation uncertainty intervals for $Q_{2,p}$ and $ES_{2,p}$ . Table 5 reports the results.

For both cases in Table 5, 200 points were used in the grid, half of which were selected as images of points $\bm{\theta}$ for which the inequality in (6.1) holds as an equality, i.e. points on the surface of the confidence ellipsoid. The last column in Table 5 reports how many of the extremes in each row were found at $\bm{\theta}$ values on the surface. It can be seen that many of these extremes come from points on the surface, but not all. Increasing the number of points in the grid to 2000 gave marginal improvements for the extremes. [Footnote 14: The extremes varied by less than $4.1\cdot 10^{-5}$ for the Microsoft case and by less than $3.5\cdot 10^{-4}$ for the One simulation run. For 2000 points, 4 out of 4 (respectively 3 out of 4) of the extremes came from points on the surface for the Microsoft case (respectively for the One simulation run).] More details on the computations behind Table 5 are reported in Appendix B.

One could ask whether analogues to this procedure exist which use MC in place of the exact formulae, where each map $\bm{\varphi}(\bm{\theta})$ is replaced by MC simulation plus MC estimation of $\bm{\varphi}(\cdot)$ . The MC approach implies a large computational burden, because of the added MC simulation and estimation burden associated with the estimation of the $\bm{\varphi}(\cdot)$ map. Moreover, the inherent limitations associated with MC confidence intervals discussed in Section 5 would apply here, which would add extra uncertainty for the estimation of $\bm{\varphi}(\cdot)$ . This additional layer of MC uncertainty is completely avoided by the present exact methods.

In other words, the uncertainty regions produced via the present exact methods only reflect in-sample estimation uncertainty associated with the GARCH parameters, but not the MC simulation and estimation uncertainty of the $\bm{\varphi}(\cdot)$ map.

7 Conclusions

This paper presents the analytical form of the prediction density of a GARCH(1,1) process. This can be used to evaluate the probability of tail events or of quantities that may be of interest for value at risk calculations. The exact formulae improve on approximation methods based on moments, or on Monte Carlo simulation and estimation.

The exact formulae show that, while the prediction density can be very far from normal, for parameter values often encountered in applications the discrepancy of the prediction density from the Gaussian distribution can be small. These results could not be obtained without the explicit form of the prediction density.

The present exact results are shown to imply easy-to-compute uncertainty regions for risk functionals, so as to reflect estimation uncertainty. These tools are not available for alternatives based on approximations or MC simulations and estimation of functionals.

The techniques in this paper can be extended to the case of symmetric innovations density $g(\cdot)$ different from the N(0,1) one. Different densities imply distinct subsequent (negative) binomial expansions of expression (2.7) for $\sigma_{t}^{2}$ , and different auxiliary convergence conditions on the GARCH coefficients, similarly to Assumption 3.3. These extensions are left to future research.

Appendix A Proofs

The proofs of the Theorems are based on several Lemmas, which are reported first.

Lemma A.10 (Limits of $\Psi$ ).

$\Psi(a,c;z)\rightarrow 0$ for $c\rightarrow-\infty$ for real and positive $a$ and $z$ and $\Psi(a,c;z)\rightarrow 0$ for $a\rightarrow\infty$ for real and positive $c$ and $z$ .

Proof.

The proof uses the Lebesgue dominated convergence theorem, see e.g. Theorem 10.27 in Apostol (1974). Consider the integral representation (3.1) of $\Psi(a,c;z)$ for real and positive $a$ and $z$ . Note that for negative $c$ and $t\geq 0$ one has

[TABLE]

where $k_{n}(t)$ , $r(t)>0$ , $\int_{\mathbb{R}_{+}}k_{n}(t)\mathrm{d}t=\Psi(a,c;z)$ and $\int_{\mathbb{R}_{+}}r(t)\mathrm{d}t=\frac{1}{\Gamma(a)}\int_{\mathbb{R}_{+}}e^{-zt}t^{a-1}\mathrm{d}t=\frac{1}{\Gamma(a)}z^{-a}\Gamma(a)=z^{-a}$ ; this shows that $k_{n}(t)$ is dominated by the function $r(t)$ , which is Lebesgue-integrable on $\mathbb{R}_{+}$ . The notation $k_{n}(t)$ is chosen here to indicate that a sequence of values $a_{n}$ or $c_{n}$ will be constructed.

Next observe that for any $t>0$ , and for $c_{n}\rightarrow-\infty$ , one has $k_{n}(t):=e^{-zt}t^{a-1}\left(1+t\right)^{c_{n}-a-1}\rightarrow 0$ . Hence $k_{n}(t)$ converges to the zero function $k(t):=0$ on the whole $\mathbb{R}_{+}$ , except for the point $t=0$ . By the dominated convergence theorem, $\lim_{c_{n}\rightarrow-\infty}\Psi(a,c_{n};z)=\lim_{c_{n}\rightarrow-\infty}\int_{\mathbb{R}_{+}}k_{n}(t)\mathrm{d}t=\int_{\mathbb{R}_{+}}k(t)\mathrm{d}t=0$ . This proves that $\Psi(a,c;z)\rightarrow 0$ for $c\rightarrow-\infty$ for real and positive $a$ and $z$ .

Let now $a_{n}\rightarrow\infty$ and observe that for any $t>0$ , $\frac{t}{1+t}<1$ and $\Gamma(a_{n})\rightarrow\infty$ , and hence

[TABLE]

Hence $k_{n}(t)$ converges to the zero function $k(t):=0$ on the whole $\mathbb{R}_{+}$ . By the dominated convergence theorem, $\lim_{a_{n}\rightarrow\infty}\Psi(a_{n},c;z)=\lim_{a_{n}\rightarrow\infty}\int_{\mathbb{R}_{+}}k_{n}(t)\mathrm{d}t=\int_{\mathbb{R}_{+}}k(t)\mathrm{d}t=0$ . This proves that $\Psi(a,c;z)\rightarrow 0$ for $a\rightarrow\infty$ for real and positive $c$ and $z$ . ∎
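The two limits of Lemma A.10 can be illustrated numerically. The following is a minimal sketch, assuming that the integral representation (3.1) has the form $\Psi(a,c;z)=\frac{1}{\Gamma(a)}\int_{\mathbb{R}_{+}}e^{-zt}t^{a-1}(1+t)^{c-a-1}\mathrm{d}t$ implied by the proof above. The function name `psi`, the Simpson rule, the truncation point and the chosen values of $a$, $c$, $z$ are illustrative assumptions, not part of the paper.

```python
import math

def psi(a, c, z, upper=40.0, n=100_000):
    """Approximate (1/Gamma(a)) * int_0^inf e^{-zt} t^{a-1} (1+t)^{c-a-1} dt.

    Composite Simpson rule on [0, upper] with n (even) subintervals.
    Assumes a >= 1, so that the integrand is bounded at t = 0, and
    z * upper large enough for the truncated tail to be negligible.
    """
    h = upper / n
    log_inv_gamma = -math.lgamma(a)

    def integrand(t):
        if t == 0.0:
            # Limit of the integrand at t = 0: 1/Gamma(a) if a = 1, else 0.
            return math.exp(log_inv_gamma) if a == 1.0 else 0.0
        # t^{a-1} (1+t)^{c-a-1} = (t/(1+t))^{a-1} (1+t)^{c-2}, evaluated in logs
        log_k = (-z * t
                 + (a - 1.0) * math.log(t / (1.0 + t))
                 + (c - 2.0) * math.log1p(t)
                 + log_inv_gamma)
        return math.exp(log_k)

    total = integrand(0.0) + integrand(upper)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * integrand(i * h)
    return total * h / 3.0

# First limit: c -> -infinity with a and z fixed (illustrative values).
psi_c = [psi(2.0, c, 1.0) for c in (0.0, -5.0, -50.0, -200.0)]
# Second limit: a -> infinity with c and z fixed (illustrative values).
psi_a = [psi(a, 1.0, 1.0) for a in (1.0, 5.0, 20.0, 80.0)]

print(psi_c)
print(psi_a)
```

Both printed sequences should decrease towards zero, and the first entry of `psi_c` should not exceed the dominating bound $z^{-a}$ used in the proof.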

Proof of Lemma 2.1.

Consider the transformation theorem for $z_{h}=x_{h}^{2}$ ; from standard results, see e.g. Mood et al. (1974), page 201, Example 19, one has

[TABLE]

where $1_{A}$ is the indicator function of the event $A$ . Because, by symmetry, one has $f_{x_{h}}(-\sqrt{w_{h}})=f_{x_{h}}(\sqrt{w_{h}})$ , (A.1) simplifies to $f_{z_{h}}(w_{h})=w_{h}^{-\frac{1}{2}}f_{x_{h}}(\sqrt{w_{h}})1_{(w_{h}\geq 0)},$ or, letting $u_{h}$ indicate $w_{h}^{\frac{1}{2}}$ , and solving for $f_{x_{h}}(u_{h})$ , one finds $f_{x_{h}}(u_{h})=\left|u_{h}\right|f_{z_{h}}(u_{h}^{2})$ , which is (2.2). Note that the expression with the absolute value is also valid for $u_{h}=-\sqrt{w_{h}}$ . This proves (2.2).
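The change of variable behind (2.2) can be checked on a simple case. The sketch below assumes, purely for illustration, that $x_{h}$ is standard normal (the lemma itself only requires symmetry); the function names are hypothetical.

```python
import math

def f_x(u):
    """Standard normal density, standing in for a symmetric f_{x_h}."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def f_z(w):
    """Density of z = x^2 from the simplified (A.1): w^{-1/2} f_x(sqrt(w)) for w > 0."""
    return f_x(math.sqrt(w)) / math.sqrt(w) if w > 0 else 0.0

def f_x_from_z(u):
    """Recover f_x through (2.2): f_x(u) = |u| f_z(u^2)."""
    return abs(u) * f_z(u * u)

# Compare the direct density with the one recovered from f_z,
# at positive and negative evaluation points.
checks = [(u, f_x(u), f_x_from_z(u)) for u in (-2.0, -0.5, 0.3, 1.7)]
for u, direct, recovered in checks:
    print(u, direct, recovered)
```

The two columns agree up to rounding, including at negative $u$, which is where the absolute value in (2.2) matters.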

One has by assumption that $f_{\varepsilon}(\epsilon):=g(\epsilon^{2}):=(2\pi)^{-\frac{1}{2}}\exp(-\epsilon^{2}/2)$ . Hence, simple applications of the transformation theorem cited above imply $f_{x_{t}|x_{1},\dots,x_{t-1}}(u|u_{1},\dots u_{t-1})=(2\pi\sigma_{t}^{2})^{-\frac{1}{2}}\exp\left(-\frac{1}{2}\frac{u^{2}}{\sigma_{t}^{2}}\right)=(\sigma_{t}^{2})^{-\frac{1}{2}}g\left(\frac{u^{2}}{\sigma_{t}^{2}}\right)$ and $f_{z_{t}|x_{1},\dots,x_{t-1}}(w|u_{1},\dots u_{t-1})=(w)^{-\frac{1}{2}}f_{x_{t}|x_{1},\dots,x_{t-1}}(w^{\frac{1}{2}}|u_{1},\dots u_{t-1})1_{w\geq 0}=(w\sigma_{t}^{2})^{-\frac{1}{2}}g\left(\frac{w}{\sigma_{t}^{2}}\right)$ , from which Eq. (2.3) follows. ∎

Proof of Lemma 2.2.

Consider $f_{\bm{z},z_{h}|\bm{\varsigma}}(\bm{w},w_{h}|\bm{s})$ from (2.3), and consider the transformation from $\bm{z}$ to $\bm{y}$ . Observe that the domain of integration remains $\mathbb{R}_{+}^{h-1}$ , that the inverse transformation is $z_{t}=\beta\sigma_{t}^{2}y_{t}/\alpha_{t}$ , with Jacobian $\gamma_{h}\prod_{t=1}^{h-1}\sigma_{t}^{2}$ , where $\gamma_{h}:=\beta^{h-1}/(\prod_{t=1}^{h-1}\alpha_{t})$ . Hence one finds

[TABLE]

from which (2.8) follows, as in (2.4). ∎

Lemma A.11 (Conditions on $\beta$ ).

Assumption 3.3 ensures that for any $j\geq 2$

[TABLE]

which implies that in (2.6) one has

[TABLE]

Proof.

For $j=2$ the inequality (A.2) reads $\beta^{2}\sigma_{1}^{2}+\omega\beta-\omega\geq 0$ . Solving the quadratic on the l.h.s. for $\beta$ one finds two roots, $\beta_{1}=(-\omega-\sqrt{\omega^{2}+4\omega\sigma_{1}^{2}})/(2\sigma_{1}^{2})<0$ and $\underline{\beta}=(-\omega+\sqrt{\omega^{2}+4\omega\sigma_{1}^{2}})/(2\sigma_{1}^{2})>0$ , so that the quadratic is non-negative for $\beta\leq\beta_{1}$ or for $\beta\geq\underline{\beta}$ . Because $\beta\leq\beta_{1}<0$ is not possible, this holds only when $\beta\geq\underline{\beta}$ . This proves that (A.2) is valid for $j=2$ for $\beta\geq\underline{\beta}$ and a fortiori also for $\beta\geq\max\{\frac{1}{2},\underline{\beta}\}$ .

An induction approach is used for $j>2$ . Assume that (A.2) is valid for some $j=j_{0}\geq 2$ and $\beta\geq\max\{\frac{1}{2},\underline{\beta}\}$ ; it can then be shown that (A.2) is also valid for $j=j_{0}+1$ . To see this, take (A.2) for $j=j_{0}$ and multiply by $\beta$ . One finds

[TABLE]

Because $\beta\geq\frac{1}{2}$ , one has $\omega(1-\beta)\leq\omega\beta$ , so that,

[TABLE]

Rearranging $1-\beta-\sum_{i=1}^{j_{0}-1}\beta^{i+1}$ as $1-\sum_{i=1}^{j_{0}}\beta^{i}$ , one finds that (A.2) holds also for $j=j_{0}+1$ . The induction step hence proves that (A.2) holds for any $j$ if $\beta\geq\max\{\frac{1}{2},\underline{\beta}\}$ .

To show (A.3), observe that the minimum value for $\beta\sigma_{h-1}^{2}$ corresponds to $v_{h-2}=\dots=v_{1}=0$ , which equals $\omega\sum_{i=1}^{j}\beta^{i}+\beta^{h-1}\sigma_{1}^{2}$ . The last expression is greater than $\omega$ by (A.2), and hence $\omega\leq\beta\sigma_{h-1}^{2}\leq\left(1+v_{h-1}\right)\beta\sigma_{h-1}^{2}$ . ∎
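The condition on $\beta$ is easy to check numerically. The sketch below assumes that (A.2) reads $\beta^{j}\sigma_{1}^{2}\geq\omega(1-\sum_{i=1}^{j-1}\beta^{i})$ , which is the form consistent with the $j=2$ case and the induction step above. The parameter values are those of Table 1; the starting variance $\sigma_{1}^{2}=\omega/(1-\alpha-\beta)$ and the function names are illustrative assumptions.

```python
import math

def beta_lower(omega, sigma1_sq):
    """Positive root of sigma1_sq * b^2 + omega * b - omega = 0."""
    disc = omega ** 2 + 4.0 * omega * sigma1_sq
    return (-omega + math.sqrt(disc)) / (2.0 * sigma1_sq)

def a2_holds(beta, omega, sigma1_sq, j):
    """Assumed reading of (A.2): beta^j sigma1^2 >= omega (1 - sum_{i=1}^{j-1} beta^i)."""
    partial = sum(beta ** i for i in range(1, j))
    return beta ** j * sigma1_sq >= omega * (1.0 - partial)

omega, alpha, beta = 1.14e-5, 0.131007, 0.845708  # values from Table 1
sigma1_sq = omega / (1.0 - alpha - beta)          # hypothetical starting variance

b_low = beta_lower(omega, sigma1_sq)
threshold = max(0.5, b_low)
ok = beta >= threshold and all(
    a2_holds(beta, omega, sigma1_sq, j) for j in range(2, 51)
)
print(b_low, threshold, ok)
```

For these values the bound $\frac{1}{2}$ is the binding one, and the inequality holds for every $j$ tried.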

Lemma A.12 (Binomial expansion).

Under Assumption 3.3, the following expansion holds for any $r$

[TABLE]

where $K_{t}:=\sum_{i=1}^{t}k_{i}$ , $U_{j}:=\sum_{i=1}^{j}K_{i}=\sum_{i=1}^{j}(j-i+1)k_{i}$ , and the sums $\sum_{k_{1}=0}^{r}\sum_{k_{2}=0}^{r-k_{1}}\cdots\sum_{k_{h-2}=0}^{r-K_{h-3}}$ extend to $\infty$ if $r\notin\mathbb{N}$ .

Proof.

Under Assumption 3.3, Lemma A.11 implies that one can employ binomial expansions of $(\sigma_{h}^{2})^{r}$ where $\sigma_{h}^{2}=\omega+\left(1+v_{h-1}\right)\beta\sigma_{h-1}^{2}$ using increasing powers of $\omega$ and decreasing powers of $\left(1+v_{h-1}\right)\beta\sigma_{h-1}^{2}$ . Hence, setting $U_{j}:=\sum_{i=1}^{j}(r-K_{i})$ ,

[TABLE]

This proves the claim. ∎

Lemma A.13 (Integrals).

One has

[TABLE]

Proof.

Set $b:=c+1$ and $t:=mv$ with $m:=c/(c+1)$ so that $1+c+cv=b(1+t)$ . Note that $m^{-1}\mathrm{d}t=\mathrm{d}v$ , so that

[TABLE]

see (3.1). The case of (A.6) is obtained from the last two equalities by setting $m=b=1$ . ∎

Lemma A.14 (Coefficients $A_{h}(\cdot)$ ).

Let

[TABLE]

then

[TABLE]

and assuming that $\eqref{eq_inductive0}$ holds for $2\leq j\leq h$ , one has for $h\geq 3$

[TABLE]

where $K_{0}:=0$ , $K_{i}:=\sum_{t=1}^{i}k_{t}$ , $U_{j}:=\sum_{i=1}^{j}K_{i}=\sum_{i=1}^{j}(j-i+1)k_{i}$ , $\sum_{k_{1},\dots,k_{h-2}}:=\sum_{k_{1}=0}^{r}\sum_{k_{2}=0}^{r-k_{1}}\cdots\sum_{k_{h-2}=0}^{r-K_{h-3}}$ , and the sums extend to $\infty$ if $r\notin\mathbb{N}$ . Note that (A.9) reduces to (A.8) for $h=2$ , because the sum $\sum_{k_{1},\dots,k_{h-2}}$ and the product $\prod_{t=1}^{h-2}$ are empty and $K_{h-2}=K_{0}=0$ .

Proof.

Set $h=2$ in (A.7) and note that

[TABLE]

so that by (A.5) eq. (A.8) holds. Next consider the case $h\geq 3$ . Under $\eqref{eq_inductive0}$ one can use expansion (A.4) in (A.7). Integrating one finds

[TABLE]

Using (A.6) and (A.8), one finds (A.9). ∎

Proof of Theorem 3.4.

The integral to be solved is

[TABLE]

Expand $\exp(-w_{h}/(2\sigma_{h}^{2}))=\sum_{j=0}^{\infty}\frac{\left(-w_{h}/2\right)^{j}}{j!}\left(\sigma_{h}^{2}\right)^{-j}$ and note that

[TABLE]

where $A_{h}(\cdot)$ is defined in (A.7). By (A.9) one has $f_{z_{h}|\bm{\varsigma}}(w_{h}|\bm{s})=w_{h}^{-\frac{1}{2}}\left(2\pi\right)^{-\frac{h}{2}}\sum_{j=0}^{\infty}\frac{1}{j!}(-\rho_{w})^{j}c_{j,\bm{s}}$ where $\rho_{w}:=w_{h}/(\omega+\beta\sigma_{1}^{2})$ and $c_{j,\bm{s}}=\gamma_{h}^{\frac{1}{2}}A_{h}(-\frac{1}{2}-j)$ . Marginalizing with respect to $\bm{\varsigma}$ , and using the fact that all elements in $\mathcal{S}$ are equally likely, one finds from (2.5)

[TABLE]

where $c_{j}:=2^{-h+1}\sum_{\bm{s}\in\mathcal{S}}c_{j,\bm{s}}$ . Note that if the $c_{j,\bm{s}}$ do not vary with $\bm{s}$ , one has $c_{j}=c_{j,\bm{s}}$ .

In order to show that $f_{z_{h}}(w_{h})$ and $f_{x_{h}}(u_{h})$ are absolutely summable for finite $w_{h}$ , $u_{h}$ consider for instance the case $h=2$ for $f_{z_{2}}(w)$ . One has

[TABLE]

where $\rho_{w}:=\frac{w}{2\left(\omega+\beta\sigma_{1}^{2}\right)}$ , $\xi=\frac{\omega+\beta\sigma_{1}^{2}}{2\alpha_{1}\sigma_{1}^{2}}>0$ . Because $\Psi\left(\frac{1}{2};1-j;\xi\right)$ is non-negative and tends to 0 by Lemma A.10 for increasing $j$ , one has $\sup_{-j\in\mathbb{N}}\Psi\left(\frac{1}{2};1-j;\xi\right)=M<\infty$ , so that

[TABLE]

where $\exp\left(\rho_{w}\right)$ is finite for any finite $\rho_{w}$ . One hence concludes that the series is absolutely convergent for any finite evaluation point $w$ . The same argument applies to $f_{x_{h}}(u_{h})$ for the case $h=2$ . The case for $h>2$ is similar. ∎

Proof of Corollary 3.5.

The c.d.f.s are found by integrating the p.d.f. termwise from 0 to $w_{h}$ or $u_{h}$ for positive $u_{h}$ . Termwise integration is guaranteed by Theorem 10.26 in Apostol (1974). This delivers the expressions in (3.5) for $w_{h}$ and $u_{h}$ for positive $u_{h}$ . The symmetry of $f_{x_{h}}(\cdot)$ implies $F_{x_{h}}(0)=\frac{1}{2}$ and $F_{x_{h}}(u_{h})=1-F_{x_{h}}(-u_{h})$ . Hence for $0>u_{h}=-a$ , say, with $a>0$ , one has

[TABLE]

which proves that the expressions in (3.5) are valid also for negative $u_{h}$ .

The moments are derived as follows. From (A.10) one sees that

[TABLE]

Recall that $\int_{\mathbb{R}_{+}}\exp\left(-\frac{w}{2\sigma_{h}^{2}}\right)w^{m-\frac{1}{2}}\mathrm{d}w=\left(2\sigma_{h}^{2}\right)^{m+\frac{1}{2}}\Gamma\left(m+\frac{1}{2}\right)$ so that

[TABLE]

where $A_{h}(\cdot)$ is defined in (A.7), which equals (A.9) (or (3.2)). Hence

[TABLE]

∎
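The Gamma-integral identity recalled in the proof of Corollary 3.5 follows from a one-line change of variable, spelled out here for convenience. It uses the substitution $t:=w/(2\sigma_{h}^{2})$ and holds for $m>-\frac{1}{2}$ ; the variable $t$ is introduced only for this step.

```latex
\begin{aligned}
\int_{\mathbb{R}_{+}} \exp\!\left(-\frac{w}{2\sigma_{h}^{2}}\right) w^{m-\frac{1}{2}}\,\mathrm{d}w
&= \int_{\mathbb{R}_{+}} e^{-t}\,\left(2\sigma_{h}^{2}t\right)^{m-\frac{1}{2}}\,2\sigma_{h}^{2}\,\mathrm{d}t
   \qquad \left(w = 2\sigma_{h}^{2}t,\ \mathrm{d}w = 2\sigma_{h}^{2}\,\mathrm{d}t\right) \\
&= \left(2\sigma_{h}^{2}\right)^{m+\frac{1}{2}} \int_{\mathbb{R}_{+}} e^{-t}\,t^{\left(m+\frac{1}{2}\right)-1}\,\mathrm{d}t \\
&= \left(2\sigma_{h}^{2}\right)^{m+\frac{1}{2}}\,\Gamma\!\left(m+\tfrac{1}{2}\right).
\end{aligned}
```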

Proof of Theorem 3.7.

Consider first the case $h=2$ and

[TABLE]

As in the proof of Lemma A.13, observe that $\sigma_{2}^{2}$ can be written as $b(1+s)$ with $b:=\omega+\beta\sigma_{1}^{2}$ and $s:=v_{1}/q$ where $q=(\omega+\beta\sigma_{1}^{2})/\beta\sigma_{1}^{2}$ , $\sigma_{2}^{2}=\omega+(1+v_{1})\beta\sigma_{1}^{2}=\omega+\beta\sigma_{1}^{2}+\beta\sigma_{1}^{2}v_{1}=b(1+s)$ . Next define $\rho:=w_{2}/(2(\omega+\beta\sigma_{1}^{2}))$ , and note that $w_{2}/(2\sigma_{2}^{2})=\rho/(1+s)$ ; observe also that $\exp\left(-w_{2}/(2\sigma_{2}^{2})\right)=\exp\left(-\rho/(1+s)\right)=\exp\left(-\rho\right)\exp\left(\rho s/(1+s)\right)$ , where the last term can be expanded as $\exp\left(\rho s/(1+s)\right)=\sum_{j=0}^{\infty}\frac{\rho^{j}}{j!}s^{j}(1+s)^{-j}$ .
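The reparametrization $\sigma_{2}^{2}=b(1+s)$ and the series for the exponential term can be verified numerically. The sketch below uses $\omega$ and $\beta$ from Table 1; the values of $\sigma_{1}^{2}$ , $v_{1}$ and $w_{2}$ are hypothetical evaluation points chosen only for illustration.

```python
import math

omega, beta = 1.14e-5, 0.845708   # values from Table 1
sigma1_sq = 4.0e-4                # hypothetical starting variance
v1, w2 = 0.7, 3.0e-4              # hypothetical evaluation point

b = omega + beta * sigma1_sq
q = b / (beta * sigma1_sq)
s = v1 / q
sigma2_sq = omega + (1.0 + v1) * beta * sigma1_sq
rho = w2 / (2.0 * b)

# Left-hand side: the exponential term of the conditional density.
lhs = math.exp(-w2 / (2.0 * sigma2_sq))

# Right-hand side: exp(-rho) times the series in powers of rho * s / (1 + s),
# truncated at 40 terms.
series = sum(rho ** j / math.factorial(j) * (s / (1.0 + s)) ** j
             for j in range(40))
rhs = math.exp(-rho) * series

print(sigma2_sq, b * (1.0 + s))
print(lhs, rhs)
```

Since $0\leq s/(1+s)<1$ , the series is dominated by the one for $\exp(\rho)$ , so a moderate truncation point suffices for moderate $\rho$ .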

Substituting these expressions in (A.11), using $\gamma_{2}=\beta/\alpha_{1}$ , $q\mathrm{d}s=\mathrm{d}v_{1}$ , and setting $z:=\left(\omega+\beta\sigma_{1}^{2}\right)/\left(2\alpha_{1}\sigma_{1}^{2}\right)=\beta q/(2\alpha_{1})$ , one finds

[TABLE]

By eq. (2) in Abadir (1999) $\Gamma\left(j+\frac{1}{2}\right)=\sqrt{\pi}\left(\frac{1}{2}\right)_{j}$ , where $(a)_{j}:=\prod_{i=0}^{j-1}(a+i)$ denotes Pochhammer’s symbol. Substituting back, noting that $\rho=\rho_{w}$ and rearranging, one finds (3.7), (3.8) and (3.9).
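The Gamma identity used here can be confirmed with a few lines of code. This is a minimal sketch using the usual convention $(a)_{j}=\prod_{i=0}^{j-1}(a+i)$ with $(a)_{0}=1$ ; the helper name `pochhammer` is hypothetical.

```python
import math

def pochhammer(a, j):
    """Rising factorial (a)_j = a (a+1) ... (a+j-1), with (a)_0 = 1."""
    prod = 1.0
    for i in range(j):
        prod *= a + i
    return prod

# Largest relative discrepancy between Gamma(j + 1/2) and sqrt(pi) (1/2)_j.
max_rel_err = max(
    abs(math.gamma(j + 0.5) - math.sqrt(math.pi) * pochhammer(0.5, j))
    / math.gamma(j + 0.5)
    for j in range(15)
)
print(max_rel_err)
```

The discrepancy is at the level of floating-point rounding for all $j$ tried.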

Next consider the case $h\geq 3$ and

[TABLE]

Note that $\sigma_{h}^{2}=\omega(1+s)$ with $s:=(1+v_{h-1})\beta\sigma_{h-1}^{2}/\omega$ ; next set $\rho:=w_{h}/(2\omega)$ so that $\exp\left(-w_{h}/(2\sigma_{h}^{2})\right)=\exp\left(-\rho/(1+s)\right)=\exp(-\rho)\exp\left(\rho s/(1+s)\right)$ .

Setting $p_{j}(\rho):=(-1)^{j}L_{j}^{(-1)}(\rho)$ , where $L_{j}^{(-1)}(\rho)=\sum_{k=0}^{j}\left(k\right)_{j-k}\left(-j\right)_{k}\frac{\rho^{k}}{k!}$ is a generalized Laguerre polynomial (see Abramowitz and Stegun (1964), formulae 22.2.12 and 22.9.16), one finds

[TABLE]

Next consider $(\sigma_{h}^{2})^{-\frac{1}{2}}=\omega^{-\frac{1}{2}}(1+s)^{-\frac{1}{2}}$ and expand $(1+s)^{-\frac{1}{2}}$ in decreasing powers of $s$ , which is convergent thanks to Assumption 3.3; this implies that

[TABLE]

Substituting back in $f_{z_{h}|\bm{\varsigma}}(w_{h}|\bm{s})$ one finds

[TABLE]

Next use the binomial expansion on $\left(\sigma_{h-1}^{2}\right)^{q}$ with $q=j-\frac{1}{2}-k_{1}$ , setting $K_{j}^{\ast}:=\sum_{i=2}^{j}k_{i}$ , $U_{j}^{\ast}:=\sum_{i=2}^{j}K_{i}^{\ast}$

[TABLE]

Note that $q-K_{t}^{\ast}=j-\frac{1}{2}-K_{t}$ in the previous expression. The integral in the last two lines of (A.12) can be computed using Lemma A.13, and equals

[TABLE]

Substituting back and rearranging, one finds (3.10).

Summability of (3.7), (3.8), (3.9), (3.10) for any finite $w_{h}$ or $u_{h}$ is proved as in the proof of Theorem 3.4 using Lemma A.10.

∎

Appendix B Mapping estimation uncertainty

This Appendix describes how to construct a grid of points in $B_{\eta}$ . The approach is to select points $\bm{y}$ uniformly in $\mathcal{S}$ , the closed unit ball in $r$ dimensions, map them into points $\bm{\theta}$ in $A_{\eta}$ , and finally apply the exact formulae $\bm{\varphi}$ to obtain points in $B_{\eta}=\bm{\varphi}(A_{\eta})=\{\bm{\varphi}(\bm{\theta}),\bm{\theta}\in A_{\eta}\}$ in (6.2). This creates a grid of points in $B_{\eta}$ , over which one can obtain extremes of the uncertainty region.

Let $\bm{v}=\bm{B}\bm{R}^{\prime}\bm{\theta}$ , $\bm{u}=\bm{B}\bm{R}^{\prime}\widehat{\bm{\theta}}$ , $\bm{B}=(c_{\eta}\bm{\varOmega}_{\bm{R}})^{-\frac{1}{2}}$ where $\bm{A}^{\frac{1}{2}}$ indicates the symmetric square root of a positive semidefinite matrix $\bm{A}$ , i.e. $\bm{A}^{\frac{1}{2}}=\bm{U}\bm{\varLambda}^{\frac{1}{2}}\bm{U}^{\prime}$ , where $\bm{A}=\bm{U}\bm{\varLambda}\bm{U}^{\prime}$ is the spectral decomposition of $\bm{A}$ . The vectors $\bm{v}$ and $\bm{u}$ are $r\times 1$ vectors, and let $\bm{x}=\bm{v}-\bm{u}$ . The set $A_{\eta}$ in (6.1) corresponds to $C_{\eta}=\{\bm{v}\in\mathbb{R}^{r}:\|\bm{v}-\bm{u}\|\leq 1\}$ , where $\|\bm{a}\|=(\bm{a}^{\prime}\bm{a})^{\frac{1}{2}}$ is the Euclidean norm. Any point $\bm{v}$ in $C_{\eta}$ corresponds to a unique point $\bm{x}$ in the closed unit ball $\mathcal{S}=\{\bm{x}\in\mathbb{R}^{r}:\|\bm{x}\|\leq 1\}$ and vice versa. Inverting this map, any point $\bm{x}$ in $\mathcal{S}$ corresponds to one $\bm{R}^{\prime}\bm{\theta}=\bm{B}^{-1}(\bm{x}+\bm{u})$ in $A_{\eta}$ .

In order to sample points uniformly in $\mathcal{S}$ , a simple algorithm is to draw $\bm{y}$ from a $\mathrm{N}(0,\bm{I}_{r})$ and $u$ from a $\mathcal{U}_{[0,1]}$ , the uniform distribution on $[0,1]$ , with $u$ independent of $\bm{y}$ ; then $\bm{x}=u^{\frac{1}{r}}\bm{y}/\|\bm{y}\|$ is uniformly distributed in $\mathcal{S}$ , see e.g. Harman and Lacko (2010). Finally one can set

[TABLE]

to find the corresponding point in $A_{\eta}$ . Applying the exact formulae $\bm{\varphi}$ then yields points in $B_{\eta}=\bm{\varphi}(A_{\eta})=\{\bm{\varphi}(\bm{\theta}),\bm{\theta}\in A_{\eta}\}$ in (6.2).

In the implementation of the calculations behind Table 5, 100 draws of $\bm{y}$ and $u$ were generated independently. 100 values of $\bm{x}$ were generated as $\bm{x}=u^{\frac{1}{r}}\bm{y}/\|\bm{y}\|$ , obtaining points uniformly distributed within the ball $\mathcal{S}$ . The same 100 draws of $\bm{y}$ were used to generate the corresponding points on the surface of $\mathcal{S}$ by replacing the value of $u$ with 1 in formula (B.1). This gave a set of 200 points in $\mathcal{S}$ , half of which lie on the surface.
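The sampling scheme just described can be sketched as follows. This is an illustration only, not the authors' implementation: the dimension `r=3`, the seed and the function name are assumptions, and the subsequent mapping to $A_{\eta}$ through (B.1) is not reproduced.

```python
import math
import random

def sample_ball_and_surface(r, n, seed=0):
    """Draw n points uniformly in the closed unit ball of R^r, together with
    the n points on its surface that share the same directions (u set to 1)."""
    rng = random.Random(seed)
    interior, surface = [], []
    for _ in range(n):
        # Direction: a N(0, I_r) draw scaled to unit Euclidean norm.
        y = [rng.gauss(0.0, 1.0) for _ in range(r)]
        norm = math.sqrt(sum(c * c for c in y))
        direction = [c / norm for c in y]
        # Radius: u^(1/r) with u uniform on [0, 1], independent of y.
        u = rng.random()
        radius = u ** (1.0 / r)
        interior.append([radius * c for c in direction])
        surface.append(direction)
    return interior, surface

interior, surface = sample_ball_and_surface(r=3, n=100)
points = interior + surface  # 200 points, half of them on the surface
print(len(points))
```

Reusing the same directions for the surface points mirrors the description above, where the draws of $\bm{y}$ are kept and only $u$ is replaced by 1.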

Because of the asymptotic nature of the confidence ellipsoid, some of the obtained points $\bm{R}^{\prime}\bm{\theta}=\bm{B}^{-1}(\bm{x}+\bm{u})$ contained negative values of $\omega$ or $\alpha$ , i.e. were not inside the parameter space. This never happened for the case of Microsoft stock returns, but happened some 20% of the time in the case of the simulation run; these points $\bm{R}^{\prime}\bm{\theta}$ were discarded.

In the ‘One simulation run’ case, the value of $\sigma_{1}^{2}$ was chosen as 3 times $\omega(1+\alpha+\beta)$ , because the choice $\omega/(1-\alpha-\beta)$ sometimes gave negative values. In the ‘Microsoft stock return’ case, $\sigma_{1}^{2}$ was chosen as $\omega/(1-\alpha-\beta)$ , because this always gave positive values.

Bibliography


Abadir, K. M. (1999). An introduction to hypergeometric functions for economists. Econometric Reviews 18 (3), 287–330.
Abramowitz, M. and I. Stegun (1964). Handbook of mathematical functions, Tenth Printing, December 1972, with corrections. Washington, D.C.: National Bureau of Standards, Applied Mathematics.
Alexander, C., E. Lazar, and S. Stanescu (2013). Forecasting VaR using analytic higher moments for GARCH processes. International Review of Financial Analysis 30, 36–45.
Andersen, T., T. Bollerslev, P. F. Christoffersen, and F. X. Diebold (2006). Volatility and correlation forecasting. In G. Elliott, C. W. Granger, and A. Timmermann (Eds.), Handbook of Economic Forecasting, Volume 1. New York: Elsevier.
Apostol, T. M. (1974). Mathematical Analysis (2 ed.). Addison-Wesley.
Arvanitis, S., M. Hallam, T. Post, and N. Topaloglou (2019). Stochastic spanning. Journal of Business & Economic Statistics 37 (4), 573–585.
Arvanitis, S. and A. Louka (2017). Martingale transforms with mixed stable limits and the QMLE for conditionally heteroskedastic models. Technical report.
Baillie, R. T. and T. Bollerslev (1992). Prediction in dynamic models with time-dependent conditional variances. Journal of Econometrics 52 (1-2), 91–113.

Footnote 1: Information and views set out in this paper are those of the authors and do not necessarily reflect the ones of the institutions of affiliation.